I have a table with the following (slightly simplified) structure
CREATE TABLE `oak_relation` ( `o_id` int(10) unsigned NOT NULL, `k_id` bigint(20) unsigned NOT NULL, `initial` float unsigned NOT NULL, `xy` float unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`o_id`,`k_id`), KEY `xy` (`xy`) ) ENGINE=InnoDB
There are multiple instances of the same o_id, and multiple instances of the same k_id, but the primary key ensures there are only single instances of a row with an individual o_id & k_id.
Values for initial range from 0 to 1.
I would like to run an update to generate values for “xy”.
The update should perform the following operation:
For every row in the table, calulate the xy value;
xy = x * y
x = `initial` value / number of rows matching the o_id of this row y = total number of distinct o_ids / number of rows matching the k_id of the row
My query so far:
UPDATE oak_relation AS final INNER JOIN ( SELECT oak.o_id , oak.k_id , oak.initial, link.numLinks, COALESCE( ( oak.initial / link.numLinks ), 0 ) AS x FROM oak_relation AS oak LEFT OUTER JOIN ( SELECT o_id , COALESCE( COUNT( k_id ), 0 ) AS numLinks FROM oak_relation GROUP BY o_id ) AS link ON oak.o_id = link.o_id ) AS xjoin INNER JOIN ( SELECT k_id , COUNT( o_id ) AS numOforK , COALESCE( ( tmp.totalO / COUNT( DISTINCT( o_id ) ) ), 0 ) AS y FROM oak_relation, ( SELECT COUNT( DISTINCT( o_id ) ) AS totalO FROM oak_relation ) AS tmp GROUP BY k_id HAVING COUNT( o_id ) > 1 ) AS yjoin SET final.xy = xjoin.x * yjoin.y WHERE final.o_id = xjoin.o_id AND final.k_id = xjoin.k_id AND xjoin.k_id = yjoin.k_id
The trouble I am having is my query doesn’t seem to update all the rows and my SQL knowledge is a little limited in updates using joins.
I thought perhaps it was just not updating if the xy result was 0 (if initial is 0), but selecting the expected result set shows that some xy = 0 values are being set, also tested by setting default xy value to some arbitrarily large number and examining the result after update. Some rows were definitely being set to 0.
So I’m a little stuck! For the rows where the calculation happens, it appears to work correctly, I just seem to not be selecting the correct set of data to update so it calculates for every row of the table.
Thanks very much in advance for any help or insight you might be able to provide.
I think this will do:
UPDATE ( SELECT COUNT(DISTINCT o_id) AS total_oids FROM oak_relation ) AS d CROSS JOIN oak_relation AS u JOIN ( SELECT o_id, COUNT(*) AS matching_o_ids FROM oak_relation GROUP BY o_id ) AS o ON o.o_id = u.o_id JOIN ( SELECT k_id, COUNT(*) AS matching_k_ids FROM oak_relation GROUP BY k_id ) AS k ON k.k_id = u.k_id SET u.xy = u.initial * d.total_oids / o.matching_o_ids / k.matching_k_ids ;
It would be good to check it out first by running the equivalent
SELECT u.o_id, u.k_id, u.initial, d.total_oids, o.matching_o_ids, k.matching_k_ids, u.initial * d.total_oids / o.matching_o_ids / k.matching_k_ids AS new_xy FROM ( SELECT COUNT(DISTINCT o_id) AS total_oids FROM oak_relation ) AS d CROSS JOIN oak_relation AS u JOIN ( SELECT o_id, COUNT(*) AS matching_o_ids FROM oak_relation GROUP BY o_id ) AS o ON o.o_id = u.o_id JOIN ( SELECT k_id, COUNT(*) AS matching_k_ids FROM oak_relation GROUP BY k_id ) AS k ON k.k_id = u.k_id ;