Question :
I am trying to handle a scenario that matches other users by their interest and gender.
There will be 2 genders and also 5 interests for each user and each of them will be either 0 or 1. and a rate, the sum (gender+interest) will be calculated and sorted together with last update timestamp in descending order depending on what the user select. (e.g. A user select Female and Movie, ORDER BY SUM(male + movie) DESC, timestamp DESC)
User may select
- select interest but not gender
- select gender but not interest
- both gender and interest
- neither gender and interest
Desired outcome:
- user A select male and movie
- user B select female, no interest (with the most recent timestamp)
- user C select female and movie
- user D select no gender, and select movie interest
- user E select neither
When user A performs the search, the ordered result should be
- C with 2 Points
- B with 1 Point
- D with 1 Point
- E with 0 Point
I try to use a table structure like below and use 150,000 rows of data to test it.
CREATE TABLE `candidate_profiles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`male` decimal(3,2) NOT NULL,
`female` decimal(3,2) NOT NULL,
`interestA` decimal(3,2) NOT NULL,
`interestB` decimal(3,2) NOT NULL,
`interestC` decimal(3,2) NOT NULL,
`interestD` decimal(3,2) NOT NULL,
`interestE` decimal(3,2) NOT NULL,
`is_active` tinyint(1) DEFAULT '1',
`timestamp` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Below is the query that I used
SELECT
c.id,
p.id AS profile_id,
c.message,
(gender + interestB) AS point,
c.timestamp,
CASE
WHEN male = 1 THEN 'male'
WHEN female = 1 THEN 'female'
ELSE 'disabled'
END AS 'gender',
CASE
WHEN interestA = 1 THEN 'interestA'
WHEN interestB = 1 THEN 'interestB'
WHEN interestC= 1 THEN 'interestC'
WHEN interestD = 1 THEN 'interestD'
WHEN interestE = 1 THEN 'interestE'
ELSE 'disabled'
END AS 'interest'
FROM
candidate c
INNER JOIN
profiles p ON p.candidate_id = c.id
LEFT JOIN
ignored ci ON ci.candidate_id = 1
AND ci.target_id = c.id
WHERE
c.state = 0
AND ci.id IS NULL
AND c.id NOT IN (151011 , 151009,
151008,
151007,
151006,
151005,
151004,
151003)
GROUP BY c.ID
ORDER BY c.timestamp DESC , point DESC
LIMIT 8
and this is the EXPLAIN result
+----+-------------+-------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------+-------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------+-------+----------+--------------------------------------------------------+
| 1 | SIMPLE | c | NULL | ref | PRIMARY,idx_distributor_candidate_timestamp,idx_distributor_candidate_state_timestamp,v110_id_timestamp_state,id_timestamp_state | idx_distributor_candidate_state_timestamp | 4 | const | 75551 | 50.01 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | ci | NULL | eq_ref | uk_candidate_id_target_id | uk_candidate_id_target_id | 8 | const,wowo.c.id | 1 | 33.33 | Using where; Not exists; Using index |
| 1 | SIMPLE | p | NULL | ref | idx_candidate_profile_2_candidate_id | idx_candidate_profile_2_candidate_id | 4 | wowo.c.id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------+-------+----------+--------------------------------------------------------+
I realize, because of the sum operation, I can’t create a proper index. And the query used 3-4 seconds.
I want to seek another way that could speed up the query.
Would “Pre-Calculated” values be the way? meaning with 2 genders and 5 interests, it would have 17 columns of variations.
Or is there another approach that could be used. thanks!
Updated, Oct 31: Add the desired outcome. Current query and EXPLAIN result
Answer :
If all you have is 0 or 1, then TINYINT UNSIGNED
should suffice (and be half the size).
Are you selecting all the ‘matching’ users? Or just ‘counting’ how many there are?
If selecting all that match, I would consider using a TINYINT UNSIGNED
because it has 8 bits and they could represent gender, the interests, and is_active. Then apply a mask to select the bits that need to be looked at, and an ‘=’ to test them all at one time.
I see no need for the interests to be in separate column. Instead, have a single column with
interests ENUM ('a', 'b', ...)
INDEX(interests, gender)
INDEX(gender, interests)
WHERE gender = 1 AND interests = 'a'
WHERE interests = 'd'
(etc -- covering the 4 cases you mentioned)
Please provide complete queries, and/or sample output.