Optimize extremely slow SQL query [closed]

Posted on

Question :

SELECT j.job_id,j.last_updated,j.job_title,j.status,
(SELECT COUNT(*) FROM candidates WHERE job_id=j.job_id) resumes_count,
(SELECT COUNT(*) FROM candidates c WHERE job_id=j.job_id AND c.ref_type=1) auto_resumes_count,
(SELECT COUNT(*) FROM candidates c WHERE job_id=j.job_id AND c.ref_type=0) self_resumes_count,
(SELECT COUNT(*) FROM candidates c WHERE job_id=j.job_id AND c.ref_type=-1) paging_resumes_count,
(SELECT SUM(views) FROM job_views WHERE job_id=j.job_id) job_views,
(SELECT posts FROM autopost_jobs WHERE job_id=j.job_id) autopost_count
FROM jobs j WHERE j.user_id={$userId} ORDER BY j.last_updated DESC,STATUS

Table structure:

| jobs  | CREATE TABLE `jobs` (
  `job_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(20) NOT NULL,
  `job_title` varchar(450) NOT NULL,
  `job_desc` varchar(5000) NOT NULL,
  `sjob_data` text NOT NULL,
  `job_requirements` varchar(5000) DEFAULT NULL,
  `years_of_experience` varchar(100) NOT NULL,
  `location_id` int(11) unsigned NOT NULL,
  `department_id` int(11) NOT NULL,
  `role_id` int(10) unsigned NOT NULL,
  `job_company_name_alt` varchar(100) DEFAULT NULL,
  `is_show_company` tinyint(1) NOT NULL DEFAULT '1',
  `status` tinyint(2) NOT NULL DEFAULT '0' COMMENT '0=inactive,1=active,2=deleted',
  `job_reward` varchar(128) DEFAULT NULL,
  `job_lang` int(11) NOT NULL DEFAULT '0' COMMENT '0 - Ru 1 - En',
  `job_photo` varchar(100) DEFAULT NULL,
  `last_updated` datetime NOT NULL,
  `is_hot_job` tinyint(1) NOT NULL DEFAULT '0',
  `job_internal_id` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`job_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1794 DEFAULT CHARSET=utf8 |

| candidates | CREATE TABLE `candidates` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `hr_id` int(11) DEFAULT NULL,
  `job_id` int(10) unsigned NOT NULL DEFAULT '0',
  `through_user_id` int(11) NOT NULL,
  `ref_type` tinyint(4) NOT NULL COMMENT '0 = personal , 1 = automatic , 2 = file uploaded , 3 = lead sent',
  `media_id` tinyint(2) NOT NULL DEFAULT '-1',
  `status_id` int(11) DEFAULT NULL,
  `cv_filename` varchar(250) DEFAULT NULL,
  `lead_url` varchar(256) DEFAULT NULL,
  `is_deleted` tinyint(1) NOT NULL DEFAULT '0',
  `is_viewed` tinyint(1) NOT NULL DEFAULT '0',
  `is_hired` tinyint(1) NOT NULL DEFAULT '0',
  `is_employee_note` tinyint(4) NOT NULL DEFAULT '0',
  `note` varchar(1000) NOT NULL,
  `last_updated` datetime NOT NULL,
  `device_type` int(1) NOT NULL DEFAULT '0' COMMENT '0 - web, 1 - mobile',
  `third_party_approval` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `setIndex` (`user_id`,`job_id`,`through_user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=20012 DEFAULT CHARSET=utf8 |
| autopost_jobs | CREATE TABLE `autopost_jobs` (
  `job_id` int(11) NOT NULL,
  `hr_id` int(11) NOT NULL,
  `posts` bigint(20) DEFAULT '0',
  `media_id` int(11) NOT NULL,
  `last_user_id` int(11) DEFAULT NULL,
  `last_updated` datetime DEFAULT NULL,
  `inserted_date` datetime DEFAULT NULL,
  PRIMARY KEY (`job_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |
| job_views | CREATE TABLE `job_views` (
  `user_id` int(11) NOT NULL,
  `hr_id` int(11) NOT NULL,
  `job_id` int(11) NOT NULL,
  `media_id` int(11) NOT NULL,
  `post_type_id` int(11) NOT NULL COMMENT '0 - self, 1 - auto, -1 - undefined ',
  `date` date NOT NULL,
  `views` bigint(20) NOT NULL DEFAULT '0',
  `last_time` time NOT NULL,
  PRIMARY KEY (`user_id`,`hr_id`,`job_id`,`media_id`,`date`,`post_type_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |

This query just hangs, what could be the reason?, how can I optimize it

Answer :

Indexes to add:

Table: candidates
  (job_id, ref_type)

Table: job_views
  (job_id, views)

Table: autopost_jobs
  (job_id, posts)

Then run again the queries and post the updated EXPLAIN

I think you forgot SUM() aggregate function in this subquery :

(SELECT SUM(posts) FROM autopost_jobs WHERE job_id=j.job_id) autopost_count

I would use a CTE to query candidates table.

I think the following version should have a better execution plan :

SELECT j.job_id,j.last_updated,j.job_title,j.status,
    MAX(CASE WHEN  ref_type = 1 THEN  resumes_count END) as auto_resumes_count ,
    MAX(CASE WHEN  ref_type = 0 THEN  resumes_count END )  as self_resumes_count ,
    MAX(CASE WHEN  ref_type = -1 THEN  resumes_count END )as paging_resumes_count,
    MAX(CASE WHEN  ref_type IS NULL THEN  resumes_count END )as resumes_count,
    (SELECT SUM(views) FROM job_views WHERE job_id=j.job_id) job_views,
    (SELECT posts FROM autopost_jobs WHERE job_id=j.job_id) autopost_count


    FROM jobs j 
    LEFT JOIN 
    (
        SELECT job_id, ref_type , COUNT(1) as resumes_count 
        FROM 
        candidates 
        GROUP BY job_id,ref_type
        WITH ROLLUP
    )c ON (c.job_id = j.job_id)

    WHERE j.user_id={$userId} 
    GROUP BY j.job_id
    ORDER BY j.last_updated DESC,STATUS;

You may try adding a filter to subquery in left join as well (I’m not sure if value of job.user_id corresponds candidates.user_id, so I put 2 versions of filter in comments) :

        SELECT j.job_id,j.last_updated,j.job_title,j.status,
        MAX(CASE WHEN  ref_type = 1 THEN  resumes_count END) as auto_resumes_count ,
        MAX(CASE WHEN  ref_type = 0 THEN  resumes_count END )  as self_resumes_count ,
        MAX(CASE WHEN  ref_type = -1 THEN  resumes_count END )as paging_resumes_count,
        MAX(CASE WHEN  ref_type IS NULL THEN  resumes_count END )as resumes_count,
        (SELECT SUM(views) FROM job_views WHERE job_id=j.job_id) job_views,
        (SELECT posts FROM autopost_jobs WHERE job_id=j.job_id) autopost_count


        FROM jobs j 
        LEFT JOIN 
        (
            SELECT job_id, ref_type , COUNT(1) as resumes_count 
            FROM 
            candidates c  
            -- if c.user_id is the same as j.user_id you can add following filter:
            ---------------------------
             -- WHERE c.user_id = {user_id}  
            ---------------------------
            -- if it's different , try this filter :
            ---------------------------
             -- WHERE EXISTS (SELECT NULL FROM job j1 WHERE j1.job_id =c.job_id
             --   AND j1.user_id ={user_id})
            --------------------------- 
            GROUP BY job_id,ref_type
            WITH ROLLUP
        )c ON (c.job_id = j.job_id)

        WHERE j.user_id={$userId} 
        GROUP BY j.job_id
        ORDER BY j.last_updated DESC,STATUS;

I don’t have a chance to check actual execution plans, but it should be better because candidate tables is scanned once compared to 4 times in the question.

Leave a Reply

Your email address will not be published. Required fields are marked *