Question :
I am using mysql5.1, i have table which has about 15 lakh (1.5 million) records.This table has records for different entities i.e child records for all master entities.
There are 8 columns in this table , out of which 6 columns are clubbed to make a primary key. These columns could be individual foreign keys but due to performance we have made this change.
Below is the show create table output
CREATE TABLE `hybrid_exp_trait_dtl` (
`HYBRID_NUMBER` varchar(100) NOT NULL,
`TRIAL_STATUS` char(4) NOT NULL,
`LOCATION_CODE` char(5) NOT NULL,
`EXPERIMENT_ID` char(20) NOT NULL,
`REPLICATION_NUMBER` int(3) unsigned NOT NULL,
`TRAIT_NAME` varchar(30) NOT NULL,
`TRAIT_VALUE` varchar(50) DEFAULT NULL,
`MACHINE_NO` char(2) NOT NULL,
`USER_ID` char(8) NOT NULL,
`MYTIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `FK_HYBRID_EXP_TRAIT_DTL_2` (`LOCATION_CODE`),
KEY `FK_hybrid_exp_trait_dtl_3` (`USER_ID`),
KEY `Index_4` (`HYBRID_NUMBER`, `EXPERIMENT_ID`, `TRIAL_STATUS`, `LOCATION_CODE`, `REPLICATION_NUMBER`, `TRAIT_NAME`, `USER_ID`) USING BTREE,
CONSTRAINT `FK_HYBRID_EXP_TRAIT_DTL_2` FOREIGN KEY (`LOCATION_CODE`) REFERENCES `location_mst` (`LOCATION_CODE`),
CONSTRAINT `FK_hybrid_exp_trait_dtl_3` FOREIGN KEY (`USER_ID`) REFERENCES `user_data` (`USER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
As you can see I have four indexes
- KEY
FK_HYBRID_EXP_TRAIT_DTL_2
(LOCATION_CODE
) - KEY
FK_hybrid_exp_trait_dtl_3
(USER_ID
) - KEY
Index_4
(HYBRID_NUMBER
,EXPERIMENT_ID
,TRIAL_STATUS
,LOCATION_CODE
,REPLICATION_NUMBER
,TRAIT_NAME
,USER_ID
) USING BTREE
Below are the two queries which take long time 4.26 and 0.64 respectively.
select distinct
location_code,
Max(replication_number) as replication
from hybrid_exp_trait_dtl t
where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT';
And this one
SELECT *
FROM hybrid_exp_trait_dtl h
where h.experiment_id='NSKOK12K1102'
and h.TRAIT_NAME='YLD' ;
In order to improve performance i modified 3rd index and added one more index , so below are the indexes for my tables
- KEY
FK_HYBRID_EXP_TRAIT_DTL_2
(LOCATION_CODE
) - KEY
FK_hybrid_exp_trait_dtl_3
(USER_ID
) - KEY
Index_4
(EXPERIMENT_ID
,TRIAL_STATUS
,LOCATION_CODE
,REPLICATION_NUMBER
) - KEY
hbrid
(HYBRID_NUMBER
,EXPERIMENT_ID
,TRAIT_NAME
)
Now performance has increased significantly for both the queries and now performance time is .45 and .3 sec respectively.
Is this indexing correct , will it have effect on my update and insert statement as i will be doing bulk update.
Answer :
I took the liberty of editing your SQL so the clauses are on separate lines. I violated CodeReview group discipline by doing that. Sorry! My suggestion to you is to format your queries so the various clauses are clearly visible, as I did. This kind of formatting, in my experience, makes it easier to understand the logic of queries and to spot errors.
You’ve done a good job declaring your columns NOT NULL
where possible. That helps performance a lot.
Your first query seems to be illogical, as it combines an aggregate function with a DISTINCT
qualifier. MySQL allows a lot of sloppiness in aggregate queries unfortunately.
It looks like you’re looking for a resultset with a row for each location that shows the the largest replication_number at that location. Do you want this query instead?
select location_code,
Max(replication_number) as replication
from hybrid_exp_trait_dtl t
where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT'
group by location_code
Your Index_4
will allow this query to be satisfied using a so-called loose index scan, which is really very fast.
Your second query will most likely be a lot faster if you change your fourth compound key (the one you’ve named hbrid
) to have the following order.
(EXPERIMENT_ID,TRAIT_NAME,HYBRID_NUMBER)
That is because your query looks for exact values of EXPERIMENT_ID
and TRAIT_NAME
. When those two columns come first in the index, the query can jump to the exact position in the index and then sequentially reel off your results.
Now, you may have some other query that needs to have HYBRID_NUMBER
first, but you didn’t mention it in your question.
You asked what effect these indexes will have on load performance. It’s hard to answer that question exactly without knowing a lot about your load process.
You don’t have any indexes that call for uniqueness. That is very good for loading performance.
But, there are a few things you can do to speed up loading. They are detailed here. The most important speed trick is probably to avoid autocommit. Start your import process with this SQL command.
SET autocommit=0;
Then do a few hundred INSERT
or UPDATE
statements in a row, then do
COMMIT;
Don’t try to do too many INSERT
or UPDATE
statements before COMMIT
or you’ll use up a lot of RAM with transaction buffers.
You can also turn of foreign key checks while loading. See the web page mentioned above. But be careful. If your loaded data violates foreign key constraints, when you try to turn checking back on you’ll have a mess.
When you’re done with a load operation that changes a large fraction of the rows, do
OPTIMIZE LOCAL TABLE hybrid_exp_trait_dtl;
Your table will be unavailable while this OPTIMIZE
operation runs: it rebuilds the table and indexes.