I am using mysql5.1, i have table which has about 15 lakh (1.5 million) records.This table has records for different entities i.e child records for all master entities.
There are 8 columns in this table , out of which 6 columns are clubbed to make a primary key. These columns could be individual foreign keys but due to performance we have made this change.
Below is the show create table output
CREATE TABLE `hybrid_exp_trait_dtl` ( `HYBRID_NUMBER` varchar(100) NOT NULL, `TRIAL_STATUS` char(4) NOT NULL, `LOCATION_CODE` char(5) NOT NULL, `EXPERIMENT_ID` char(20) NOT NULL, `REPLICATION_NUMBER` int(3) unsigned NOT NULL, `TRAIT_NAME` varchar(30) NOT NULL, `TRAIT_VALUE` varchar(50) DEFAULT NULL, `MACHINE_NO` char(2) NOT NULL, `USER_ID` char(8) NOT NULL, `MYTIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, KEY `FK_HYBRID_EXP_TRAIT_DTL_2` (`LOCATION_CODE`), KEY `FK_hybrid_exp_trait_dtl_3` (`USER_ID`), KEY `Index_4` (`HYBRID_NUMBER`, `EXPERIMENT_ID`, `TRIAL_STATUS`, `LOCATION_CODE`, `REPLICATION_NUMBER`, `TRAIT_NAME`, `USER_ID`) USING BTREE, CONSTRAINT `FK_HYBRID_EXP_TRAIT_DTL_2` FOREIGN KEY (`LOCATION_CODE`) REFERENCES `location_mst` (`LOCATION_CODE`), CONSTRAINT `FK_hybrid_exp_trait_dtl_3` FOREIGN KEY (`USER_ID`) REFERENCES `user_data` (`USER_ID`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
As you can see I have four indexes
USER_ID) USING BTREE
Below are the two queries which take long time 4.26 and 0.64 respectively.
select distinct location_code, Max(replication_number) as replication from hybrid_exp_trait_dtl t where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT';
And this one
SELECT * FROM hybrid_exp_trait_dtl h where h.experiment_id='NSKOK12K1102' and h.TRAIT_NAME='YLD' ;
In order to improve performance i modified 3rd index and added one more index , so below are the indexes for my tables
Now performance has increased significantly for both the queries and now performance time is .45 and .3 sec respectively.
Is this indexing correct , will it have effect on my update and insert statement as i will be doing bulk update.
I took the liberty of editing your SQL so the clauses are on separate lines. I violated CodeReview group discipline by doing that. Sorry! My suggestion to you is to format your queries so the various clauses are clearly visible, as I did. This kind of formatting, in my experience, makes it easier to understand the logic of queries and to spot errors.
You’ve done a good job declaring your columns
NOT NULL where possible. That helps performance a lot.
Your first query seems to be illogical, as it combines an aggregate function with a
DISTINCT qualifier. MySQL allows a lot of sloppiness in aggregate queries unfortunately.
It looks like you’re looking for a resultset with a row for each location that shows the the largest replication_number at that location. Do you want this query instead?
select location_code, Max(replication_number) as replication from hybrid_exp_trait_dtl t where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT' group by location_code
Index_4 will allow this query to be satisfied using a so-called loose index scan, which is really very fast.
Your second query will most likely be a lot faster if you change your fourth compound key (the one you’ve named
hbrid) to have the following order.
That is because your query looks for exact values of
TRAIT_NAME. When those two columns come first in the index, the query can jump to the exact position in the index and then sequentially reel off your results.
Now, you may have some other query that needs to have
HYBRID_NUMBER first, but you didn’t mention it in your question.
You asked what effect these indexes will have on load performance. It’s hard to answer that question exactly without knowing a lot about your load process.
You don’t have any indexes that call for uniqueness. That is very good for loading performance.
But, there are a few things you can do to speed up loading. They are detailed here. The most important speed trick is probably to avoid autocommit. Start your import process with this SQL command.
Then do a few hundred
UPDATE statements in a row, then do
Don’t try to do too many
UPDATE statements before
COMMIT or you’ll use up a lot of RAM with transaction buffers.
You can also turn of foreign key checks while loading. See the web page mentioned above. But be careful. If your loaded data violates foreign key constraints, when you try to turn checking back on you’ll have a mess.
When you’re done with a load operation that changes a large fraction of the rows, do
OPTIMIZE LOCAL TABLE hybrid_exp_trait_dtl;
Your table will be unavailable while this
OPTIMIZE operation runs: it rebuilds the table and indexes.