I’ve 1 query written in our legacy system. (using MySQL 5.5)
Now, with growing data – the below mentioned query taking huge time.
In our system, we’ve somewhat 2,000,000,000 (2 billion rows) approx 650 GB of data.
Table is partitioned with respect to every day. (which means above query is fetching data from 30 partitions).
Allocate 16GB to innodb_buffer_pool_size.
SELECT * FROM ( SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME between '2013-11-14 00:00:01' and '2013-12-14 23:59:59' ORDER BY START_TIME DESC ) as TEST_DATA LIMIT 10000;
Above Query => Means selecting all the columns for all the data between 1 month and performing sorting and at last show 10000 records to end user.
Now, my doubt goes: Query-2
SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME between '2013-11-14 00:00:01' and '2013-12-14 23:59:59' ORDER BY START_TIME DESC limit 10000;
Above Query => selecting all the columns from 1 month of data and perform sorting and display the result as soon as 10000 records sorted. (No sorting and buffering of all records).
We have an index on START_TIME.
With Query-1 and Query-2 -> Does these 2 queries will display different result set? Or same?
Because in Query-1, we doing sorting on all records and then display 10k
whereas in Query-2, we doing display 10k sorted records.
Thanks a lot for your help.
Look at the Difference
- The first query has a subquery that will do an index scan on the START_TIME index that can potentially return more than 10,000 rows.Then, 10,000 rows must be culled from it to produce the result.
- The second query would do an index scan and gather no more than 10,000 rows.
Both queries will display the same result, but the second query will do less work and take less processing to achieve.