I have a transactions table which is having 600,000 records, I need to list the count for the dashboard on financial year basis. The table used is MyISAM. I tried adding index for the transaction date (
tran_date). Even though it is using the index it creates temporary table which is taking more time because of the temporary table and the filesort. Is there any way to optimize the query to improve the query time?
SELECT COUNT( * ) AS cnt, CASE WHEN MONTH( tran_date ) >=3 THEN concat( YEAR( tran_date ) , '-', YEAR( tran_date ) +1 ) ELSE concat( YEAR( tran_date ) -1, '-', YEAR( tran_date ) ) END AS financial_year FROM `transactions1` WHERE tran_date >= '2010-06-01' GROUP BY financial_year Showing rows 0 - 4 (5 total, Query took 1.2095 sec)
id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE transactions1 range PRIMARY,tran_date tran_date 8 NULL 346485 Using where; Using index; Using temporary; Using filesort
Keyname Type Unique Packed Field Cardinality Collation PRIMARY BTREE Yes No tran_date 205720 A tran_ID 617162 A coupon_No BTREE No No coupon_No 617162 A account_typeBTREE No No account_type 3 A prodCode BTREE No No prodCode 430 A tran_date 308581 A tran_date BTREE No No tran_date 205720 A cust_ID BTREE No No cust_ID 3265 A tran_date 308581 A account_type 308581 A points_earned 617162 A
Tried adding partition which is not that much helpful in comparison with non partitioned one. Does replication help in this case for reading this table?. There will be more grouping based on the dates (using the date functions) when reading the data.
I altered the query and reduced the query execution time. The query I used is,
SELECT SUM( count ) FROM ( SELECT COUNT( * ) AS count, CASE WHEN MONTH( tran_date ) >=3 THEN concat( YEAR( tran_date ) , '-', YEAR( tran_date ) +1 ) ELSE concat( YEAR( tran_date ) -1, '-', YEAR( tran_date ) ) END AS format_date FROM transactions1 GROUP BY tran_date ) AS s GROUP BY format_date Showing rows 0 - 4 (5 total, Query took 0.5636 sec)
id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY <derived2> ALL NULL NULL NULL NULL 229676 Using temporary; Using filesort 2 DERIVED transactions1 index NULL tran_date 8 NULL 617162 Using index
But when using
SELECT COUNT( * ) AS count, CASE WHEN MONTH( tran_date ) >=3 THEN concat( YEAR( tran_date ) , '-', YEAR( tran_date ) +1 ) ELSE concat( YEAR( tran_date ) -1, '-', YEAR( tran_date ) ) END AS format_date FROM transactions1 GROUP BY tran_date Showing rows 0 - 29 (229,676 total, Query took 0.0006 sec)
gives less time without using the
SUM(count) in the derived table. Is there any other way to get the sum without using the subquery in MySQL or can the subquery be optimized to get the index.
I don’t see a lot of opportunity for improvement.
The index you added was probably a big help, because it’s being used for the range matching on the WHERE clause (type => range, key => tran_date), and it’s being used as a covering index (extra => using index), avoiding the need to seek into the table to fetch the row data.
But since you’re using functions to construct the financial_year value for the group by, both the “using filesort” and “using temporary” can’t be avoided. But, those aren’t the real problem. The real problem is that you’re evaluating MONTH(tran_date) 346,485 times and YEAR(tran_date) at least that many times… ~700,000 function calls in one second doesn’t seem too bad.
Plan B: I am definitely not a fan of storing redundant data, and I’m dead-set against making the application responsible for maintaining it… but one option I might be tempted to try would be to create a dashboard_stats_by_financial_year table, and use after-insert/update/delete triggers on the transactions1 table to manage keeping those stats current.
That option has a cost, of course — adding to the amount of time it takes to update/insert/delete a transaction… but, waiting > 1200 milliseconds for stats for your dashboard is a cost, too. So it may come down to whether you want to pay for it now or pay for it later.
I think that the more efficient way would be to produce the query (using dynamic SQL or an external language) so it uses the index on
SELECT cnt , CONCAT(year, '-', year+1) AS financial_year FROM ( SELECT COUNT(*) AS cnt, 2010 AS year FROM transactions1 WHERE tran_date >= '2010-06-01' AND tran_date < '2011-04-01' UNION ALL SELECT COUNT(*), 2011 FROM transactions1 WHERE tran_date >= '2011-04-01' AND tran_date < '2012-04-01' UNION ALL SELECT COUNT(*), 2012 FROM transactions1 WHERE tran_date >= '2012-03-01' AND tran_date < '2013-04-01' ) AS tmp ORDER BY year ;
You could also introduce a
CREATE TABLE Fiscal_Calendar ( fiscal_year SMALLINT NOT NULL , start_date DATE NOT NULL , next_start_date DATE NOT NULL , PRIMARY KEY (fiscal_year) , INDEX start_date_IDX (start_date) ) ; INSERT INTO Fiscal_Calendar (fiscal_year, start_date, next_start_date) VALUES (1900, '1900-04-01', '1901-04-01'), --- (2099, '2099-04-01', '2100-04-01') ;
And then use that (no dynamic SQL needed):
SELECT ( SELECT COUNT(*) FROM transactions1 AS tr WHERE tr.tran_date >= fc.start_date AND tr.tran_date < fc.next_start_date ) AS cnt, CONCAT(fc.year, '-', fc.year+1) AS financial_year FROM Fiscal_Calendar AS fc JOIN ( SELECT year FROM Fiscal_Calendar WHERE start_date <= '2010-06-01' ORDER BY start_date DESC ) AS param ON fc.year >= param.year AND fc.year <= YEAR(NOW()) ORDER BY fc.year ;
I would try grouping by the year of
tran_date decreased by 3 months. The final formatting of
year1-year2 could be done on the grouped result set:
SELECT count, CONCAT(finyear, '-', finyear + 1) AS formattedfinyear FROM ( SELECT COUNT(*) AS count, YEAR(trand_date - INTERVAL 3 MONTH) AS finyear FROM transactions1 GROUP BY finyear ) s ;