Let’s say I have a very large lookup table that looks like this:
CREATE TABLE `MyLookup` (
  `FKToTableA` bigint(20) NOT NULL,
  `FKToTableB` bigint(20) NOT NULL,
  `Count` bigint(20) NOT NULL,
  `Bytes` bigint(20) NOT NULL,
  `Packets` bigint(20) NOT NULL
) ENGINE=InnoDB;
The first two columns are FKs to tables A and B respectively, but they don’t need to be; the FK constraints can be dropped if it’s more optimal not to have them. (FKToTableA, FKToTableB) is the candidate key for this table, so it could become a composite primary key if needed.
My question is: what’s the optimal way to index this table? Storage size and insert times are not a concern.
Most of the time my queries will join to MyLookup on both FK columns and SUM the Count, Bytes, and Packets columns.
select a.something, b.something, SUM(c.Count), SUM(c.Bytes), SUM(c.Packets)
from A a
inner join MyLookup c on a.Id = c.FKToTableA
inner join B b on b.Id = c.FKToTableB
where a.something = 1 and a.Time >= 'blah' and a.Time <= 'blah'
group by a.something, b.something
I see three options.
1) Put a HASH index on FKToTableA and FKToTableB.
2) Put a composite BTREE index on all columns.
3) Put a Primary Key on FKToTableA and FKToTableB.
I’m kind of leaning towards 1. From what I gather, HASH indexes are great at equality comparisons, and a join is essentially one big equality comparison, is it not? It still has to hit the table again to sum the other columns, though, which may be a bottleneck. Not sure.
Or maybe it’s possible to have a hash index that also includes Count, Bytes, and Packets? I’m not sure how composite hash indexes work, though.
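In DDL terms, the three options I’m considering would look something like this (a sketch; index names are made up):

```sql
-- Option 1: HASH index on the two FK columns. Note: InnoDB silently
-- creates this as BTREE; true HASH indexes are only supported by the
-- MEMORY engine.
ALTER TABLE MyLookup
  ADD INDEX idx_ab_hash (FKToTableA, FKToTableB) USING HASH;

-- Option 2: composite BTREE index on all columns, so the SUMs can be
-- answered from the index alone without touching the table rows.
ALTER TABLE MyLookup
  ADD INDEX idx_ab_covering (FKToTableA, FKToTableB, Count, Bytes, Packets);

-- Option 3: composite primary key. InnoDB clusters rows on the PK,
-- so Count, Bytes, and Packets are stored right next to the key.
ALTER TABLE MyLookup
  ADD PRIMARY KEY (FKToTableA, FKToTableB);
```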
Has anyone dealt with this type of thing before and can shed some knowledge and suggestions?
Here’s my schema for tables A and B to make things a bit clearer.
CREATE TABLE `A` (
  `Id` bigint(20) NOT NULL AUTO_INCREMENT,
  `CustomerIdent` int(11) NOT NULL,
  `MetaData1` varchar(256) NULL,
  `MetaData2` varchar(256) NULL,
  `MetaData3` varchar(256) NULL,
  `MetaData4` varchar(256) NULL,
  `MetaData5` varchar(256) NULL,
  `Time` datetime NOT NULL,
  CONSTRAINT `PK_A` PRIMARY KEY (`Id` ASC)
) ENGINE=InnoDB;

CREATE TABLE `B` (
  `Id` bigint(20) NOT NULL AUTO_INCREMENT,
  `SourceIp` bigint(20) NULL,
  `DestinationIp` bigint(20) NULL,
  `SourcePort` int(11) NULL,
  `DestinationPort` int(11) NULL,
  CONSTRAINT `PK_B` PRIMARY KEY (`Id` ASC)
) ENGINE=InnoDB;
Basically, the lookup table tells me how many times a row in B occurred for a given row of A.
MyLookup — If the pair (FKToTableA, FKToTableB) is unique, then make that the PRIMARY KEY, and put the columns in that order so that your SELECT can quickly get into it.
- Don’t use BIGINT (8 bytes) unless you expect to exceed 4 billion, the limit for INT UNSIGNED, which takes only 4 bytes.
- IP addresses — for the old IPv4, there are convenient routines for converting to INT UNSIGNED. The new IPv6 won’t fit into BIGINT. See 5.6.3.
- Do those MetaData columns need utf8? Can they be combined into a TEXT column? And other questions.
- Build and maintain a “Summary table” rather than scanning large chunks of the “Fact” table to get this “report”.
- Do you really need a many-to-many mapping table? Seems like this is 1:many.
- More tips on many-to-many.
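Putting the first two suggestions together, a sketch of the revised table (assuming the FK values stay under 4 billion):

```sql
CREATE TABLE MyLookup (
  FKToTableA INT UNSIGNED NOT NULL,    -- 4 bytes instead of 8
  FKToTableB INT UNSIGNED NOT NULL,
  Count   BIGINT UNSIGNED NOT NULL,
  Bytes   BIGINT UNSIGNED NOT NULL,
  Packets BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (FKToTableA, FKToTableB) -- clustered; rows stored in key order
) ENGINE=InnoDB;

-- The IPv4 conversion routines mentioned above:
SELECT INET_ATON('192.168.0.1');  -- 3232235521
SELECT INET_NTOA(3232235521);     -- '192.168.0.1'
```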
It looks like you are storing log data in MySQL and want to run analytics on it.
I would suggest looking at Cassandra (or ScyllaDB, a compatible rewrite of Cassandra in C++) plus Presto. They are all open-source software, and they can process your SQL query in parallel and efficiently.
In particular, Presto’s SQL dialect is very similar to MySQL’s (it was originally developed at Facebook), so it should be very easy to learn.
Anyway, if you must use InnoDB, there are several optimizations you should make for this SQL query:
- Remove a.something from the GROUP BY part, and change it to the constant 1 in the SELECT list, because it will always be 1 due to your WHERE condition.
- Table A should have an index on (something, Time, ...) because of your WHERE condition. If using MyISAM, choose BTREE due to the range condition on Time.
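Applied to the query above, those two changes would look something like this (a sketch; the index name is made up):

```sql
-- Index to satisfy the WHERE clause on table A
ALTER TABLE A ADD INDEX idx_something_time (something, Time);

-- a.something dropped from GROUP BY and replaced by the constant 1,
-- since WHERE pins it to 1
SELECT 1 AS something, b.something,
       SUM(c.Count)   AS TotalCount,
       SUM(c.Bytes)   AS TotalBytes,
       SUM(c.Packets) AS TotalPackets
FROM A a
INNER JOIN MyLookup c ON a.Id = c.FKToTableA
INNER JOIN B b ON b.Id = c.FKToTableB
WHERE a.something = 1
  AND a.Time >= 'blah' AND a.Time <= 'blah'
GROUP BY b.something;
```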
- However, I would suggest switching from MyISAM to InnoDB, because InnoDB is a crash-safe engine and Oracle is still putting a lot of effort into improving it.
Also, if you can modify your application, you can do more optimizations:
- Select all possible b.something values in the given time range first, then run the per-value SQL queries at the same time. That will use your CPU resources much more efficiently.
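A sketch of that split (table and column names as above; the application would run the second query once per value, on separate connections):

```sql
-- Step 1: enumerate the distinct b.something values in the window
SELECT DISTINCT b.something
FROM A a
INNER JOIN MyLookup c ON a.Id = c.FKToTableA
INNER JOIN B b ON b.Id = c.FKToTableB
WHERE a.something = 1
  AND a.Time >= 'blah' AND a.Time <= 'blah';

-- Step 2: for each value ?, run in parallel from the application
SELECT SUM(c.Count), SUM(c.Bytes), SUM(c.Packets)
FROM A a
INNER JOIN MyLookup c ON a.Id = c.FKToTableA
INNER JOIN B b ON b.Id = c.FKToTableB
WHERE a.something = 1
  AND a.Time >= 'blah' AND a.Time <= 'blah'
  AND b.something = ?;
```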