I have a table that stores details about incoming and outgoing phone calls such as when a call was made, or when a call was received, the phone number of the caller, and the phone number of the receiver.
Before you tell me to separate the data in two tables, one for incoming calls and another for outgoing calls, I tried to do it but my thesis partners didn’t agree so I had to go along with a single table, differentiating them with an “incoming” int column.
We are trying to match each outgoing call to an incoming call. The criteria for that is that the incoming call will have been created within 30 seconds of the outgoing call, and that their numbers match.
select * from tesis.call oc --outgoing calls join tesis.call ic --incoming calls on oc.sourceNumber = ic.targetNumber and ic.dateCreated > oc.dateCreated and ic.dateCreated <= (oc.dateCreated + interval 30 second) where oc.incoming = 0 and ic.incoming = 1 and oc.neighborhood_id is not null
We have an index for “incoming”, an index consisting of (sourceNumber, targetNumber), and for in dateCreated.
tesis.call has around 13000 rows
sourceNumber and targetNumber are varchar(20)
dateCreated is DateTime
incoming is int
For some reason, the query takes around 100 seconds to execute, tested in two computers with the DB’s on localhost, running with an i5 processor, one of them on MySQL and the other on MariaDB.
I tried many index combinations and none of them work.
This is the execution plan with the provided indexes:
Right now the provided query with the specified indexes took: 0,780 sec. (+ 116,455 sec. network) to execute.
The network part looks very weird to me, since “select * from tesis.call” takes 0,000 sec. (+ 0,062 sec. network) to execute so it doesn’t look like a connectivity issue.
EDIT: Here is the result of SHOW CREATE TABLE tesis.call:
CREATE TABLE `call` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `dateCreated` datetime(6) DEFAULT NULL, `incoming` int(11) DEFAULT NULL, `operatorName` varchar(20) DEFAULT NULL, `sourceNumber` varchar(20) DEFAULT NULL, `targetNumber` varchar(20) DEFAULT NULL, `batteryLevel` double DEFAULT NULL, `currentSignal` double DEFAULT NULL, `locationLat` double DEFAULT NULL, `locationLon` double DEFAULT NULL, `dispatchDate` datetime DEFAULT NULL, `insertionDate` datetime DEFAULT NULL, `matchId` int(10) unsigned DEFAULT NULL, `callTime` int(10) DEFAULT '0', `neighborhood_id` int(11) DEFAULT NULL, `Geom` point DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `id_UNIQUE` (`id`), UNIQUE KEY `matchId_UNIQUE` (`matchId`), KEY `Geom` (`Geom`(25)), KEY `incoming` (`incoming`) USING HASH, KEY `dateCreated` (`dateCreated`), KEY `sourceNumber_targetNumber` (`sourceNumber`,`targetNumber`), KEY `dateCreated_incoming_targetNumber` (`incoming`,`targetNumber`,`dateCreated`), KEY `dateCreated_incoming_sourceNumber` (`incoming`,`sourceNumber`,`dateCreated`) ) ENGINE=InnoDB AUTO_INCREMENT=15154 DEFAULT CHARSET=utf8
EDIT 2: I uploaded a script to recreate the table in case anyone wants to do some tests.
My bet is on creating the following indexes:
(incoming, targetNumber, dateCreated) (sourceNumber, incoming, dateCreated)
But you have to rewrite your query to do:
select * from tesis.call oc --outgoing calls join tesis.call ic on oc.sourceNumber = ic.targetNumber and ic.dateCreated > oc.dateCreated and (ic.dateCreated - interval 30 second) <= oc.dateCreated --incoming calls where oc.incoming = 0 and ic.incoming = 1 and oc.neighborhood_id is not null
This seems better than doing the JOIN in this direction oc -> ic, where the indexes would be:
Please note that this is long shot, as I do not know the cardinality of the conditions, I can only imagine them. Please test it and show the results of
EXPLAIN to check it.
Firstly, I’m giving a +1 for the script – it’s a pity that more people don’t put sample data up – even if only on SQLFiddle and SQLFiddle is limited in size though! It’s good to work with real data!
Now, to the issue. I ran your database create script and I did the following.
mysql> select count(*) from `call`; +----------+ | count(*) | +----------+ | 14117 | +----------+ 1 row in set (0.01 sec)
I then ran your original query (I seem to have to put in ticks – a Linux thing, you’re on Windows AIUI).
select * from `call` oc join `call` ic on oc.sourceNumber = ic.targetNumber and ic.dateCreated > oc.dateCreated and ic.dateCreated <= (oc.dateCreated + interval 30 second) where oc.incoming = 0 and ic.incoming = 1 and oc.neighborhood_id is not null
6197 rows in set (29.54 sec)
Is this correct no. of rows (see below)? I also appear to be getting round the same times as you are.
I then added “ORDER BY oc.id” to the original query.
But the system stalls – processlist shows that the query has finished, but my mysql client appears to freeze and doesn’t return after 5 mins… This happens on both MySQL 5.7 and Percona Server 5.6. What happens when you add this ORDER BY?
So, what I did then was to create two separate tables icall and ocall.
CREATE TABLE icall AS SELECT * FROM `call` WHERE incoming = 1; CREATE TABLE ocall AS SELECT * FROM `call` WHERE incoming = 0;
That partitions the
call table – there are no nulls in incoming.
and I ran the following query.
select * from ocall o join icall i on o.sourceNumber = i.targetNumber and i.dateCreated > o.dateCreated and i.dateCreated <= o.dateCreated + interval 30 second where o.neighborhood_id is null;
BUT, my count is very different (as is the time).
416 rows in set (2.24 sec)
Now, by my reckoning (unless I’ve been very very stupid here!), the two result sets should be equivalent – I just split the data into two tables… Am I wrong – if so, where is the flaw in my logic?
To see if I could improve this time further, I added indexes on the joining fields. Remember that a CREATE TABLE AS statement doesn’t copy indexes.
CREATE INDEX targetNumber_ix ON icall (targetNumber); CREATE INDEX sourceNumber_ix ON ocall (sourceNumber); CREATE INDEX dateCreated_ix ON icall (dateCreated); CREATE INDEX dateCreated_ix ON ocall (dateCreated);
and the time JUMPED to 7.5/8 seconds and went back down when I removed them? WTF? If any of the MySQL gurus could jump in here and explain to me what on earth is going on, I’d be very grateful.