I need help optimizing a not that complicated query that returns unexpedcted performance problems

Posted on

Question :

I have a table that stores details about incoming and outgoing phone calls such as when a call was made, or when a call was received, the phone number of the caller, and the phone number of the receiver.
Before you tell me to separate the data in two tables, one for incoming calls and another for outgoing calls, I tried to do it but my thesis partners didn’t agree so I had to go along with a single table, differentiating them with an “incoming” int column.

We are trying to match each outgoing call to an incoming call. The criteria for that is that the incoming call will have been created within 30 seconds of the outgoing call, and that their numbers match.

select 
  *
from 
  tesis.call oc --outgoing calls
join tesis.call ic --incoming calls
  on oc.sourceNumber = ic.targetNumber 
  and ic.dateCreated > oc.dateCreated 
  and ic.dateCreated <= (oc.dateCreated + interval 30 second) 
where 
  oc.incoming = 0 
  and ic.incoming = 1 
  and oc.neighborhood_id is not null

We have an index for “incoming”, an index consisting of (sourceNumber, targetNumber), and for in dateCreated.
tesis.call has around 13000 rows
sourceNumber and targetNumber are varchar(20)
dateCreated is DateTime
incoming is int

For some reason, the query takes around 100 seconds to execute, tested in two computers with the DB’s on localhost, running with an i5 processor, one of them on MySQL and the other on MariaDB.
I tried many index combinations and none of them work.

This is the execution plan with the provided indexes:
Execution plan

Thank you.

Right now the provided query with the specified indexes took: 0,780 sec. (+ 116,455 sec. network) to execute.
The network part looks very weird to me, since “select * from tesis.call” takes 0,000 sec. (+ 0,062 sec. network) to execute so it doesn’t look like a connectivity issue.

EDIT: Here is the result of SHOW CREATE TABLE tesis.call:

CREATE TABLE `call` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `dateCreated` datetime(6) DEFAULT NULL,
  `incoming` int(11) DEFAULT NULL,
  `operatorName` varchar(20) DEFAULT NULL,
  `sourceNumber` varchar(20) DEFAULT NULL,
  `targetNumber` varchar(20) DEFAULT NULL,
  `batteryLevel` double DEFAULT NULL,
  `currentSignal` double DEFAULT NULL,
  `locationLat` double DEFAULT NULL,
  `locationLon` double DEFAULT NULL,
  `dispatchDate` datetime DEFAULT NULL,
  `insertionDate` datetime DEFAULT NULL,
  `matchId` int(10) unsigned DEFAULT NULL,
  `callTime` int(10) DEFAULT '0',
  `neighborhood_id` int(11) DEFAULT NULL,
  `Geom` point DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id_UNIQUE` (`id`),
  UNIQUE KEY `matchId_UNIQUE` (`matchId`),
  KEY `Geom` (`Geom`(25)),
  KEY `incoming` (`incoming`) USING HASH,
  KEY `dateCreated` (`dateCreated`),
  KEY `sourceNumber_targetNumber` (`sourceNumber`,`targetNumber`),
  KEY `dateCreated_incoming_targetNumber` (`incoming`,`targetNumber`,`dateCreated`),
  KEY `dateCreated_incoming_sourceNumber` (`incoming`,`sourceNumber`,`dateCreated`)
) ENGINE=InnoDB AUTO_INCREMENT=15154 DEFAULT CHARSET=utf8

EDIT 2: I uploaded a script to recreate the table in case anyone wants to do some tests.
https://www.mediafire.com/?ph0gk06dxntohmg

Answer :

My bet is on creating the following indexes:

(incoming, targetNumber, dateCreated)
(sourceNumber, incoming, dateCreated)

But you have to rewrite your query to do:

select *
from tesis.call oc --outgoing calls
join tesis.call ic on 
    oc.sourceNumber = ic.targetNumber 
    and ic.dateCreated > oc.dateCreated 
    and (ic.dateCreated - interval 30 second) <= oc.dateCreated --incoming calls
where oc.incoming = 0 
    and ic.incoming = 1 
    and oc.neighborhood_id is not null

This seems better than doing the JOIN in this direction oc -> ic, where the indexes would be:
(incoming)
(targetNumber, dateCreated)

Please note that this is long shot, as I do not know the cardinality of the conditions, I can only imagine them. Please test it and show the results of EXPLAIN to check it.

Firstly, I’m giving a +1 for the script – it’s a pity that more people don’t put sample data up – even if only on SQLFiddle and SQLFiddle is limited in size though! It’s good to work with real data!

Now, to the issue. I ran your database create script and I did the following.

mysql> select count(*) from `call`;
+----------+
| count(*) |
+----------+
|    14117 |
+----------+
1 row in set (0.01 sec)

I then ran your original query (I seem to have to put in ticks – a Linux thing, you’re on Windows AIUI).

select 
  *
from 
  `call` oc
  join `call` ic 
  on oc.sourceNumber = ic.targetNumber 
  and ic.dateCreated > oc.dateCreated 
  and ic.dateCreated <= (oc.dateCreated + interval 30 second) 
where 
  oc.incoming = 0 
  and ic.incoming = 1 
  and oc.neighborhood_id is not null

and obtained

6197 rows in set (29.54 sec)

Is this correct no. of rows (see below)? I also appear to be getting round the same times as you are.

I then added “ORDER BY oc.id” to the original query.

But the system stalls – processlist shows that the query has finished, but my mysql client appears to freeze and doesn’t return after 5 mins… This happens on both MySQL 5.7 and Percona Server 5.6. What happens when you add this ORDER BY?

So, what I did then was to create two separate tables icall and ocall.

CREATE TABLE icall AS SELECT * FROM `call` WHERE incoming = 1;
CREATE TABLE ocall AS SELECT * FROM `call` WHERE incoming = 0;

That partitions the call table – there are no nulls in incoming.

and I ran the following query.

select * from ocall o
join icall i
on o.sourceNumber = i.targetNumber
and i.dateCreated > o.dateCreated
and i.dateCreated <= o.dateCreated + interval 30 second
where o.neighborhood_id is null;

BUT, my count is very different (as is the time).

416 rows in set (2.24 sec)

Now, by my reckoning (unless I’ve been very very stupid here!), the two result sets should be equivalent – I just split the data into two tables… Am I wrong – if so, where is the flaw in my logic?

To see if I could improve this time further, I added indexes on the joining fields. Remember that a CREATE TABLE AS statement doesn’t copy indexes.

CREATE INDEX targetNumber_ix ON icall (targetNumber);
CREATE INDEX sourceNumber_ix ON ocall (sourceNumber);
CREATE INDEX dateCreated_ix ON icall (dateCreated);
CREATE INDEX dateCreated_ix ON ocall (dateCreated);

and the time JUMPED to 7.5/8 seconds and went back down when I removed them? WTF? If any of the MySQL gurus could jump in here and explain to me what on earth is going on, I’d be very grateful.

Leave a Reply

Your email address will not be published. Required fields are marked *