Selecting the most specific match from one table

Posted on

Question :

So let’s say there is a simple table for dogs:

CREATE TABLE IF NOT EXISTS dogs (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  breed char(20) NOT NULL,
  color char(20) NOT NULL,
  special_skill char(20) DEFAULT NULL
);

The real example of this table would have foreign key references for breed and color to an enum table and the dogs table would be pre-populated with values that match the cartesian products of breed, color.
So there is the guarantee that there will always be a record for any combination of breed and color.

And some simple sample data:

1, 'American Bulldog',  'White', 'Runs fast'
2, 'American Bulldog',  'White',  Null
3, 'American Foxhound', 'White',  Null
4, 'American Foxhound', 'Black',  Null
5, 'American Bulldog',  'Black',  Null

How would one go about building a query that would find the most specific match possible by provided query filters?

Say a person wants to find an American Bulldog that is White and Runs fast. They will get a record with Id = 1 from our current table. Another person looking for the same kind of dog but with a ‘Black’ color instead would find a record with Id = 5 even tough the person also specified to find a dog that Runs fast.

So in essence here would be some pseudo filters and results:

find by ('American Bulldog',  'White', 'Runs fast')
Returns id 1

find by ('American Bulldog',  'White')
Returns id 2

find by ('American Bulldog',  'Black', 'Runs fast')
Returns id 5

During my quest to find a solution to this I was only able to come up with some very hackish queries with a case statement in it and explicitly ordering results in certain ways to be able to pick the top most result. But I am curious if there is a simple version (uses basic SQL features not some case) possible for this that could be done using some DSL in a programming language and not falling back to plain SQL queries.

Answer :

SELECT *
FROM dogs
ORDER BY CASE WHEN breed <=> @breed THEN @breed_weight ELSE 0 END
        +CASE WHEN color <=> @color THEN @color_weight ELSE 0 END
        +CASE WHEN skill <=> @skill THEN @skill_weight ELSE 0 END DESC LIMIT 1

The parameter weighting factors can be constant literals or varying query parameters.

find by ('American Bulldog',  'Black', 'Runs fast')
Returns id 5

@color_weight must be greater than @skill_weight. Backward record id=1 will be returned, and a random of them if weights are equal.

If some search condition is mandatory – move the comparison to WHERE clause and remove it from ORDER BY expression.

Yeah the breed and color can basically be in the where clause, they will be mandatory.

SELECT *
FROM dogs
WHERE breed = @breed
  AND color = @color
ORDER BY (skill <=> @skill) DESC LIMIT 1

Leave a Reply

Your email address will not be published. Required fields are marked *