What is the name of this type of query, and what is an efficient example?

Posted on

Question :

The purpose is to find a parent, given it’s children. For example, say you have a marketing package (aka a “combo”), and want to match it based on the products in it. Example table/data:

create table marketing_package_product (
  package_id int not null references marketing_package(id),
  product_id int not null references product(id),
  primary key (package_id, product_id)
);

insert into marketing_package_product values 
(1,1),
(1,2),
(1,3),
(2,1),
(2,5);

Given products 1,2,3, I want to get marketing_package 1. But given products 1,2 only I do not want marketing_package 1.

Is there a name for this type of query, and what is the most efficient way to go about it?

Answer :

The problem you are solving is called Relational Division with two variations, the “Division with Remainder” and “Exact division” (which fits your description).

See also this article: Divided We Stand: The SQL of Relational Division

This question at StackOVerflow: How to filter SQL results in a has-many-through relation has a few ways to solve it and benchmarks for Postgres but it’s not for the exact variation, only for the “Division with Remainder”. The queries will be similar but an extra check/condition has to be added for the exact variation.

One way (of the too many) to solve this would be to self-join the table as many times as the “products” and then an extra check so only packages with these exactly products are kept):

SELECT m1.package_id 
FROM marketing_package_product AS m1
  JOIN marketing_package_product AS m2
    ON m2.package_id = m1.package_id 
WHERE m1.product_id = 1
  AND m2.product_id = 2
  AND NOT EXISTS
      ( SELECT *
        FROM marketing_package_product AS x
        WHERE x.package_id = m1.package_id 
          AND x.product_id NOT IN (1,2)
      ) ;

Another would be to GROUP BY and then two HAVING conditions, one for the division and the other for the “exactness”:

SELECT m.package_id 
FROM marketing_package_product AS m
GROUP BY m.package_id
HAVING COUNT(CASE WHEN m.product_id IN (1,2) THEN 1 END) = @n 
   AND COUNT(*) = @n ;

where @n would be 2 in this case, the number of products in your check.

Test at SQL-Fiddle.

Leave a Reply

Your email address will not be published. Required fields are marked *