Improve performance by combining two quieries

Posted on

Question :

I have following submission table. A submission is an image.

create table submission
(
    md5 varchar(255) not null
        constraint submission_pkey
            primary key,
    user_name varchar(255)
        constraint fkmc6ld10rt5dgfsypoanfxifaj
            references users,
    copy_of varchar(255),
    create_date timestamp
);

copy_of is the submission_pkey (md5) of another user who has posted the same submission (image).
In other words if two users post the same image the later submission will contain an copy_of to the submission_pkey of the first image.

Now I am looking for a query which given one user_name will return a set of other usernames who have posted the same submission.
In this list I also want the usernames which the current user_name has copied the submissions from.

For this I have created two queries which together return the desired result set:

First:

select (select name
        from users
                 inner join submission s2 on users.name = s2.user_name
        where s2.md5 = s.copy_of) as other
from users
         inner join submission s on users.name = s.user_name
where copy_of is not null
  and name = ?1
group by name, other;

Second:

select (select user_name from submission where copy_of = s.md5 limit 1)
from users
         inner join submission s on users.name = s.user_name
where name = ?1
  and md5 in (select copy_of from submission s2);

Both (especially the second) queries run quite slow. Is there a way to improve those queries and ideally combine them to return one distinct list of user_name? I feel like the join with the users table should not be needed since all the relevant data is in the submission table but I have not found a way without it.

Answer :

Can it be?

select distinct s2.user_name 
from submission s1
  join submission s2 
    on s1.md5 = s2.copy_of 
    or s1.copy_of = s2.copy_of 
    or s1.copy_of = s2.md5
where s1.user_name = ?1 
 and s2.copy_of is not null;

s1.md5 = s2.copy_of will match submission which is copy from this user.
s1.copy_of = s2.copy_of or s1.copy_of = s2.md5 will match submission which current user copy from.

I would use a self-join

select u1.name, string_agg(s2.user_name, ',') as other users
from users u1
   join submission s1 on u1.name = s1.user_name
   join submission s2 on s1.md5 = s2.copy_of
where name = ?1
group by u1.name;

Leave a Reply

Your email address will not be published. Required fields are marked *