Question :
I had a question in my GATE Mock Test Paper that was as follows:
Which of the following can appear in position marked as [..]?
“Select [..] from R group by A, B”
I. SUM(C+D)
II. A, Min(C)
III. A,B, Min(C)
I answered all of the above, but my friends and answer key stated that only II and III are correct. I ran the query on mysql with no problem, and saw this code listed on few websites as well.
So is this query valid for general SQL, or it is implemented only by certain applications such as MySQL?
Answer :
NOTE: I have no idea what ‘GATE Mock Test Paper’ is, or if that’s supposed to have a bearing on the answer, so fwiw …
As others have pointed out, technically all of the answers could be true … though this will depend on the RDBMS (ie, flavor of SQL) as well as any configs/settings.
If the intention is to pick answers that are ANSI compliant, then I’d say ‘III’ is the correct answer since all non-aggregates should be part of the group by
. Then again, if you start looking at some of the nitty-gritty ANSI details you’ll find this requirement can sometimes be relaxed.
As you’ll find out, most RDBMS products don’t enforce such a strict standard: all non-aggregates being a member of the group by
clause
As Michael Kutz has mentioned, without ‘A’ and/or ‘B’ in the select/projection list, answers ‘I’ and ‘II’ will provide some confusing answers. [And if you ever find yourself working in a real world RDBMS environment you’ll find developers writing these types of queries all the time, but then not being able to describe what it is they’re asking for let alone being able to explain the results they end up with.]
Again, answer ‘III’ is the only one I’d say can be clearly explained … what I’m looking for … the results I end up with.
Answer
All of them are valid. However, I and II are logical nonsense.
Expected Results
For every A,B
combination, you will get 1 row irregardless of what columns/aggregates are listed in the SELECT
clause.
If you have 100 distinct combinations of A,B
, I expect your result to show 100 rows for I
,II
, and III
.
Logic
Answer I
will show a series of values. You will have no idea which values are for which A,B
combination because A
and B
are not part of your SELECT
statement.
Because you don’t have an ORDER BY
clause, you can make no assumptions about which row matches to which combination of A,B
.
Answer II
will show a series of values for each distinct value of A
. You will have no idea which values are for which A,B
combination because B
is not part of your SELECT
statement.
Because you don’t have an ORDER BY
clause, you can make no assumptions about which row matches to which combination of A,B
.
Answer III will show a series of values for each distinct combination of A,B
.
Conclusion
From a syntax stand point, they are all valid.
From a logical stand point, only III is valid.