Question :
I have the following query on msaccess
SELECT Trim([T13_RefSupplier_France_List].[Supplier_code]) AS Supplier_code,
Trim([T13_RefSupplier_France_List].[Art]) AS Internal_reference,
Trim([fp_rcli]) AS Supplier_reference,
Last(Trim([ar_fami])) AS Family_code,
Last(Trim([fp_upri])) AS Purchasing_unit,
Last(Nz([fp_pcde],0)) AS Purchasing_price,
Last(Supplier_currency_France.Recode) AS Purchasing_currency,
Last("") AS Consigned,
Last(0) AS Eco_order_qty,
Last(CDbl(Nz(Nz([fp_cond],[ar_qcdi]),0))) AS Pack_order_qty,
Last(IIf([fp_minc]=0,Nz([fp_cond],[ar_qcdi]),[fp_minc])) AS Min_order_qty,
0 AS Min_order_value,
0 AS Product_grossweight,
0 AS Product_grosscube,
Last(gpfprodu_France.fp_dela) AS Leadtime_days,
Nz([Localisation_France].[Site],"Poitiers") AS Site,
CDbl(Nz([Active],-1)) AS Supplier_active,
Max(IIf([ar1_pdanz]=0,NULL,[ar1_pdanz])) AS Ref_price
FROM ((T13_RefSupplier_France_List
LEFT JOIN ((gparticl_France
LEFT JOIN Localisation_France ON gparticl_France.ar_loco=Localisation_France.Localisation)
LEFT JOIN [*gpartic1_France] ON gparticl_France.ar_code=[*gpartic1_France].ar1_code) ON T13_RefSupplier_France_List.Art=gparticl_France.ar_code)
LEFT JOIN ((gpfprodu_France
LEFT JOIN gpfourni_France ON gpfprodu_France.fp_four=gpfourni_France.fo_code)
LEFT JOIN Supplier_currency_France ON gpfourni_France.fo_monn=Supplier_currency_France.fo_monn) ON (T13_RefSupplier_France_List.Supplier_code=gpfprodu_France.fp_four)
AND (T13_RefSupplier_France_List.Art=gpfprodu_France.fp_arti))
LEFT JOIN T13_RefSupplier_France_SupplierActive ON (T13_RefSupplier_France_List.Art=T13_RefSupplier_France_SupplierActive.Art)
AND (T13_RefSupplier_France_List.Supplier_code=T13_RefSupplier_France_SupplierActive.Supplier_code)
GROUP BY Trim([T13_RefSupplier_France_List].[Supplier_code]),
Trim([T13_RefSupplier_France_List].[Art]),
Trim([fp_rcli]),
Nz([Localisation_France].[Site],"Poitiers"),
CDbl(Nz([Active],-1))
HAVING (((Trim(T13_RefSupplier_France_List.Supplier_code))<>"FG0002")
AND ((Trim(T13_RefSupplier_France_List.Art))<>"A60101000000000"));
I can do a GROUP BY
without using the column that have the LAST
function but when I remove the LAST function, I have to add all columns on the GROUP BY
(see below).
SELECT Trim([T13_RefSupplier_France_List].[Supplier_code]) AS Supplier_code,
Trim([T13_RefSupplier_France_List].[Art]) AS Internal_reference,
Trim([fp_rcli]) AS Supplier_reference,
Trim([ar_fami]) AS Family_code,
Trim([fp_upri]) AS Purchasing_unit,
Nz([fp_pcde],0) AS Purchasing_price,
Supplier_currency_France.Recode AS Purchasing_currency,
"" AS Consigned,
0 AS Eco_order_qty,
CDbl(Nz(Nz([fp_cond],[ar_qcdi]),0)) AS Pack_order_qty,
IIf([fp_minc]=0,Nz([fp_cond],[ar_qcdi]),[fp_minc]) AS Min_order_qty,
0 AS Min_order_value,
0 AS Product_grossweight,
0 AS Product_grosscube,
gpfprodu_France.fp_dela AS Leadtime_days,
Nz([Localisation_France].[Site],"Poitiers") AS Site,
CDbl(Nz([Active],-1)) AS Supplier_active,
Max(IIf([ar1_pdanz]=0,NULL,[ar1_pdanz])) AS Ref_price
FROM ((T13_RefSupplier_France_List
LEFT JOIN ((gparticl_France
LEFT JOIN Localisation_France ON gparticl_France.ar_loco=Localisation_France.Localisation)
LEFT JOIN [*gpartic1_France] ON gparticl_France.ar_code=[*gpartic1_France].ar1_code) ON T13_RefSupplier_France_List.Art=gparticl_France.ar_code)
LEFT JOIN ((gpfprodu_France
LEFT JOIN gpfourni_France ON gpfprodu_France.fp_four=gpfourni_France.fo_code)
LEFT JOIN Supplier_currency_France ON gpfourni_France.fo_monn=Supplier_currency_France.fo_monn) ON (T13_RefSupplier_France_List.Supplier_code=gpfprodu_France.fp_four)
AND (T13_RefSupplier_France_List.Art=gpfprodu_France.fp_arti))
LEFT JOIN T13_RefSupplier_France_SupplierActive ON (T13_RefSupplier_France_List.Art=T13_RefSupplier_France_SupplierActive.Art)
AND (T13_RefSupplier_France_List.Supplier_code=T13_RefSupplier_France_SupplierActive.Supplier_code)
GROUP BY Trim([T13_RefSupplier_France_List].[Supplier_code]),
Trim([T13_RefSupplier_France_List].[Art]),
Trim([fp_rcli]),
Nz([Localisation_France].[Site],"Poitiers"),
Trim([ar_fami]),
Trim([fp_upri]),
Nz([fp_pcde],0),
Supplier_currency_France.Recode,
"",
0,
CDbl(Nz(Nz([fp_cond],[ar_qcdi]),0)),
IIf([fp_minc]=0,Nz([fp_cond],[ar_qcdi]),[fp_minc]),
0,
0,
0,
gpfprodu_France.fp_dela,
CDbl(Nz([Active],-1))
HAVING (((Trim(T13_RefSupplier_France_List.Supplier_code))<>"FG0002")
AND ((Trim(T13_RefSupplier_France_List.Art))<>"A60101000000000"));
Questions are two fold:
- What is the behaviour of the
LAST
function? - Let’s say I want to transpose the
LAST
function to SQLServer: What should I do?
Thanks
Answer :
SUMMARY
- With
GROUP BY
, you can only include the columns in theGROUP BY
clause, constants, and aggregated values. - SQL Server has a similar window function,
LAST_VALUE()
, but it only exists in SQL Server 2012 or later.
DETAILED EXPLANATION
LAST()
is what’s called an aggregate function. Aggregate functions calculate a value for all rows in a query (if there’s no GROUP BY
, or for all rows with the same values for the columns in the GROUP BY
clause. In SQL Server (and, I’ll assume, in Access), if you use an aggregate function in your SELECT
list (the list of data columns to return from the query) in a query without a GROUP BY
clause, then everything being returned must be:
- Values that don’t rely on your table’s data:
- constant values like
Consigned
orMin_order_value
in your query - the results of functions that don’t operate on the table data, like
NOW()
in Access, orGETDATE()
in SQL Server
- constant values like
- The results of aggregate functions (usually operating on columns from the tables in the query), like the
LAST()
functions from your first query, orMax(IIf([ar1_pdanz]=0,NULL,[ar1_pdanz]))
in both queries.
If you have a GROUP BY
clause, then there’s one addition to the above: the SELECT
list can also include the columns (or calculations) listed in the GROUP BY
clause. Generally, they have to appear exactly the same in both places (for example, if you GROUP BY SUBSTRING(field1,1,20)
, you can’t just have field1
in the SELECT
list).
When you put Last(Trim([ar_fami]))
in your SELECT
list, you’re basically telling Access: “There are going to be multiple values for ar_fami
in my group, but I don’t really care what those values are; just give me the value from the last row you happen to see.”
If you include Trim([ar_fami])
in the SELECT
list and in the GROUP BY
, you’re telling Access that you want to see each unique value in your output, and the subtotals/averages/whatever for those values, along with the rest of the columns in the GROUP BY
.
When you try to include Trim([ar_fami])
in the SELECT
list, but not in the GROUP BY
, then Access is confused: you may have multiple values for each possible combination of the GROUP BY
columns, and it has no way to tell which one of those values you would want to see.
Note for us humans: in practice, it’s possible that for all the rows in each of your groups, in reality all the values are exactly the same. However, databases generally need to decide how they’re going to get data before they actually start to get it; and, at that point, it has no way to know if all of the ar_fami
values in each group will be the same. So, it has to know how to handle different values in advance.
By the way – you should be able to take the constants out of the GROUP BY
clause in your second query (the ""
and the multiple 0
s). Since they are always the same, they can’t change the grouping of the rows.
As far as your second question, as of SQL Server 2012, you’ve got access to a similar function there. The name is a little different: LAST_VALUE()
instead of LAST()
. Also, LAST_VALUE()
is a “window function”; it requires an OVER()
clause, defining the groups is works with and the order it will use, and the results are calculated for each row in the result set, not for an entire group.
You would have to create a convoluted query (or, more likely, set of nested queries) in order to capture non-grouped values from the last record seen in each group.
In practice, these values would seem to be of limited utility. If you’re aggregating total sales by the salespeople in four different locations, does it really matter that the name of the last salesperson was “Tony Warner”? Especially since it’s possible that you’d run the same query two days later, with the same underlying data, yielding the same group results for totals and subtotals – but noting the name of the last salesperson was “Frank Northrup”? In SQL Server at least, SQL will sort out the data into the various groups to get the aggregated values, and will apply whatever ORDER BY
you’ve specified; but it’s up to you what that order by is. If the ORDER BY
doesn’t include the column you’re getting the last value of, or something that’s tied to a specific value, the “last” row may be different each time you call the function, even over the same set of data.
Still, obviously, some people find it worth using. I may suspect those people don’t really grasp what they’re getting back, or (worse) what people looking at their results will assume, but maybe there’s some definite worth to command that I’m simply not seeing. For me, if the results are basically random, I don’t see the worth.
What is the behaviour of the
LAST
function?
From the documentation:
These functions return the value of a specified field in the first or last record, respectively, of the result set returned by a query. If the query does not include an ORDER BY clause, the values returned by these functions will be arbitrary because records are usually returned in no particular order.
This does not say anything about the behaviour of LAST
when a GROUP BY
clause is present, but from testing it appears that FIRST
and LAST
return values from the row encountered first or last within each group.
Without an ORDER BY
clause the row chosen (per group) by FIRST
and LAST
is essentially arbitrary. The important point is that the values chosen by multiple FIRST
and LAST
functions will come from the same row.
This last point means you cannot just replace FIRST
or LAST
by MIN
or MAX
(aside from the different semantics) because the minimum and maximum will generally not be from the same source row.
Let’s say I want to transpose the
LAST
function to SQL Server: What should I do?
It is essentially impossible to duplicate this exactly, since the Access behaviour is not precisely defined; it’s not possible to predict which row will be chosen as FIRST
or LAST
in all but the simplest cases.
That said, if you can improve the query semantic to have a deterministic choice for FIRST
or LAST
row within each group, a general translation would be to number each row (ascending or descending per group), then choose values from the row numbered 1.
The row numbering can be done with ROW_NUMBER
. Within the OVER
clause, the GROUP BY
columns go in the PARTITION BY
section, with deterministic ordering provided in the ORDER BY
section. You will need to write a subquery or use a Common Table Expression (CTE) to filter row number to 1.
For example:
DECLARE @Table1 table
(
ID integer IDENTITY NOT NULL,
GroupID integer NOT NULL,
[Data] integer NOT NULL
);
INSERT @Table1
(GroupID, [Data])
VALUES
(1, 3),
(1, 2),
(1, 1),
(2, 6),
(2, 5),
(2, 4);
-- FIRST (ordered by ID ASC within GroupID)
WITH Numbered AS
(
SELECT
T.GroupID,
T.[Data],
rn = ROW_NUMBER() OVER (
PARTITION BY T.GroupID
ORDER BY T.ID ASC)
FROM @Table1 AS T
)
SELECT
N.GroupID,
N.[Data]
FROM Numbered AS N
WHERE
N.rn = 1;
-- LAST (ordered by ID DESC within GroupID)
WITH Numbered AS
(
SELECT
T.GroupID,
T.[Data],
rn = ROW_NUMBER() OVER (
PARTITION BY T.GroupID
ORDER BY T.ID DESC)
FROM @Table1 AS T
)
SELECT
N.GroupID,
N.[Data]
FROM Numbered AS N
WHERE
N.rn = 1;
Demo: db<>fiddle
In SQL Server 2012 or later, this can also be done with the FIRST_VALUE
and LAST_VALUE
windowed functions, but the execution plan may be less efficient. Also, these windowed functions are not aggregates, so you need to write the expression so that it returns the same value for every row per group, then apply an arbitrary aggregate. For example (using non-deterministic ordering just for variety):
DECLARE @Table1 table
(
ID integer IDENTITY NOT NULL,
GroupID integer NOT NULL,
[Data] integer NOT NULL
);
INSERT @Table1
(GroupID, [Data])
VALUES
(1, 3),
(1, 2),
(1, 1),
(2, 6),
(2, 5),
(2, 4);
WITH Windowed AS
(
SELECT
T.GroupID,
fv = FIRST_VALUE(T.[Data]) OVER (
PARTITION BY T.GroupID
ORDER BY T.GroupID
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
FROM @Table1 AS T
)
SELECT
W.GroupID,
[Data] = MIN(W.fv) -- arbitrary aggregate
FROM Windowed AS W
GROUP BY
W.GroupID;
WITH Windowed AS
(
SELECT
T.GroupID,
lv = LAST_VALUE(T.[Data]) OVER (
PARTITION BY T.GroupID
ORDER BY T.GroupID
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
FROM @Table1 AS T
)
SELECT
W.GroupID,
[Data] = MAX(W.lv) -- arbitrary aggregate
FROM Windowed AS W
GROUP BY
W.GroupID;
Demo: db<>fiddle
Your third choice is to write a SQLCLR user-defined aggregate (UDA). It is not currently possible to guarantee deterministic ordering with these, but the implementation might more closely match what Access does. You would need to be careful that all UDA results were computed by the same operator, to ensure the results of multiple UDA calls all come from the same source row.