why the last function in msaccess allow me to bypass the group by?

Posted on

Question :

I have the following query on msaccess

SELECT Trim([T13_RefSupplier_France_List].[Supplier_code]) AS Supplier_code,
       Trim([T13_RefSupplier_France_List].[Art]) AS Internal_reference,
       Trim([fp_rcli]) AS Supplier_reference,
       Last(Trim([ar_fami])) AS Family_code,
       Last(Trim([fp_upri])) AS Purchasing_unit,
       Last(Nz([fp_pcde],0)) AS Purchasing_price,
       Last(Supplier_currency_France.Recode) AS Purchasing_currency,
       Last("") AS Consigned,
       Last(0) AS Eco_order_qty,
       Last(CDbl(Nz(Nz([fp_cond],[ar_qcdi]),0))) AS Pack_order_qty,
       Last(IIf([fp_minc]=0,Nz([fp_cond],[ar_qcdi]),[fp_minc])) AS Min_order_qty,
       0 AS Min_order_value,
       0 AS Product_grossweight,
       0 AS Product_grosscube,
       Last(gpfprodu_France.fp_dela) AS Leadtime_days,
       Nz([Localisation_France].[Site],"Poitiers") AS Site,
       CDbl(Nz([Active],-1)) AS Supplier_active,
       Max(IIf([ar1_pdanz]=0,NULL,[ar1_pdanz])) AS Ref_price
FROM ((T13_RefSupplier_France_List
       LEFT JOIN ((gparticl_France
                   LEFT JOIN Localisation_France ON gparticl_France.ar_loco=Localisation_France.Localisation)
                  LEFT JOIN [*gpartic1_France] ON gparticl_France.ar_code=[*gpartic1_France].ar1_code) ON T13_RefSupplier_France_List.Art=gparticl_France.ar_code)
      LEFT JOIN ((gpfprodu_France
                  LEFT JOIN gpfourni_France ON gpfprodu_France.fp_four=gpfourni_France.fo_code)
                 LEFT JOIN Supplier_currency_France ON gpfourni_France.fo_monn=Supplier_currency_France.fo_monn) ON (T13_RefSupplier_France_List.Supplier_code=gpfprodu_France.fp_four)
      AND (T13_RefSupplier_France_List.Art=gpfprodu_France.fp_arti))
LEFT JOIN T13_RefSupplier_France_SupplierActive ON (T13_RefSupplier_France_List.Art=T13_RefSupplier_France_SupplierActive.Art)
AND (T13_RefSupplier_France_List.Supplier_code=T13_RefSupplier_France_SupplierActive.Supplier_code)
GROUP BY Trim([T13_RefSupplier_France_List].[Supplier_code]),
         Trim([T13_RefSupplier_France_List].[Art]),
         Trim([fp_rcli]),
         Nz([Localisation_France].[Site],"Poitiers"),
         CDbl(Nz([Active],-1))
HAVING (((Trim(T13_RefSupplier_France_List.Supplier_code))<>"FG0002")
        AND ((Trim(T13_RefSupplier_France_List.Art))<>"A60101000000000"));

I can do a GROUP BY without using the column that have the LASTfunction but when I remove the LAST function, I have to add all columns on the GROUP BY (see below).

SELECT Trim([T13_RefSupplier_France_List].[Supplier_code]) AS Supplier_code,
       Trim([T13_RefSupplier_France_List].[Art]) AS Internal_reference,
       Trim([fp_rcli]) AS Supplier_reference,
       Trim([ar_fami]) AS Family_code,
       Trim([fp_upri]) AS Purchasing_unit,
       Nz([fp_pcde],0) AS Purchasing_price,
       Supplier_currency_France.Recode AS Purchasing_currency,
       "" AS Consigned,
       0 AS Eco_order_qty,
       CDbl(Nz(Nz([fp_cond],[ar_qcdi]),0)) AS Pack_order_qty,
       IIf([fp_minc]=0,Nz([fp_cond],[ar_qcdi]),[fp_minc]) AS Min_order_qty,
       0 AS Min_order_value,
       0 AS Product_grossweight,
       0 AS Product_grosscube,
       gpfprodu_France.fp_dela AS Leadtime_days,
       Nz([Localisation_France].[Site],"Poitiers") AS Site,
       CDbl(Nz([Active],-1)) AS Supplier_active,
       Max(IIf([ar1_pdanz]=0,NULL,[ar1_pdanz])) AS Ref_price
FROM ((T13_RefSupplier_France_List
       LEFT JOIN ((gparticl_France
                   LEFT JOIN Localisation_France ON gparticl_France.ar_loco=Localisation_France.Localisation)
                  LEFT JOIN [*gpartic1_France] ON gparticl_France.ar_code=[*gpartic1_France].ar1_code) ON T13_RefSupplier_France_List.Art=gparticl_France.ar_code)
      LEFT JOIN ((gpfprodu_France
                  LEFT JOIN gpfourni_France ON gpfprodu_France.fp_four=gpfourni_France.fo_code)
                 LEFT JOIN Supplier_currency_France ON gpfourni_France.fo_monn=Supplier_currency_France.fo_monn) ON (T13_RefSupplier_France_List.Supplier_code=gpfprodu_France.fp_four)
      AND (T13_RefSupplier_France_List.Art=gpfprodu_France.fp_arti))
LEFT JOIN T13_RefSupplier_France_SupplierActive ON (T13_RefSupplier_France_List.Art=T13_RefSupplier_France_SupplierActive.Art)
AND (T13_RefSupplier_France_List.Supplier_code=T13_RefSupplier_France_SupplierActive.Supplier_code)
GROUP BY Trim([T13_RefSupplier_France_List].[Supplier_code]),
         Trim([T13_RefSupplier_France_List].[Art]),
         Trim([fp_rcli]),
         Nz([Localisation_France].[Site],"Poitiers"),
         Trim([ar_fami]),
         Trim([fp_upri]),
         Nz([fp_pcde],0),
         Supplier_currency_France.Recode,
         "",
         0,
         CDbl(Nz(Nz([fp_cond],[ar_qcdi]),0)),
         IIf([fp_minc]=0,Nz([fp_cond],[ar_qcdi]),[fp_minc]),
         0,
         0,
         0,
         gpfprodu_France.fp_dela,
         CDbl(Nz([Active],-1))
HAVING (((Trim(T13_RefSupplier_France_List.Supplier_code))<>"FG0002")
        AND ((Trim(T13_RefSupplier_France_List.Art))<>"A60101000000000")); 

Questions are two fold:

  • What is the behaviour of the LASTfunction?
  • Let’s say I want to transpose the LAST function to SQLServer: What should I do?

Thanks

Answer :

SUMMARY

  • With GROUP BY, you can only include the columns in the GROUP BY clause, constants, and aggregated values.
  • SQL Server has a similar window function, LAST_VALUE(), but it only exists in SQL Server 2012 or later.

DETAILED EXPLANATION

LAST() is what’s called an aggregate function. Aggregate functions calculate a value for all rows in a query (if there’s no GROUP BY, or for all rows with the same values for the columns in the GROUP BY clause. In SQL Server (and, I’ll assume, in Access), if you use an aggregate function in your SELECT list (the list of data columns to return from the query) in a query without a GROUP BY clause, then everything being returned must be:

  • Values that don’t rely on your table’s data:
    • constant values like Consigned or Min_order_value in your query
    • the results of functions that don’t operate on the table data, like NOW() in Access, or GETDATE() in SQL Server
  • The results of aggregate functions (usually operating on columns from the tables in the query), like the LAST() functions from your first query, or Max(IIf([ar1_pdanz]=0,NULL,[ar1_pdanz])) in both queries.

If you have a GROUP BY clause, then there’s one addition to the above: the SELECT list can also include the columns (or calculations) listed in the GROUP BY clause. Generally, they have to appear exactly the same in both places (for example, if you GROUP BY SUBSTRING(field1,1,20), you can’t just have field1 in the SELECT list).

When you put Last(Trim([ar_fami])) in your SELECT list, you’re basically telling Access: “There are going to be multiple values for ar_fami in my group, but I don’t really care what those values are; just give me the value from the last row you happen to see.”

If you include Trim([ar_fami]) in the SELECT list and in the GROUP BY, you’re telling Access that you want to see each unique value in your output, and the subtotals/averages/whatever for those values, along with the rest of the columns in the GROUP BY.

When you try to include Trim([ar_fami]) in the SELECT list, but not in the GROUP BY, then Access is confused: you may have multiple values for each possible combination of the GROUP BY columns, and it has no way to tell which one of those values you would want to see.

Note for us humans: in practice, it’s possible that for all the rows in each of your groups, in reality all the values are exactly the same. However, databases generally need to decide how they’re going to get data before they actually start to get it; and, at that point, it has no way to know if all of the ar_fami values in each group will be the same. So, it has to know how to handle different values in advance.

By the way – you should be able to take the constants out of the GROUP BY clause in your second query (the "" and the multiple 0s). Since they are always the same, they can’t change the grouping of the rows.


As far as your second question, as of SQL Server 2012, you’ve got access to a similar function there. The name is a little different: LAST_VALUE() instead of LAST(). Also, LAST_VALUE() is a “window function”; it requires an OVER() clause, defining the groups is works with and the order it will use, and the results are calculated for each row in the result set, not for an entire group.

You would have to create a convoluted query (or, more likely, set of nested queries) in order to capture non-grouped values from the last record seen in each group.

In practice, these values would seem to be of limited utility. If you’re aggregating total sales by the salespeople in four different locations, does it really matter that the name of the last salesperson was “Tony Warner”? Especially since it’s possible that you’d run the same query two days later, with the same underlying data, yielding the same group results for totals and subtotals – but noting the name of the last salesperson was “Frank Northrup”? In SQL Server at least, SQL will sort out the data into the various groups to get the aggregated values, and will apply whatever ORDER BY you’ve specified; but it’s up to you what that order by is. If the ORDER BY doesn’t include the column you’re getting the last value of, or something that’s tied to a specific value, the “last” row may be different each time you call the function, even over the same set of data.

Still, obviously, some people find it worth using. I may suspect those people don’t really grasp what they’re getting back, or (worse) what people looking at their results will assume, but maybe there’s some definite worth to command that I’m simply not seeing. For me, if the results are basically random, I don’t see the worth.

What is the behaviour of the LAST function?

From the documentation:

These functions return the value of a specified field in the first or last record, respectively, of the result set returned by a query. If the query does not include an ORDER BY clause, the values returned by these functions will be arbitrary because records are usually returned in no particular order.

This does not say anything about the behaviour of LAST when a GROUP BY clause is present, but from testing it appears that FIRST and LAST return values from the row encountered first or last within each group.

Without an ORDER BY clause the row chosen (per group) by FIRST and LAST is essentially arbitrary. The important point is that the values chosen by multiple FIRST and LAST functions will come from the same row.

This last point means you cannot just replace FIRST or LAST by MIN or MAX (aside from the different semantics) because the minimum and maximum will generally not be from the same source row.

Let’s say I want to transpose the LAST function to SQL Server: What should I do?

It is essentially impossible to duplicate this exactly, since the Access behaviour is not precisely defined; it’s not possible to predict which row will be chosen as FIRST or LAST in all but the simplest cases.

That said, if you can improve the query semantic to have a deterministic choice for FIRST or LAST row within each group, a general translation would be to number each row (ascending or descending per group), then choose values from the row numbered 1.

The row numbering can be done with ROW_NUMBER. Within the OVER clause, the GROUP BY columns go in the PARTITION BY section, with deterministic ordering provided in the ORDER BY section. You will need to write a subquery or use a Common Table Expression (CTE) to filter row number to 1.

For example:

DECLARE @Table1 table
(
    ID integer IDENTITY NOT NULL,
    GroupID integer NOT NULL,
    [Data] integer NOT NULL
);

INSERT @Table1
    (GroupID, [Data])
VALUES
    (1, 3),
    (1, 2),
    (1, 1),
    (2, 6),
    (2, 5),
    (2, 4);

-- FIRST (ordered by ID ASC within GroupID)
WITH Numbered AS
(
    SELECT
        T.GroupID,
        T.[Data],
        rn = ROW_NUMBER() OVER (
                PARTITION BY T.GroupID
                ORDER BY T.ID ASC)
    FROM @Table1 AS T
)
SELECT
    N.GroupID,
    N.[Data]
FROM Numbered AS N
WHERE
    N.rn = 1;

-- LAST (ordered by ID DESC within GroupID)
WITH Numbered AS
(
    SELECT
        T.GroupID,
        T.[Data],
        rn = ROW_NUMBER() OVER (
                PARTITION BY T.GroupID
                ORDER BY T.ID DESC)
    FROM @Table1 AS T
)
SELECT
    N.GroupID,
    N.[Data]
FROM Numbered AS N
WHERE
    N.rn = 1;

Demo: db<>fiddle

In SQL Server 2012 or later, this can also be done with the FIRST_VALUE and LAST_VALUE windowed functions, but the execution plan may be less efficient. Also, these windowed functions are not aggregates, so you need to write the expression so that it returns the same value for every row per group, then apply an arbitrary aggregate. For example (using non-deterministic ordering just for variety):

DECLARE @Table1 table
(
    ID integer IDENTITY NOT NULL,
    GroupID integer NOT NULL,
    [Data] integer NOT NULL
);

INSERT @Table1
    (GroupID, [Data])
VALUES
    (1, 3),
    (1, 2),
    (1, 1),
    (2, 6),
    (2, 5),
    (2, 4);

WITH Windowed AS
(
    SELECT
        T.GroupID,
        fv = FIRST_VALUE(T.[Data]) OVER (
                PARTITION BY T.GroupID
                ORDER BY T.GroupID
                ROWS BETWEEN UNBOUNDED PRECEDING
                AND UNBOUNDED FOLLOWING)
    FROM @Table1 AS T
)
SELECT
    W.GroupID,
    [Data] = MIN(W.fv)   -- arbitrary aggregate
FROM Windowed AS W
GROUP BY
    W.GroupID;

WITH Windowed AS
(
    SELECT
        T.GroupID,
        lv = LAST_VALUE(T.[Data]) OVER (
                PARTITION BY T.GroupID
                ORDER BY T.GroupID
                ROWS BETWEEN UNBOUNDED PRECEDING
                AND UNBOUNDED FOLLOWING)
    FROM @Table1 AS T
)
SELECT
    W.GroupID,
    [Data] = MAX(W.lv)   -- arbitrary aggregate
FROM Windowed AS W
GROUP BY
    W.GroupID;

Demo: db<>fiddle

Your third choice is to write a SQLCLR user-defined aggregate (UDA). It is not currently possible to guarantee deterministic ordering with these, but the implementation might more closely match what Access does. You would need to be careful that all UDA results were computed by the same operator, to ensure the results of multiple UDA calls all come from the same source row.

Leave a Reply

Your email address will not be published. Required fields are marked *