How to best handle MAX within a query/view

Posted on

Question :

I have two tables named “Visit” and “VisitMovement” and one Visit can contain many VisitMovements.

Within my query I need to join to the latest record for the Visit in VisitMovement so I can get back the latest VisitMovement details.

To do this I have a view like this that I join to:

SELECT
    VisitID,
    max(VisitMovementID) as VisitMovementID
FROM
    dbo.VisitMovement
GROUP BY
    VisitID

and finish up with:

    SELECT 
        ......
        FROM 
            Visit V
        INNER JOIN VisitMovement VM ON
            V.VisitID = VM.VisitID
        INNER JOIN vwLatestVisitMovement LVM on
            VM.VisitMovementID = LVM.VisitMovementID

I can’t Index this View due to the max operator, and I can see over-time this becoming a performance issue with the Index Scan operation.

I wondered if anyone had any good ideas on how to approach this? I was thinking of having a trigger and maintaining the latest VisitMovementID in the Visit table but this doesn’t sit well with me.

Answer :

I think (not very clear) that you want something either:

For a VIEW solution – use something like this

SELECT v.visitor_id AS vid, v.visitor_name
FROM visitor v
INNER JOIN vw_max_visit_movement vmw ON
v.visitor_id = vmw.visitor_id;

Personally, I might want to use a CTE (unless I had millions of visitors), as per here. The reason I mentioned that you might want to avoid this approach for large tables is best explained here.

WITH visitor_max AS
(
  SELECT visitor_id, MAX(visit_movement_id)
  FROM visit_movement
  GROUP BY visitor_id
)
SELECT v.visitor_id AS "Vistor No.", v.visitor_name AS "Visitor name"
FROM visitor v
INNER JOIN visitor_max vmax ON
v.visitor_id = vmax.visitor_id;

In any case, the results are the same (not sure if this is what you’re asking – if not, please expand the question)

Vistor No.  Visitor name
         2        Paulie
         3           Jim
         4          Bill
         5          Fred

=== DDL and DML for the tables used =====

CREATE TABLE visit_movement
(
  visit_movement_id INTEGER,   -- possibly a timestamp?
  visitor_id INTEGER
);


INSERT INTO visit_movement 
VALUES
(34, 2),
(35, 2),
(34, 2),
(37, 3),
(38, 3),
(39, 3),
(40, 3),
(41, 4),
(42, 4),
(43, 5);


CREATE TABLE visitor
(
  visitor_id INTEGER,
  visitor_name VARCHAR (20)
);

INSERT INTO visitor
VALUES
(1, 'Mary'),
(2, 'Paulie'),
(3, 'Jim'),
(4, 'Bill'),
(5, 'Fred'),
(6, 'Xavier');

CREATE VIEW vw_max_visit_movement AS
SELECT visitor_id, MAX(visit_movement_id)
FROM visit_movement
GROUP BY visitor_id;

Two things, first, what does the execution plan look like on this? Are you seeing efficient use of the existing indexes and structure? If so, then that’s likely to continue into the future. If it’s problematic now, it’s just going to get worse.

So, you might want to look at mechanisms for querying the data like those I outline in this article. By and large, at least through my tests, using ROW_NUMBER to get that latest value is likely to work better for you.

Also, you may not want to use a view. I know that it makes code re-use possible, but sometimes in T-SQL we just have to write the same query 100 (or more) times because the engine deals with it better. Test with & without a view to be sure of that though.

Leave a Reply

Your email address will not be published. Required fields are marked *