# How to do a kind of selective grouping in a SQL result?

Posted on

### Question :

I have some query as `select x1,x2,x3,x4 from foo where bar` resulting something like:

``````CREATE TEMP TABLE foo
AS SELECT * FROM ( VALUES
( '00','01','03','AA' ),
( '01','03','03','BB' ),
( '02','09','05','CC' ),
( '03','08','05','DD' ),
( '03','08','03','EE' ),
( '03','08','03','FF' ),
( '03','08','03','FF' ),
( '03','08','03','FF' ),
( '03','08','09','GG' )
) AS t( x1, x2, x3, x4);
``````

Note the `x3` column: we have `03` twice then `05` twice then `03` again then `09`. I need to calculate a route using the `pgr_ksp` function (PostGIS). Imagine the table here is the result and the `x3` column is the road names. In this case I start on `03` street, walk on it for 2 segments and take the `05` street just to take the `03` street again after.

My problem is I’m showing this segments to the user and I don’t like to. The user know nothing about segments of roads and just want to see `take 03 street, 05 street, 03 again and then 09`.

If I just group by, I will lost the `take 05` step between the two `03` and I’ll see the all long `03` segments connected: `take 03 street (all 6 segments together), 05 and then 09` and this is wrong.

My question: how can I group the `x3` column as the two first `03`, then group the two `05` and then group `03` again after `05` and then group all tuples left using this method.

Not so fast! I must have a sum of the `x2` column as I group the `x3`. This will be the geometry column I must compute – concat all segments geometry to have the complete way representation. This is the result I want:

``````-----------
| x2 | x3 |
|----|----|
| 04 | 03 |
|----|----|
| 17 | 05 |
|----|----|
| 32 | 03 |
|----|----|
| 08 | 09 |
-----------
``````

`x2` is the way geometry and `x3` is the name of the street or way. In this case I’ll have the complete way segment (sum of small pieces) and the way name.

``````create temp table tbl as
select * from
(values (1,01,03),(2,03,03),(3,09,05),(4,08,05),(5,08,03),(6,08,03),(7,08,03),(8,08,03),(9,08,09)) t (id, x2,x3);
``````

First you need to generate a partition by every group of X3 values:

This sentence uses LAG function to mark when X3 changes from the previous row.

``````select id, x2, x3, case when coalesce(lag(x3) over (order by id), x3) = x3 then 0 else 1 end c
from tbl
order by id
``````

Then you can establish a partition using:

``````select id, x2, x3, sum(cnt.c) over (order by id) as part
from cnt
order by id
``````

Then simply get the sum of X2 grouping by every partition.

``````with cnt as
(
select id, x2, x3, case when coalesce(lag(x3) over (order by id), x3) = x3 then 0 else 1 end c
from tbl
order by id
), cnt1 as
(
select id, x2, x3, sum(cnt.c) over (order by id) as part
from cnt
order by id
)
select sum(x2) xx2, min(x3) xx3
from cnt1
group by part
order by part
;
``````

The final result:

``````+-----+-----+
|  X2 |  X3 |
+-----+-----+
| 4   | 3   |
+-----+-----+
| 17  | 5   |
+-----+-----+
| 32  | 3   |
+-----+-----+
| 8   | 9   |
+-----+-----+
``````

Check it here: http://rextester.com/YPU13325

You will need some column value that differentiates the x3 values. For example you could group by both x1 and x3, then you will only lose rows where both x1 and x3 are the same.

``````select x1, sum(x2), x3 from foo where bar group by x1, x3.
``````

This would look like:

``````_____________________
|    |         |    |
| x1 | sum(x2) | x3 |
|____|_________|____|
|    |         |    |
| 00 | 01      | 03 |
| 01 | 03      | 03 |
| 02 | 09      | 05 |
| 03 | 08      | 05 |
| 03 | 24      | 03 |
| 03 | 08      | 09 |
|____|_________|____|
``````

For any other output you would need to use programmatic logic to loop through the SQL output, sum x2, and only present new info when x3 changes. You could do this in your application, or you might be able to write a function for it in PostGREs, but I’m pretty sure it is not supported by ANSI SQL.

This function was the one that actualy solved my problem. I’ll not accept it because I was not too clear in my question and based in my example the accepted answer is the correct one. The major problem is the `ST_Union` function: I need to store all geometries until the street name is the same. When the street name change I need to send it and a union of all segments geometry. To `ST_Union` I pass the geometry of the current record and the geometry of previously cached lines, and then replace the cache with the new union.

``````CREATE or replace FUNCTION route_agg(
IN source integer,
IN target integer,
IN k integer,
IN directed boolean)
RETURNS TABLE(geom geometry, way_name text, km double precision, seq integer) AS \$\$
DECLARE
var_r record;
tmp_name text;
tmp_geom geometry;
is_first boolean;
counter integer;
BEGIN
tmp_name := '';
is_first := true;
counter := 0;
FOR var_r IN ( select * from calc_rotas_v3(\$1, \$2, \$3, \$4) as r INNER JOIN osm_2po_4pgr as g ON r.edge = g.id order by seq) LOOP

if ( not is_first ) then
if ( tmp_name <> var_r.osm_name  ) then
geom := tmp_geom;
way_name = tmp_name;
km = ST_Length(tmp_geom::geography)/1000;

tmp_name := var_r.osm_name;
tmp_geom := var_r.geom_way;

seq = counter;
counter = counter + 1;
RETURN NEXT;
else
tmp_geom := ST_Union(tmp_geom, var_r.geom_way);
end if;
else
tmp_name := var_r.osm_name;
tmp_geom := var_r.geom_way;
is_first := false;
end if;

END LOOP;

geom := tmp_geom;
way_name = tmp_name;
km = ST_Length(tmp_geom::geography)/1000;
seq = counter;
RETURN NEXT;

END; \$\$
LANGUAGE 'plpgsql' VOLATILE;
``````