Question :
I have a JSON with the following structure:
uid | item
1 |[{"id":1, "m":"123", "s":1},{"id":2, "msg":"on","s":2},{"id":3, "msg":"of","s":1}]
2 |[{"id":1, "m":"yes", "s":2},{"id":3, "msg":"gh","s":0}]
3 |[{"id":1, "m":"qa", "s":1},{"id":4, "msg":"ks"},{"id":5, "m":"test"}]
I want to query this table based on id
and get rows based on matched object values. For e.g. for id=3
I want:
uid | id | m | s
1 | 3 | of | 1
2 | 3 | gh | 0
This table has over 500M rows so I would need an index. Using GIN(item jsonb_path_ops)
with item @> '[{"id": 3}]'
works, but I don’t know how to get the exact json object from array that matched.
I can turn the JSON structure to:
{
"1": { "m":"123", "s": 1 },
"2": { "m":"on", "s":2 },
"3": { "m":"of", "s": 1 }
}
and use GIN(item)
and check for key exists with ?
and then try to fetch the key value. But the index size might increase which might be an overkill given that I only want to search based on id.
Maybe I can use a B-Tree index but not sure how. What should be my JSON structure for most efficient querying? and what index should be used so that I get only the matched object from the array?
Some facts:
- Cardinality of
id
<<<uid
. (id = ~100,000, uid = ~500,000,000) - Normalizing this turns this table to 500M*10,000 rows which is extremely slow.
- Each JSON array will have upto 50 objects only.
Answer :
There are 2 questions here;
First one is the index structure.
- You can use gin(item) here (without jsonb_path_ops). It will be just fine. You are querying for key (id) only not the full path.
- You can create an expression index (or an gin index on a generated column) for extracting id field values to an integer[] value. Most probably this index will be smaller than jsonb index (~%10 smaller). But I am not sure whether it’s faster for your case.
Here is an example for second usage.
Create an immutable function.
CREATE FUNCTION to_my_int_array (jsonb) RETURNS integer[] AS
$$
SELECT
ARRAY(
SELECT (jsonb_array_elements($1) ->> 'id')::integer
);
$$
LANGUAGE sql
IMMUTABLE
RETURNS NULL ON NULL INPUT;
Create an expression index with the function above:
CREATE INDEX ON my_table USING gin (to_my_int_array (item));
Filter the query with function;
SELECT
mt.uid,
my_item.*
FROM my_table mt,
LATERAL jsonb_array_elements(mt.item) ae,
LATERAL jsonb_to_record(ae) AS my_item (id int, m text, s int)
WHERE
to_my_int_array(mt.item) @> '{1}'::integer[];
Second question is the output. The query above also shows output for version 11. After version 12, you can simply use the query below;
SELECT
uid,
my_item.*
FROM my_table mt,
LATERAL jsonb_to_record(jsonb_path_query_first(mt.item, '$[*] ? (@.id == 1) '))
AS my_item(id int, m text, s int)
WHERE
mt.item @> '[{"id": 1}]';
| uid | id | m | s |
| --- | --- | --- | --- |
| 1 | 1 | 123 | 1 |
| 2 | 1 | yes | 2 |
| 3 | 1 | qa | 1 |