Split values from field and count them according to another field

Posted on

Question :

I have following MySQL db structure:

content
=======
id | tags
---------
1  | ,abc,def,ghi,
2  | ,abc,foo,
3  | NULL
4  | ,foo,bar,


categories
==========
id
--
1
2


categories_to_content
=====================
id | category_id | content_id
-----------------------------
 1 |           1 |          1
 2 |           1 |          2
 3 |           2 |          3
 4 |           2 |          4

What I’m trying to do is to SELECT tags assigned to category and count them. Result for given schema:

tag | category | count
----------------------
abc |        1 |     2
def |        1 |     1
ghi |        1 |     1
foo |        1 |     1
foo |        2 |     1
bar |        2 |     1

Decomposition isn’t possible (I hook up into existing CMS database).

My existing query:

SELECT tags, category_id as category FROM content
JOIN categories_to_content ON content_id = content.id

Answer :

You can achieve your goal by using a tally (number) table, which you can easily create in the following way

CREATE TABLE tally(n INT NOT NULL AUTO_INCREMENT PRIMARY KEY);

INSERT INTO tally (n)
SELECT NULL
 FROM 
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
;

This will create a tally table and fill it with 100 rows with values
1 to 100 accordingly.

You should adjust the number of rows in your table to be equal or more (e.g. double the number to be on the safe side) than a maximum number of delimiters (comma) in content table.
You can figure out the maximum number with a simple query

SELECT MAX(CHAR_LENGTH(tags) - CHAR_LENGTH(REPLACE(tags, ',', ''))) max_tags
  FROM content;

Output:

| MAX_TAGS |
|----------|
|        4 |

Now you can use it to split tags

SELECT category_id, tag, COUNT(*) count
  FROM
(
  SELECT cc.category_id, 
         SUBSTRING_INDEX(SUBSTRING_INDEX(c.tags, ',', n.n), ',', -1) tag
    FROM categories_to_content cc JOIN content c
      ON cc.content_id = c.id CROSS JOIN tally n
   WHERE n.n <= 1 + (CHAR_LENGTH(c.tags) - CHAR_LENGTH(REPLACE(c.tags, ',', '')))
) q
 WHERE CHAR_LENGTH(tag) > 0
 GROUP BY category_id, tag

Output:

| CATEGORY_ID | TAG | COUNT |
|-------------|-----|-------|
|           1 | abc |     2 |
|           1 | def |     1 |
|           1 | foo |     1 |
|           1 | ghi |     1 |
|           2 | bar |     1 |
|           2 | foo |     1 |

Here is SQLFiddle demo

Leave a Reply

Your email address will not be published. Required fields are marked *