Question :
The scenario is as follows, a while ago a colleague accidentally ran a query similar to
DELETE FROM app_i18n WHERE disabled = 0 and translation is null or translation = '';
This is the table structure of app_i18n
CREATE TABLE app_i18n (
`id` INTEGER(11) NOT NULL AUTO_INCREMENT,
`category` INTEGER(11) unsigned NOT NULL,
`keyword` varchar(50) DEFAULT NULL,
`language` varchar(2) NOT NULL DEFAULT 'en',
`translation` text NOT NULL,
`disabled` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY (`category`, `keyword`, `language`, `disabled`)
) ENGINE=InnoDB;
The resulting problem is that some of those “translations” while empty were there for a reason. So a restore was in order. From an old copy of the db he restored all the rows that were empty or null and were not disabled.
The problem has been compounded by an unknown number of other developers quickly throwing in “translations” to fix the issues they were experiencing. So I thought the solution would be
SELECT id
FROM app_i18n
WHERE disabled = 0
AND (translation is null or translation = '')
GROUP BY category, keyword, language
HAVING COUNT(id) > 1;
Would that query yield the answers that I want? i.e. all records that having duplicate category, keywords in the same language which have duplicates?
Answer :
On most RDBMSs your query causes an error. I believe MySQL returns a result, but not necessarily what you are looking for.
When you group by a set of columns, only columns that are mentioned in the group by clause can also be mentioned in the directly in the select clause. All other columns have to be wrapped in some kind of aggregate like MAX() or AVG().
You are looking for all category, keyword and language combinations that have duplicate entries. So you want your query to return those columns:
SELECT category, keyword, language
FROM app_i18n
WHERE disabled = 0
AND (translation is null or translation = '')
GROUP BY category, keyword, language
HAVING COUNT(id) > 1;
If you want all columns of the rows that have a non-unique category, keyword and language combination, you need to add a join:
SELECT A.*
FROM app_i18n A
JOIN (
SELECT category, keyword, language
FROM app_i18n A2
WHERE disabled = 0
AND (translation is null or translation = '')
GROUP BY category, keyword, language
HAVING COUNT(id) > 1)C
ON A.category = C.category
AND A.keyword = C.keyword
AND A.language = C.language;
MySQL has an GROUP_CONCAT function so you can get an comma separated list of id’s that matches your duplicated records.
SELECT
GROUP_CONCAT(id ORDER BY id ASC) as "duplicated_row_ids"
, category
, keyword
, language
FROM app_i18n
WHERE disabled = 0
AND (translation is null or translation = '')
GROUP BY category, keyword, language
HAVING COUNT(id) > 1;
or
SELECT
CONCAT('{"id":'
, GROUP_CONCAT(CONCAT('"', id, '"') ORDER BY id ASC SEPARATOR ',"id":')
, '}'
) as "duplicated_row_ids"
, category
, keyword
, language
FROM app_i18n
WHERE disabled = 0
AND (translation is null or translation = '')
GROUP BY category, keyword, language
HAVING COUNT(id) > 1;
for nice JSON id’s
see http://sqlfiddle.com/#!2/3b51a/6 for demo