Question :
Sorry if the title is not clear – feel free to improve.
Here are 3 tables (not all data is relevant to the question):
CREATE TABLE `content` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`community_id` int(10) unsigned NOT NULL COMMENT 'The community for whom this URL has special rules',
`url_id` int(10) unsigned DEFAULT NULL COMMENT 'The URL that has special rules',
`question_id` int(10) unsigned DEFAULT NULL,
`banned` tinyint(1) unsigned NOT NULL DEFAULT '0' COMMENT 'Flags if a URL is banned from the community',
`featured` tinyint(3) unsigned NOT NULL DEFAULT '0' COMMENT 'If this article is to be featured on this site',
`slider_image` tinytext,
`weight` int(11) DEFAULT '0',
`create_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `content_url_unique` (`community_id`,`url_id`),
UNIQUE KEY `content_question_unique` (`community_id`,`question_id`),
KEY `community_index` (`community_id`),
KEY `content_url_idx` (`url_id`),
KEY `content_community_idx` (`community_id`),
KEY `content_question_idx` (`question_id`),
CONSTRAINT `content_community` FOREIGN KEY (`community_id`) REFERENCES `community` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `content_question` FOREIGN KEY (`question_id`) REFERENCES `question` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `content_url` FOREIGN KEY (`url_id`) REFERENCES `url` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=186 DEFAULT CHARSET=utf8;
CREATE TABLE `url_tag` (
`url_id` int(10) unsigned NOT NULL COMMENT 'The URL to be tagged',
`tag_id` int(10) unsigned NOT NULL COMMENT 'The tag the url refers to',
PRIMARY KEY (`url_id`,`tag_id`),
KEY `FK_urltag_url_idx` (`url_id`),
KEY `FK_urltag_tag_idx` (`tag_id`),
CONSTRAINT `FK_urltag_tag` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `FK_urltag_url` FOREIGN KEY (`url_id`) REFERENCES `url` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Allows to tag URLs';
CREATE TABLE `question_tag` (
`question_id` int(10) unsigned NOT NULL,
`tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`question_id`,`tag_id`),
UNIQUE KEY `question_tag` (`question_id`,`tag_id`),
KEY `fk_questiontag_question_idx` (`question_id`),
KEY `fk_questiontag_tag_idx` (`tag_id`),
CONSTRAINT `fk_questiontag_question` FOREIGN KEY (`question_id`) REFERENCES `question` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `fk_questiontag_tag` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='The tags associated with a given question';
Note that Content is meant to have either a url_id
, or a question_id
– not both.
I need a way to connect a Content ID to a Tag ID. I was thinking of creating a VIEW, and here are 2 possibilities I came up with so far.
Option 1:
(SELECT content.id AS content_id, url_tag.tag_id AS tag_id
FROM url_tag, content
WHERE url_tag.url_id = content.url_id)
UNION
(SELECT content.id AS content_id, question_tag.tag_id AS tag_id
FROM question_tag, content
WHERE question_tag.question_id = content.question_id)
Option 2:
SELECT content.id AS content_id, COALESCE(url_tag.tag_id, question_tag.tag_id) AS tag_id
FROM `content`
LEFT JOIN url_tag ON content.url_id = url_tag.url_id
LEFT JOIN question_tag ON content.question_id = question_tag.question_id WHERE COALESCE(url_tag.tag_id, question_tag.tag_id) IS NOT NULL
My question is which one is the better approach (or if there is one better)?
Note also that currently some Contents are not tagged at all.
Bonus challenge: a VIEW that connects content_id straight to category_id
CREATE TABLE `category_tag` (
`category_id` int(10) unsigned NOT NULL,
`tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`category_id`,`tag_id`),
KEY `FK_categorytag_category_idx` (`category_id`),
KEY `FK_categorytag_tag_idx` (`tag_id`),
CONSTRAINT `FK_categorytag_category` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `FK_categorytag_tag` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Answer :
There’s not really a strong case here for doing this with views.
Of options 1 and 2, option 2 is the only option that is likely to be usable on data of any size, because of the different algorithms MySQL uses for processing views, and the strengths vs. limitations of each.
Unless explicitly declared otherwise, MySQL uses the MERGE
algorithm whenever possible, which essentially makes the view into an alias for a more complicated query. This provides some optimizations especially with the use of indexes in the underlying tables when interpreting the WHERE
clause in the outer query.
SELECT * FROM view_option_2 WHERE content_id = 500;
I don’t see a reason why MERGE
could not be used with this view definition, so the server should process the query above as though you had sent this:
SELECT content.id AS content_id,
COALESCE(url_tag.tag_id, question_tag.tag_id) AS tag_id
FROM `content`
LEFT JOIN url_tag ON content.url_id = url_tag.url_id
LEFT JOIN question_tag ON content.question_id = question_tag.question_id
WHERE content.id = 500
AND COALESCE(url_tag.tag_id, question_tag.tag_id) IS NOT NULL
This could reasonably be expected to use the primary key on content
and the leftmost column of the primary key of utl_tag
and question_tag
. It seems like a nice, tidy, well-indexed query.
On the other hand, if Option 1 were used for a view definition, the server would not be able to use the MERGE
algorithm, because of the UNION
… so the server would use the other algorithm, TEMPTABLE
, and construct a temporary table and fill it with the union of the two queries and then return only the rows specified by your where clause by eliminating non-matching rows from the temporary table. But the point you don’t want to overlook here is that the temporary table could contain every possible row without regard to the WHERE
in your outer query.
You don’t have to specify an ALGORITHM
when you define a view, and if you don’t, then it defaults to UNDEFINED
which allows the server to choose. One advantage of specifying an algorithm is that you always know which one will be used. Extra bonus, you get a warning if you ask for MERGE
and it can’t be done. Merge also doesn’t work if there are no tables in the view, so here’s a simple example:
mysql> CREATE ALGORITHM=MERGE VIEW test_view AS SELECT UUID_SHORT() AS a_big_number;
Query OK, 0 rows affected, 1 warning (0.05 sec)
We didn’t get an error but we did get a warning.
mysql> show warnings;
+---------+------+-------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------------+
| Warning | 1354 | View merge algorithm can't be used here for now (assumed undefined algorithm) |
+---------+------+-------------------------------------------------------------------------------+
1 row in set (0.00 sec)
This is, however, a perfectly legitimate and working view, that returns a different number each time:
mysql> select * from test_view;
+---------------------+
| a_big_number |
+---------------------+
| 2184556849028661248 |
+---------------------+
1 row in set (0.00 sec)
Define your view with ALGORITHM=MERGE
and if the server doesn’t like it, then that’s not a good candidate for a view in this application.
Note, though, that this doesn’t mean the TEMPTABLE
algorithm isn’t without its benefits also… but if you are going to SELECT
from your view WHERE
some condition applies to the fields in the underlying tables, TEMPTABLE
prevents those conditions from “pushing through” the view definition. Instead, they’re applied to the result after the temporary table is derived, and of course that temporary table has no indexes.
Your “bonus” does not seem unusually complicated, either.
Take the Option 2 query and modify it:
SELECT content.id AS content_id,
COALESCE(url_tag.tag_id, question_tag.tag_id) AS tag_id,
category_tag.category_id
FROM `content`
LEFT JOIN url_tag ON content.url_id = url_tag.url_id
LEFT JOIN question_tag ON content.question_id = question_tag.question_id
LEFT JOIN category_tag on category_tag.tag_id = COALESCE(url_tag.tag_id, question_tag.tag_id)
WHERE COALESCE(url_tag.tag_id, question_tag.tag_id) IS NOT NULL
If you don’t want to eliminate untagged content, remove the WHERE
clause.
To select for a specific content.id, put that in the WHERE
clause.
If you get duplicate categories because of the relationships between category and tag then you’ll need SELECT DISTINCT
but don’t put that inside the view, because that isn’t compatible with MERGE
either.