Question :
My company is going to develop a mobile app selling services across different regions. We are expecting users are going to hugely depend on the Search function. Thus, I need to make sure the database design is good enough to handle the searches.
The main concern I have is, not only should the user be able to search the services by name or description, but also by service location.
Let me first show you what I designed:
Solution 1
Table: Service
+------+----------+--------------+--------+----------+----------+
| id | name | description | cityid | address1 | address2 |
+------+----------+--------------+--------+----------+----------+
| 1| Service1 | Description1 | 1| Address1 | |
| 2| Service2 | Description2 | 2| Address2 | |
| 3| Service3 | Description3 | 3| Address3 | |
+------+----------+--------------+--------+----------+----------+
Table: City
+------+-----------+---------+
| id | name | stateid |
+------+-----------+---------+
| 1| KL | 1 |
| 2| Georgetown| 2 |
| 3| JB | 3 |
+------+-----------+---------+
Table: State
+------+---------------------+-----------+
| id | name | countryid |
+------+---------------------+-----------+
| 1| Wilayah Persekutuan | 1 |
| 2| Penang | 2 |
| 3| Johor | 3 |
+------+---------------------+-----------+
Table: Country
+------+-----------+
| id | name |
+------+-----------+
| 1| Malaysia |
| 2| Singapore |
+------+-----------+
Above are my tables with some sample data. The requirement is to allow the user to enter a keyword to start searching in the Service
table – if the keyword matches a service name, then return that result from Service
. If not, then search for the keyword in the City
table – if the keyword matches a particular City
row, then get all the Service
rows that match that city by the cityid
. If no row is found, then go on searching in the State
table, up until the Country
table.
So here is a problem with my solution. If the user enters a Country name, I will get a match result from the Country
table, then I use the result’s countryid
to get all the related states in the State
table, then I use those results’ stateid
s to get all related cities in the City
table – and only then I use the last results’ cityid
to search for the related services in the Service
table. Which I think is not a good approach as it needs to use a lot of recursive IN
searches to filter the results, which is not good for performance.
My colleague then came up with another solution, which put all the search related references into the Service
table – something like this:
Solution 2
Table: Service
+------+----------+--------------+--------+----------+-----------+----------+----------+
| id | name | description | cityid | stateid | countryid | address1 | address2 |
+------+----------+--------------+--------+----------+-----------+----------+----------+
| 1| Service1 | Description1 | 1 | 1 | 1 | Address1 | |
| 2| Service2 | Description2 | 2 | 2 | 2 | Address2 | |
| 3| Service3 | Description3 | 3 | 3 | 3 | Address3 | |
+------+----------+--------------+--------+----------+-----------+----------+----------+
So, if Country
matched the search, then use the result’s countryid
to perform search straight in the Service
table, otherwise, if State
matched the search, then use the result’s stateid
to perform the search in Service
and so on. This method is more efficient as it doesn’t have so much recursive searches, but the downside is it violates the Normalisation practices: the Service
table has redundant info, which is countryid
and stateid
. And logically speaking, I can find those two IDs with only cityid
, even though I need to perform multiple look-ups/joins.
So which solution should I go for? Or do you have a better suggestion? Please advise.
Answer :
Instead of gauging geographical proximity by presence in a given city, state or country, why not locate services using either a geographical point (latitude/longitude) or by geographic polygons?
The scheme you are proposing does not account for these situations:
- Some countries are very large, so being within the country is not a great measure of being proximate.
- Not all locations are within a city, so locating services by city name will not always be possible.
- Not every country has political subdivisions analogous to states, so your hierarchy does not always hold.
MySQL has geospatial extensions. You should examine these and see if they solve your problem more directly by allowing you to find services that are “near” (whatever that means in your context) the searcher.
I would be tempted to deploy all of the techniques you’ve mentioned. Let me explain. The normalised Service/ City/ State/ Country is great for OLTP processing. So retain it for that part of the application and treat it as the actual data store.
As you mentioned, this is likely to make search complicated so denormalising would be advantageous. Unlike your proposal I would denormalise into a separate table which exists only to assist search. It is an explicit acknowledgement that a real DBMS on an actual computer does not have the performance characteristics the theoretical computer scientists would like to imagine. By putting it in a separate table one isolates the compromised physical necessity from the preferred, normalised design.
Finally I would combine all the search terms in a single column in this denormalised table. One way would be to together the Service Name, City Name, State Name and Country Name, space delimited, and place a full text index on that column.
Another would be to have three columns – id
, name
and source_table
. The first two come from the data values listed in the question. source_table
says which of the normalised tables this comes from. Sample data, taken from the question, would be:
id name source_table
----- -------- ------------
1 Service1 Service
2 Service2 Service
...
1 KL City
3 JB City
...
2 Penang State
..
1 Malaysia Country
The query would match the user’s search term against this table’s name
column:
select
id,
source_table
from search_table
where name = <user-provided value>
The returned id
and source_table
can be used to determine which values to choose from the normalised tables. These can be joined to each other in the normal fashion. Since there are four layers in the hierarchy there can be only four possible queries (service, service/ city, service/ city/ state, service/ city/ state/ country). It would be simple to call the appropriate one like this:
if source_table = "Service"
select
<columns>
from Service
where id = <search_table.id>
if source_table = "City"
select
<columns>
from Service
inner join City
on Service.cityid = City.id
where City.id = <search_table.id>
... etc.
Be aware that names may be duplicated across these levels (Monaco, for example) and decide what you wish to do with these cases.
The process could be simplified somewhat if names could be guaranteed to be unique and natural keys were used rather than surrogate ones.
I wonder, though, if all this effort is worthwhile. How many rows will there be in these tables? How poor is performance of a query on the normalised tables, with indexes, that compares the user’s search value to each name column with ORs between? By creating a representative amount of dummy data and measuring response time on production-like hardware you could save some development, and a lot of maintenance, work.
City+State+Country should be in a single Location table. Or… If there is little redundancy, put those into the table that will link to it.
That is, there is such a thing as “over-normalizing”. Over-normalizing can make searching more complex and queries slower; meanwhile the benefits of normalizing this type of data is minimal.
What is in description
? A bunch of words that a user might search for? If so, FULLTEXT
is an excellent choice. However, be aware of the limitations it has.
If you do employ CHAR(2)
for country
, then make it CHAR(2) CHARACTER SET ascii
. If the default charset is utf8mb4, CHAR(2)
will take 8 bytes. Specifying ascii will keep it down to the 2 bytes, which is all that is needed.
If you need to search for the “nearest”, that is a different kettle of fish. You will find lots of discussions here: https://stackoverflow.com/questions/tagged/latitude-longitude .