Best way to design / query multi-level relationship table

Question :

My company is going to develop a mobile app selling services across different regions. We expect users to depend heavily on the Search function, so I need to make sure the database design is good enough to handle the searches.

My main concern is that the user should be able to search for services not only by name or description, but also by service location.

Let me first show you what I designed:

Solution 1
Table: Service
| id   | name     | description  | cityid | address1 | address2 |
|     1| Service1 | Description1 |       1| Address1 |          |
|     2| Service2 | Description2 |       2| Address2 |          |
|     3| Service3 | Description3 |       3| Address3 |          |

Table: City
| id   | name      | stateid |
|     1| KL        |       1 |
|     2| Georgetown|       2 |
|     3| JB        |       3 |

Table: State
| id   | name                | countryid |
|     1| Wilayah Persekutuan |         1 |
|     2| Penang              |         2 |
|     3| Johor               |         3 |

Table: Country
| id   | name      |
|     1| Malaysia  |
|     2| Singapore |

Above are my tables with some sample data. The requirement is to let the user enter a keyword and search the Service table first – if the keyword matches a service name, return that row from Service. If not, search the City table – if the keyword matches a City row, return all Service rows whose cityid matches that city. If no row is found, continue searching in the State table, and finally the Country table.

So here is the problem with my solution. If the user enters a country name, I get a match from the Country table, then use that row's id to find all related states in the State table, then use those states' ids to find all related cities in the City table, and only then use the cities' ids to find the related services in the Service table. I don't think this is a good approach, as it needs a chain of nested IN subqueries to filter the results, which is bad for performance.
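To make the worst case concrete, here is a sketch of that chained lookup as a single query with nested IN subqueries. SQLite (via Python) is used purely for illustration of the MySQL design; the schema and sample data follow the question, so with these ids only Service1 resolves to Malaysia.

```python
import sqlite3

# Build the normalised Solution 1 schema with the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE State   (id INTEGER PRIMARY KEY, name TEXT, countryid INTEGER);
CREATE TABLE City    (id INTEGER PRIMARY KEY, name TEXT, stateid INTEGER);
CREATE TABLE Service (id INTEGER PRIMARY KEY, name TEXT, description TEXT,
                      cityid INTEGER);
INSERT INTO Country VALUES (1,'Malaysia'),(2,'Singapore');
INSERT INTO State   VALUES (1,'Wilayah Persekutuan',1),(2,'Penang',2),(3,'Johor',3);
INSERT INTO City    VALUES (1,'KL',1),(2,'Georgetown',2),(3,'JB',3);
INSERT INTO Service VALUES (1,'Service1','Description1',1),
                           (2,'Service2','Description2',2),
                           (3,'Service3','Description3',3);
""")

# Country match -> states -> cities -> services, as nested IN subqueries.
rows = conn.execute("""
    SELECT id, name FROM Service
    WHERE cityid IN (
        SELECT id FROM City WHERE stateid IN (
            SELECT id FROM State WHERE countryid IN (
                SELECT id FROM Country WHERE name = ?)))
""", ("Malaysia",)).fetchall()
print(rows)  # [(1, 'Service1')]
```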

My colleague then came up with another solution, which puts all the search-related references into the Service table – something like this:

Solution 2
Table: Service
| id   | name     | description  | cityid | stateid  | countryid | address1 | address2 |
|     1| Service1 | Description1 |      1 |        1 |         1 | Address1 |          |
|     2| Service2 | Description2 |      2 |        2 |         2 | Address2 |          |
|     3| Service3 | Description3 |      3 |        3 |         3 | Address3 |          |

So, if Country matches the search, use the matching row's countryid to search the Service table directly; otherwise, if State matches, use its stateid to search Service, and so on. This method is more efficient because it avoids the nested searches, but the downside is that it violates normalisation: the Service table holds redundant information, namely countryid and stateid. Logically speaking, I can derive those two IDs from cityid alone, even though that requires multiple look-ups/joins.
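For comparison, the same country search under Solution 2 collapses to a single-table filter. Again SQLite stands in for MySQL, and the sample data mirrors the question's table.

```python
import sqlite3

# Denormalised Solution 2: Service carries cityid, stateid and countryid,
# so a country match needs no nested lookups at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Service (id INTEGER PRIMARY KEY, name TEXT, description TEXT,
                      cityid INTEGER, stateid INTEGER, countryid INTEGER);
INSERT INTO Service VALUES
  (1,'Service1','Description1',1,1,1),
  (2,'Service2','Description2',2,2,2),
  (3,'Service3','Description3',3,3,3);
""")

countryid = 1  # the id returned by the match in the Country table
rows = conn.execute(
    "SELECT id, name FROM Service WHERE countryid = ?", (countryid,)
).fetchall()
print(rows)  # [(1, 'Service1')]
```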

So which solution should I go for? Or do you have a better suggestion? Please advise.

Answer :

Instead of gauging geographical proximity by presence in a given city, state or country, why not locate services using either a geographical point (latitude/longitude) or by geographic polygons?

The scheme you are proposing does not account for these situations:

  • Some countries are very large, so being within the country is not a great measure of being proximate.
  • Not all locations are within a city, so locating services by city name will not always be possible.
  • Not every country has political subdivisions analogous to states, so your hierarchy does not always hold.

MySQL has geospatial extensions. You should examine these and see if they solve your problem more directly by allowing you to find services that are “near” (whatever that means in your context) the searcher.
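To illustrate what "near" could mean, here is the great-circle (haversine) distance between two latitude/longitude points in plain Python; MySQL's spatial functions (e.g. ST_Distance_Sphere) compute this kind of measure server-side. The coordinates below are approximate city-centre points, included only for illustration.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

# KL city centre vs. Georgetown, Penang: roughly 290-300 km apart.
d = haversine_km(3.1390, 101.6869, 5.4141, 100.3288)
print(round(d), "km")
```

A "find the nearest services" query then becomes "order candidates by this distance from the searcher", rather than walking a city/state/country hierarchy.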

I would be tempted to deploy all of the techniques you've mentioned. Let me explain. The normalised Service/City/State/Country design is great for OLTP processing, so retain it for that part of the application and treat it as the actual data store.

As you mentioned, this is likely to make search complicated, so denormalising would be advantageous. Unlike your proposal, I would denormalise into a separate table which exists only to assist search. It is an explicit acknowledgement that a real DBMS on an actual computer does not have the performance characteristics the theoretical computer scientists would like to imagine. By putting it in a separate table, one isolates this physical compromise from the preferred, normalised design.

Finally, I would combine all the search terms in a single column in this denormalised table. One way would be to concatenate the Service Name, City Name, State Name and Country Name, space-delimited, and place a full-text index on that column.
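A sketch of that single-column idea, using the question's sample values (the service-to-location mapping here is illustrative). In MySQL the column would carry a FULLTEXT index; SQLite's LIKE stands in for the full-text match purely for demonstration.

```python
import sqlite3

# One denormalised row per service: all searchable names concatenated,
# space-delimited, into a single column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_table (serviceid INTEGER, terms TEXT)")
conn.executemany(
    "INSERT INTO search_table VALUES (?, ?)",
    [(1, "Service1 KL Wilayah Persekutuan Malaysia"),
     (2, "Service2 Georgetown Penang Malaysia"),
     (3, "Service3 JB Johor Malaysia")],
)

keyword = "Penang"
hits = conn.execute(
    "SELECT serviceid FROM search_table WHERE terms LIKE ?",
    (f"%{keyword}%",),
).fetchall()
print(hits)  # [(2,)]
```

One query, one index, regardless of which level of the hierarchy the keyword belongs to.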

Another would be to have three columns – id, name and source_table. The first two come from the data values listed in the question. source_table says which of the normalised tables this comes from. Sample data, taken from the question, would be:

  id     name      source_table
  -----  --------  ------------
  1      Service1  Service
  2      Service2  Service
  1      KL        City  
  3      JB        City
  2      Penang    State
  1      Malaysia  Country

The query would match the user’s search term against this table’s name column:

select id, source_table
from search_table
where name = <user-provided value>

The returned id and source_table can be used to determine which values to choose from the normalised tables. These can be joined to each other in the normal fashion. Since there are four layers in the hierarchy there can be only four possible queries (service, service/ city, service/ city/ state, service/ city/ state/ country). It would be simple to call the appropriate one like this:

if source_table = "Service"

  select *
  from Service
  where id = <matched id>

if source_table = "City"

  select Service.*
  from Service
  inner join City
    on Service.cityid = City.id
  where City.id = <matched id>

... etc.
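The dispatch sketched above can be made concrete as one query template per hierarchy level, selected by source_table. SQLite (via Python) is again only a stand-in for the real DBMS; schema and sample data follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE State   (id INTEGER PRIMARY KEY, name TEXT, countryid INTEGER);
CREATE TABLE City    (id INTEGER PRIMARY KEY, name TEXT, stateid INTEGER);
CREATE TABLE Service (id INTEGER PRIMARY KEY, name TEXT, cityid INTEGER);
INSERT INTO Country VALUES (1,'Malaysia'),(2,'Singapore');
INSERT INTO State VALUES (1,'Wilayah Persekutuan',1),(2,'Penang',2),(3,'Johor',3);
INSERT INTO City VALUES (1,'KL',1),(2,'Georgetown',2),(3,'JB',3);
INSERT INTO Service VALUES (1,'Service1',1),(2,'Service2',2),(3,'Service3',3);
""")

# The four possible queries: one per layer of the hierarchy.
QUERIES = {
    "Service": "SELECT id, name FROM Service WHERE id = ?",
    "City":    """SELECT Service.id, Service.name FROM Service
                  JOIN City ON Service.cityid = City.id
                  WHERE City.id = ?""",
    "State":   """SELECT Service.id, Service.name FROM Service
                  JOIN City  ON Service.cityid = City.id
                  JOIN State ON City.stateid = State.id
                  WHERE State.id = ?""",
    "Country": """SELECT Service.id, Service.name FROM Service
                  JOIN City    ON Service.cityid = City.id
                  JOIN State   ON City.stateid = State.id
                  JOIN Country ON State.countryid = Country.id
                  WHERE Country.id = ?""",
}

def services_for(source_table, row_id):
    """Resolve a search_table hit (id, source_table) to Service rows."""
    return conn.execute(QUERIES[source_table], (row_id,)).fetchall()

# A hit of (2, 'State') -- Penang -- resolves to Service2 in Georgetown.
print(services_for("State", 2))  # [(2, 'Service2')]
```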

Be aware that names may be duplicated across these levels (Monaco, for example) and decide what you wish to do with these cases.

The process could be simplified somewhat if names could be guaranteed to be unique and natural keys were used rather than surrogate ones.

I wonder, though, if all this effort is worthwhile. How many rows will there be in these tables? How poor is performance of a query on the normalised tables, with indexes, that compares the user’s search value to each name column with ORs between? By creating a representative amount of dummy data and measuring response time on production-like hardware you could save some development, and a lot of maintenance, work.
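That simpler alternative – a single query over the normalised tables comparing the search value to each name column with ORs – would look something like the following sketch (SQLite for illustration; with indexes on the name columns it may well be fast enough at realistic row counts).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE State   (id INTEGER PRIMARY KEY, name TEXT, countryid INTEGER);
CREATE TABLE City    (id INTEGER PRIMARY KEY, name TEXT, stateid INTEGER);
CREATE TABLE Service (id INTEGER PRIMARY KEY, name TEXT, cityid INTEGER);
INSERT INTO Country VALUES (1,'Malaysia'),(2,'Singapore');
INSERT INTO State VALUES (1,'Wilayah Persekutuan',1),(2,'Penang',2),(3,'Johor',3);
INSERT INTO City VALUES (1,'KL',1),(2,'Georgetown',2),(3,'JB',3);
INSERT INTO Service VALUES (1,'Service1',1),(2,'Service2',2),(3,'Service3',3);
""")

# One query: compare the search term to every level's name column.
# (LEFT JOINs would be safer if parts of the hierarchy can be missing.)
rows = conn.execute("""
    SELECT DISTINCT Service.id, Service.name
    FROM Service
    JOIN City    ON Service.cityid = City.id
    JOIN State   ON City.stateid = State.id
    JOIN Country ON State.countryid = Country.id
    WHERE Service.name = :q OR City.name = :q
       OR State.name = :q OR Country.name = :q
""", {"q": "Penang"}).fetchall()
print(rows)  # [(2, 'Service2')]
```

Measure this first; the denormalised search table only earns its keep if this query is demonstrably too slow.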

City+State+Country could live in a single Location table. Or, if there is little redundancy, put those columns directly into the table that links to them.

That is, there is such a thing as “over-normalizing”. Over-normalizing can make searching more complex and queries slower, while the benefits of normalizing this type of data are minimal.

What is in description? A bunch of words that a user might search for? If so, FULLTEXT is an excellent choice. However, be aware of the limitations it has.

If you do employ CHAR(2) for country, then make it CHAR(2) CHARACTER SET ascii. If the default charset is utf8mb4, CHAR(2) will take 8 bytes. Specifying ascii will keep it down to the 2 bytes, which is all that is needed.

If you need to search for the “nearest”, that is a different kettle of fish; there are many existing discussions of that topic.
