Query optimization

Posted on

Question :

In another question they told me that the speed of a DBMS largely depends on the optimizations of your queries. Does anybody know good documents about this? At school I only learned the basics.

For example, a query to find a user with a password.

I would take this from a table that holds the personal information or just only username/password.

My query would look something like:

SELECT username, password
FROM tblUsers
WHERE username = foo

Is there any optimization to this? Or is it only in very large operations, like having a lot of JOINS with a large number of records?

EDIT: The above example is just that, an example. I’m looking for documents on Query optimization 😉

Answer :

Of the many ways to optimise queries in a relational database, indexes are the best place to start. When you index a column, the database stores the records for that column in sorted order to improve searching speeds. The price you pay for this is additional storage overhead and decreased speed for INSERT/UPDATE/DELETE statements.

When you search an unordered list for a value, worst case scenario you have to look at every record. If you have a list of 1,000,000 items, you have to scan all 1,000,000. If, on the other hand, the list is ordered, you can use a binary search instead, which can locate a matching record in no more than 20 lookups.

The easiest way to think of this is a phone directory, which is indexed on surname. So if you’re looking for “Wilson”, you can skip over most of the pages and go straight to “W”. If, on the other hand, you are looking for a number to find out the name of the person that owns that number, you’re forced to start at the beginning and work your way through one at a time, which would be incredibly painful.

Imagine now, that you have a second phone directory that is sorted on number. You now have twice as many books and this takes up more space, but if you’re commonly looking up names based on a phone number then the trade-off is well worth it.

Database indexes are very much like this. As a general rule:

  • Index columns that you frequently include in WHERE/GROUP BY clauses.

    select * from t1 where poo = 'smelly'

    select category, count(*) from t2 group by category

  • Index foreign keys (columns that refer to keys in other tables). For example:

    select * from t1 join t2 on t1.lol = t2.rofl

    If the data in both tables is sufficiently large, an index on the foreign key t1.lol could result in noticable gains. Foreign keys should not be indexed blindly: like any column, inappropriate indexes offer no benefits and can sometimes impede performance and result in issues like deadlocking.

  • Indexes can consist of multiple columns. In fact, if all the columns in the SELECT list can be found in an index, the query can be run against the index rather than the table. Multi-column indexes can be exploited by compound WHERE conditions using AND (e.g., a = 'lol AND b = 'rofl'), but not OR. If the previous predicate were rewritten as a = 'lol OR b = 'rofl', you would be better served by two separate indexes on columns a and b.

This just scratches the surface, but will give you substantial gains for effort.

My only suggestion would be to make sure that you have a good index on the username column, so the lookup can be as fast as possible. Typically optimizations need to happen on frequently used and/or complex queries. For some databases (like Oracle), recomputing statistics on a table can often help as well (since as a table grows, these stats may grow stale, leading to the wrong optimization choice inside the SQL engine).

Your query could be optimized in this way

@var = foo

SELECT @var AS username, password
FROM tblUsers
WHERE username = @var

There is no reason to ask the database for information you are already providing.

Optimizing queries is as much art as science. Try and do the least amount of work possible in a query. Break a query into multiple parts and use temporary tables or table variables to store your work instead of derived tables or sub-queries. If something seems too complicated then it probably is (this is the “art” piece of optimization.)

Now here is the “science”.

Use the SQL Management Studio to output query performance information.

Click "Query" / "Include Client Statistics"

After opening a new query window execute this statement once:


And execute this statement before every query to ensure that you are always starting from a baseline:


After executing the query view the “Messages” output window and save the information. After modifying the query and executing again you can compare the I/O statistics.

Finally, learn how to use the SQL Profiler tool to quickly capture and compare lots of statistics like total read, write, cpu & duration (ms) for every query.

Leave a Reply

Your email address will not be published. Required fields are marked *