PostgreSQL and query planner

Posted on

Question :

I have question about query planner in PostgreSQL. I know, that this planner is taking data from pg_statistics, but… Can anyone tell me on what basis this planner is taking those informations to make efficient queries? How is it using those statistics? Is there any better description of this whole process, than in the documentation?

I’m fully aware that my question might not be understandable, but I really don’t know how to explain my problem easier. If you have questions, please ask.

Answer :

The most detailed discussion of how statistics are used is the Row Estimation Examples section of the documentation. Ultimately all information about how a query might be executed is turned into a serious of costs via the various Cost Constants. So if a table is 1000 pages in size, and the statistics suggest 10% of it will be touched at random by a proposed query, that’s 100 pages * 4.0 (random_page_cost) = 400 cost units for pulling in the data; then other constants are used to determine things like processing costs on the data in those pages.

The query optimizer tries various ways of obtaining and combining individual components: different join types, different ways to access the table data, etc. It iterates through the possible plans from those combinations, then picks the one that has the cheapest total cost to execute.

I wrote just over 50 pages on this subject for my book PostgreSQL 9.0 High Performance, which has the longest discussion of query execution available right now. There’s isn’t too much there on how statistics are used beyond what’s shown in the documentation though. Most of it covers all of the various query plan node elements you might run into.

There is a nice and very readable readme.txt somewhere in the src/backend/{planner,optimiser} directory, which discusses the transformations and heuristics used by the planner nicely. Dont expect to understand it after a single read. Read it again. And again. Highly recommended.

UPDATE: it is :
./src/backend/optimizer/README

If you really want to know that, well… you can find it here http://www.pgcon.org/2011/schedule/events/350.en.html and you can read PostgreSQL sources.

On the other hand, you really don’t need to know that for using PostgreSQL or any other database efficiently. The only thing you need to know is that you the statistics should be fresh.

Leave a Reply

Your email address will not be published. Required fields are marked *