Huge database logging of event type rows and ways to optimize it


Question:

We have a database that stores events. Events are generated at a rate of about 1000 per second, and we need to keep them accessible for several years.
The usual access pattern is selecting a subset of events from the last 1-2 months.

Each event corresponds to a resource_id (an estimated 5000 distinct resources).

Since keeping all of this in a single table makes maintenance and selects quite slow and resource-consuming, I was thinking of improving it in one of these ways:

a) Separate every resource_id into its own table: 5000+ tables. Selects spanning multiple resource_ids (up to 100 at a time) would need rewriting with UNIONs or something similar.
b) Separate every day into its own table. UNIONs or something similar would be needed when a select spans multiple days.
c) Both of the above (too extreme, I think).
d) Partitioning in some way on a timestamp index, and maybe even on a resource_id index. Is it worth it, or is the “manual” partitioning of the suggestions above better? (A sketch of what this could look like follows below.)

The database is PostgreSQL.
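
For reference, on PostgreSQL 10 and later, option (d) maps onto built-in declarative partitioning, which did not exist in the versions current when this was asked. A minimal sketch, assuming PostgreSQL 11+ and a hypothetical events table with event_time and resource_id columns:

    -- Parent table, range-partitioned by event time.
    CREATE TABLE events (
        event_time   timestamptz NOT NULL,
        resource_id  integer     NOT NULL,
        payload      jsonb
    ) PARTITION BY RANGE (event_time);

    -- One child per month.
    CREATE TABLE events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    -- On PostgreSQL 11+ an index on the parent cascades to every partition.
    CREATE INDEX ON events (resource_id, event_time);

With a composite index on (resource_id, event_time), the usual "last 1-2 months for up to 100 resource_ids" query only touches a couple of monthly partitions, without resorting to 5000 hand-managed tables.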

Answer:

We have something similar, ~5000 events per second. Tables are partitioned by month and we store 5 years of data, which gives us 60 partitions; it works fine. It runs on PostgreSQL 9.1, which handles partitions better than any older version.
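
On 9.1 that kind of monthly partitioning is done with table inheritance: each child table carries a CHECK constraint on its time range so constraint_exclusion can skip irrelevant months, and a trigger on the parent routes inserts. A minimal sketch of that pattern (hypothetical names; only one month and one trigger branch shown), not necessarily how their system is actually configured:

    CREATE TABLE events (
        event_time   timestamptz NOT NULL,
        resource_id  integer     NOT NULL,
        payload      text
    );

    -- Child for one month; the CHECK constraint is what lets the planner skip it.
    CREATE TABLE events_2012_05 (
        CHECK (event_time >= '2012-05-01' AND event_time < '2012-06-01')
    ) INHERITS (events);
    CREATE INDEX ON events_2012_05 (resource_id, event_time);

    -- Route inserts on the parent into the right child.
    CREATE OR REPLACE FUNCTION events_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.event_time >= '2012-05-01' AND NEW.event_time < '2012-06-01' THEN
            INSERT INTO events_2012_05 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for timestamp %', NEW.event_time;
        END IF;
        RETURN NULL;  -- the row already lives in the child, keep it out of the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER events_partition_insert
        BEFORE INSERT ON events
        FOR EACH ROW EXECUTE PROCEDURE events_insert_router();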

EXPLAIN and EXPLAIN ANALYZE are your best friends to get your queries right.
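
For example, a query shaped like the usual workload, using the hypothetical schema above (literal time bounds help older versions exclude partitions at plan time):

    EXPLAIN ANALYZE
    SELECT *
    FROM   events
    WHERE  event_time >= '2012-04-01'          -- roughly the last 2 months
      AND  resource_id IN (17, 42, 101);       -- up to ~100 ids in practice
    -- The plan should mention only the newest monthly children; if every
    -- partition shows up, check the constraint_exclusion setting.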

This is a classic case for rolling partitions (also called sliding-window partitions). Before redesigning the table into thousands of child tables and rewriting your code, please give partitioning a try, as Erwin already advised.
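
The rolling part is then a small monthly maintenance job: create the upcoming month's partition ahead of time and drop the month that falls out of the retention window. A sketch in the same hypothetical 9.1-style schema:

    -- Add next month's child before any rows arrive for it.
    CREATE TABLE events_2012_06 (
        CHECK (event_time >= '2012-06-01' AND event_time < '2012-07-01')
    ) INHERITS (events);
    CREATE INDEX ON events_2012_06 (resource_id, event_time);

    -- Expire the oldest month of a 5-year window: dropping a whole child is
    -- near-instant, unlike DELETE plus VACUUM on one huge table.
    DROP TABLE IF EXISTS events_2007_06;

With a routing trigger like the one above, the new month's branch also has to be added, or the trigger can be written to build the child table name dynamically from the timestamp.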

