We have a database that stores events. Events are generated at a rate of 1000 per second, and we need to keep them accessible for several years.
The typical use case is selecting some of them from the last 1-2 months.
Each event corresponds to a resource_id (an estimated 5000 distinct values).
Since keeping all of this in one table makes all maintenance and selects quite slow and resource-consuming, I was thinking of improving it in one of these ways:
a) a separate table per resource_id: 5000+ tables. Selects spanning multiple resource_ids (up to 100) would need some rewriting, with UNIONs or something similar.
b) a separate table per day: UNIONs or something similar would be required when a select spans multiple days.
c) both of the above (too extreme, I think).
d) partitioning in some way on the timestamp index, and maybe even on the resource_id index. Is it worth it? Or is the “manual” partitioning of the above suggestions better?
The database is PostgreSQL.
We have something similar, ~5000 events per second. Tables are partitioned by month and we store 5 years of data. That gives us 60 partitions, and it works fine. It runs on PostgreSQL 9.1, which handles partitions better than any older version.
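For reference, a minimal sketch of month-based partitioning as done in the PostgreSQL 9.x era, via table inheritance, CHECK constraints, and an insert-routing trigger. The table and column names (`events`, `ts`, `resource_id`) are assumptions, not from the question:

```sql
-- Parent table: holds no rows itself, only defines the schema.
CREATE TABLE events (
    id          bigserial,
    resource_id integer     NOT NULL,
    ts          timestamptz NOT NULL,
    payload     text
);

-- One child table per month. The CHECK constraint is what lets the
-- planner skip irrelevant partitions (constraint exclusion).
CREATE TABLE events_2012_01 (
    CHECK (ts >= '2012-01-01' AND ts < '2012-02-01')
) INHERITS (events);

CREATE INDEX events_2012_01_ts_idx  ON events_2012_01 (ts);
CREATE INDEX events_2012_01_res_idx ON events_2012_01 (resource_id, ts);

-- Route inserts on the parent to the right child
-- (simplified to a single branch; real code has one per month,
-- or computes the target table name dynamically).
CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.ts >= '2012-01-01' AND NEW.ts < '2012-02-01' THEN
        INSERT INTO events_2012_01 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'No partition for timestamp %', NEW.ts;
    END IF;
    RETURN NULL;  -- row was redirected, do not insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();
```

Queries still target the parent `events` table; the planner only touches the children whose CHECK constraints overlap the WHERE clause.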
EXPLAIN and EXPLAIN ANALYZE are your best friends to get your queries right.
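In particular, EXPLAIN lets you verify that constraint exclusion is actually pruning partitions. A sketch (table name `events` is an assumption); note that exclusion happens at plan time, so the WHERE clause must compare against constants, not volatile expressions like `now()`:

```sql
SET constraint_exclusion = partition;  -- the default since 8.4

EXPLAIN ANALYZE
SELECT *
FROM events
WHERE ts >= '2012-01-01' AND ts < '2012-03-01'
  AND resource_id IN (17, 42);
-- The plan should list scans of only the two partitions covering
-- that range, not all 60 children.
```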
This is a classic case for rolling partitions (also called sliding-window partitions). Before redesigning the table into thousands of child tables and rewriting your code, please give partitioning a try, as Erwin already advised.
- Sliding Window Table Partitioning and Automating Sliding Window Maintenance – rolling partitions explained in SQL Server;
- Table Partitioning Sliding Window Case – MSDN blog, SQL Server flavor;
- Rolling Time-based Partitions – MySQL flavor;
- Self Managing Partition Example – PostgreSQL flavor;
- Dynamic Daily Table Partitions With Postgres – PostgreSQL again (actually Perl + PL/pgSQL);
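The sliding-window maintenance those articles describe boils down to a periodic job that adds the upcoming partition and drops the expired one. A hypothetical monthly step for the 5-year window, assuming the inheritance setup and `events_YYYY_MM` naming used above:

```sql
-- Add next month's partition ahead of time.
CREATE TABLE events_2017_02 (
    CHECK (ts >= '2017-02-01' AND ts < '2017-03-01')
) INHERITS (events);
CREATE INDEX ON events_2017_02 (ts);

-- Expire the oldest month. Dropping a whole partition is nearly
-- instant, unlike DELETE + VACUUM on one huge table.
DROP TABLE events_2012_01;
```

This is the main operational win over options (a)-(c): retention becomes a cheap DDL statement instead of a mass delete.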