I am getting frequent high CPU usage alerts from the production DB server. Since I want to investigate this further on my own, I collected the queries that were running during the high CPU usage period:
SELECT * FROM pg_stat_activity;
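For example, a slightly narrower variant of that snapshot keeps only the active backends and sorts them by how long each query has been running (the runtime alias is just illustrative):

SELECT pid, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;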
From the collected data I observed that one table is hit very frequently with UPDATE operations. On further investigation I learned that it is the most frequently accessed table in the DB, and nothing about it looks obviously resource intensive.
Following are the details I got from EXPLAIN for that query:
                          QUERY PLAN
-------------------------------------------------------------------------------
 Update on foo_table  (cost=0.00..1271.14 rows=1 width=55)
   ->  Seq Scan on foo_table  (cost=0.00..1271.14 rows=1 width=55)
         Filter: (id = 2::bigint)
(3 rows)
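The plan above only shows estimates; to get actual timings, one option is to run the UPDATE under EXPLAIN (ANALYZE, BUFFERS) inside a transaction that is rolled back, since ANALYZE really executes the statement (the SET clause below is only a placeholder):

BEGIN;
EXPLAIN (ANALYZE, BUFFERS) UPDATE foo_table SET some_column = 'x' WHERE id = 2;  -- some_column is a placeholder
ROLLBACK;  -- discard the change made by ANALYZE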
So far the only certain data I have is the time period during which the operation triggering the high CPU usage happens; I am still unable to figure out which query is taking too much time to process.
I have tried the following queries, but I couldn't find any convincing stats:
SELECT relname, idx_tup_fetch + seq_tup_read AS TotalReads
FROM pg_stat_all_tables
WHERE idx_tup_fetch + seq_tup_read != 0
ORDER BY TotalReads;

SELECT * FROM pg_stat_all_tables ORDER BY seq_scan DESC;
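One complication is that these counters are cumulative since the last stats reset, so a single snapshot mixes the busy window with everything that came before it. Resetting just before the high CPU period and taking another snapshot afterwards isolates that window (note this clears all collected statistics for the database):

SELECT pg_stat_reset();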
I feel stuck, as I have no way to figure out what is happening: is this normal for the workload, is it an indexing-related issue, would query optimization solve it, etc.
Size: 150+ GB
CPU: 8 core
RAM: 16 GB
Storage: SSD with no swap.
The problem was with the indexing of a few tables. I was able to identify those tables from their read stats with the following query:
SELECT relname, idx_tup_fetch + seq_tup_read AS TotalReads
FROM pg_stat_all_tables
WHERE idx_tup_fetch + seq_tup_read != 0
ORDER BY TotalReads DESC
LIMIT 10;
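As a follow-up check (the table name in the filter is only an example), comparing sequential versus index scan counts for those hot tables shows which ones are actually being read without an index:

SELECT relname, seq_scan, seq_tup_read, idx_scan, n_live_tup
FROM pg_stat_all_tables
WHERE relname IN ('foo_table')  -- the tables found with the query above
ORDER BY seq_scan DESC;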
First, there have been a lot of performance improvements since 9.2. Monitoring tools have also improved a lot since then. It would probably make more sense to put time into upgrading rather than investigating performance problems on a system which is out of support.
Next, do you actually have a problem? Your database making use of the CPUs you bought for it to use is not inherently a problem. That some (unnamed) monitoring tool sends you an alert also doesn’t mean you have a problem; you might just have a broken or poorly configured monitoring tool. How high is the CPU usage? Does it tell you what the process titles of the highest-use processes are? Are your users complaining about performance?
For the specific query you captured, is there an index on “foo_table” which has “id” as the leading column in the index? Columns which someone decided to name “id” should most likely be indexed.
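As a sketch (the index name is illustrative), you can check what indexes the table already has and add one on the filtered column if it is missing; CONCURRENTLY avoids blocking writes on a busy table, though it cannot run inside a transaction block:

SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'foo_table';
CREATE INDEX CONCURRENTLY foo_table_id_idx ON foo_table (id);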
If you are not sure that your manual inspection of pg_stat_activity has revealed the whole picture, then you can use the auto_explain extension with “auto_explain.log_min_duration” to capture specific executions which are slow. This is a nice complement to pg_stat_statements, which tells you more about each query in general, rather than specific executions of them.
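A minimal auto_explain setup in postgresql.conf might look like this; the threshold is a placeholder rather than a recommendation, and log_analyze adds some overhead:

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '1s'   # log the plan of any statement slower than this
auto_explain.log_analyze = on          # include actual row counts and timings in the logged plan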
You could look at pg_stat_statements. It has to be enabled in shared_preload_libraries, but it will give you an excellent view of which statements consume the most resources. It doesn’t directly track CPU time, but from looking at the tracked total and I/O times it’s usually possible to get a good estimate.
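As a sketch, once the library is loaded and the extension created, the top consumers can be listed along these lines (on older releases the column is total_time; newer ones call it total_exec_time):

CREATE EXTENSION pg_stat_statements;

SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;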