I am a student from Fontys University in Eindhoven, and I am currently carrying out a series of interviews to help with the development of a SQL Server tool and I would like to get feedback from the experts in the field.
One of my questions is:
Which performance counters can you look at on a SQL Server Instance to determine it’s performance and overall health?
Particularly I am interested in the threshold values when good becomes bad.
Here’s my Perfmon tutorial for SQL Server: http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
For more counters and thresholds, here’s a poster we did when I was at Quest: http://www.quest.com/documents/landing.aspx?id=11635
This is a big topic with plenty of material available with a spot of Googling. As a starting point, these are the counters I tend to look at first:
Processor – % Processor Time
System – Processor Queue Length
You’ll probably get a different target value for CPU usage from every DBA you ask. SQL Server licences are expensive, so on the one hand you want to maximise the usage of CPUs while on the other hand you don’t want to compromise availability. In an ideal world with well understood workloads, you might target 70% usage, warn at 80-90%, alert at 90%+. Back in the real world with a workload that peaks and troughs, you might be more comfortable targeting 50-60% average.
Memory – Available MBytes
Paging File – % Usage
With a dedicated SQL Server, depending on the RAM installed, less than 100-200mb of available memory may indicate starvation and a risk of the OS paging. In general, we don’t want to see much page file activity so I’d be investigating if % Usage was greater than 2% and concerned if it hit 5%
Buffer Manager – Buffer cache hit ratio
Buffer Manager – Page life expectancy
Both of these counters are better considered against an established base line for a server. Ideally, we’d like cache hit ratio as close as possible to 100% and a PLE running in to thousands of seconds. Pay attention when they swing away from historic averages.
SQL Statistics – Batch Requests/sec
SQL Statistics – Compilations/sec
SQL Statistics – Recompilations/sec
Requests/sec is a great relative measure for how “busy” a server is. High compilation/recompilation values may indicate CPU cycles being wasted on query compilation.
Physical Disk – Avg. Disk sec/Read
Physical Disk – Avg. Disk sec/Write
Physical Disk – Disk Reads/sec
Physical Disk – Disk Writes/sec
A rough guideline for a properly configured IO system is <5ms (ideally 1ms) for log drives, <20ms (ideally <10ms) for data. Reads/writes per second should be considered against the known limit for the drive(s) i.e. if you have capacity for 1000 IOPS, I’d be evaluating upgrade options when the average IOPS reached 750.