Which performance counters can you look at on a SQL Server Instance to determine it’s performance and over all health?

Posted on

Question :

I am a student from Fontys University in Eindhoven, and I am currently carrying out a series of interviews to help with the development of a SQL Server tool and I would like to get feedback from the experts in the field.

One of my questions is:

Which performance counters can you look at on a SQL Server Instance to determine it’s performance and overall health?

Particularly I am interested in the threshold values when good becomes bad.

Jamil Young
Eindhoven
The Netherlands

Answer :

Here’s my Perfmon tutorial for SQL Server: http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/

For more counters and thresholds, here’s a poster we did when I was at Quest: http://www.quest.com/documents/landing.aspx?id=11635

This is a big topic with plenty of material available with a spot of Googling. As a starting point, these are the counters I tend to look at first:

Processor – % Processor Time

System – Processor Queue Length

You’ll probably get a different target value for CPU usage from every DBA you ask. SQL Server licences are expensive, so on the one hand you want to maximise the usage of CPUs while on the other hand you don’t want to compromise availability. In an ideal world with well understood workloads, you might target 70% usage, warn at 80-90%, alert at 90%+. Back in the real world with a workload that peaks and troughs, you might be more comfortable targeting 50-60% average.

Memory – Available MBytes

Paging File – % Usage

With a dedicated SQL Server, depending on the RAM installed, less than 100-200mb of available memory may indicate starvation and a risk of the OS paging. In general, we don’t want to see much page file activity so I’d be investigating if % Usage was greater than 2% and concerned if it hit 5%

Buffer Manager – Buffer cache hit ratio

Buffer Manager – Page life expectancy

Both of these counters are better considered against an established base line for a server. Ideally, we’d like cache hit ratio as close as possible to 100% and a PLE running in to thousands of seconds. Pay attention when they swing away from historic averages.

SQL Statistics – Batch Requests/sec

SQL Statistics – Compilations/sec

SQL Statistics – Recompilations/sec

Requests/sec is a great relative measure for how “busy” a server is. High compilation/recompilation values may indicate CPU cycles being wasted on query compilation.

Physical Disk – Avg. Disk sec/Read

Physical Disk – Avg. Disk sec/Write

Physical Disk – Disk Reads/sec

Physical Disk – Disk Writes/sec

A rough guideline for a properly configured IO system is <5ms (ideally 1ms) for log drives, <20ms (ideally <10ms) for data. Reads/writes per second should be considered against the known limit for the drive(s) i.e. if you have capacity for 1000 IOPS, I’d be evaluating upgrade options when the average IOPS reached 750.

Leave a Reply

Your email address will not be published. Required fields are marked *