We have the user and system data files on the same disk drive. The ( io_stall_write_ms / ( 1.0 + num_of_writes ) ) is below 2 for the user files but the tempdb files are typically over 400. I see that on a few servers and I’m curious if there is a reason it takes longer to write to tempdb than a regular database data file.
SELECT DISTINCT UPPER(LEFT(mf.physical_name, 1)) AS Directory, ( io_stall_write_ms / ( 1.0 + num_of_writes ) ) as result, io_stall_write_ms, num_of_writes, fs.database_id, fs.[file_id] FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id AND fs.[file_id] = mf.[file_id]
Short Answer: Seeing higher IO stalls may or may not be a problem in an of itself. You need to look at more information to suss out if it you have an issue. It does seem a bit high, yes, but are you suffering? If so, it is probably because either your IO system is not handling the load right (because it can’t, because you have everything on one drive or some other reason) or you are doing too much in TempDB (changing the first problem – the IO performance – is probably an easier and more efficient fix, but first determine if you have a problem)
The longer discussion/answer:
There are two questions at play here –
1.) What do I do when I see high IO Stalls?
First off, “high” is in the eye of the beholder. If you were to ask 10 DBAs what “too high” is for IO stalls you’d probably get 2-3 different answers with numbers in them, 5-6 “It depends” answers and one blank stare. My assumption is an average of 400ms is potentially too high here, especially when the other DBs are 2ms or lower for the average stall time.
Regardless of which database is seeing the high stalls you should approach it the same way. An IO stall is what it sounds like… An IO request taking longer than expected.. Stalling. These happen. They happen all the time in a system with resources being shared and finite resources (really all of our systems). They become an issue when the stalls become performance problems or lead to them. So I trust that you are looking here as a proactive part of monitoring or because you were experiencing performance issues that you are troubleshooting. We also don’t want to get lost in just IO stalls. We are looking at a piece of the puzzle and not the big picture. It can be troublesome to just look at wait stats or file stats since SQL was last restarted because you are looking at all time and some maintenance window or heavy load window could skew counters. So make sure you look at the full picture.
But when I suspect I have a disk performance issue or see something off in a query like this, I normally follow a process that looks like:
- Look at the wait statistics on the server. @swasheck shared a great link as a comment in a below answer. This takes you to Paul Randal’s post on looking at and analyzing wait statistics in SQL Server. Go there. What kind of waits are you seeing? Do you see waits related to IO performance (
WRITELOG, etc. ?). If you do this is another indication that you have some IO related performance issues, just like the IO stalls. But it gives you another form of agreement here.
- Look at the IO performance. In particular, look inside of perfmon at the
Physical Disk:Avg Disk Sec/Readand
Avg Sec Disk Sec/Writecounters. These measure your latency. Watch these counters over a period of time saved to a performance log file. What did you see for averages? If you are seeing numbers over 0.020 seconds (20ms) this could be an issue. If you see numbers over 40-50ms avg or higher is a more firm indication of a problem. Also look at your spikes? How high do they go and how long do they last? If you see spikes into the hundreds of ms and they last for tens or scores of seconds or more and/or happen frequently you are more likely to have an issue with your IO performance for your workload.
- Look at your IO setup. What is it? Local disks? SAN? Storage Array? What kind of throughout and IOPs should you see out of this? Is it sufficient for what you are trying to do? You may have undersized your IO for your workload. Don’t just look at your physical spindles, RAID settings, etc. Look at your paths to your disks. Are you pushing everything through a single 1GB link that you are sharing with a lot of other traffic? Can you look at disk performance metrics from the storage’s perspective.
(Note: for this wait stats analysis and perfmon analysis – look at various periods and type of usage. Do you have different usage statistics at night than you do during the day? Batch processing windows? Maintenance windows where you rebuild a lot of indexes? Look at these tools during each of these periods and understand what you are seeing for each)
Another IO performance consideration here –
- You said system DBs and User DBs are shared. Is this production? If so, that isn’t always the best scenario. Are you also sharing log file and data files on the same drives? That isn’t the best scenario either. What else shares this storage? In a world where you are worrying about spindles and raid groups and disks and have to make decisions on who gets the best performing disks, I tend to (as a general rule of thumb.. which aren’t great to have in the DB world but this one tends to hold true) go with my fastest and most dedicated to TempDB (more on that below), then the log files, then the data files. In a world where you have a big pile of disks on a device like a NetApp, Dell Equal Logic or EMC VNX, etc. you don’t need to necessarily worry about separation as much – but you still need to make sure that the total throughput and IOPs your system can deliver (and the switches, cards and cables between you and your disks can deliver) is sufficient for your workload.
2.) What are some reasons TempDB could be higher?
So TempDB is a database and it can have IO stalls like any other database as I just discussed. But what are some reasons TempDB can have higher reads? (not exhaustive, I welcome additions or thoughts in edits, other answers or comments) –
- Because of your code – Are you using TempDB a lot in your code purposefully? A lot of temp tables and table variables created and destroyed? Doing a lot of things in TempDB like this? That isn’t bad or good necessarily, but you might look at that and understand your intentional TempDB usage pattern.
- TempDB is a shared workhorse – TempDB is one database that is used as a temporary space for user defined temporary objects and various work tables and operations used by your entire SQL instance. How many user DBs are there? What kind of workload do you see in general? TempDB is one resource for all things to share.
- Inefficient queries and insufficient memory – Perhaps there are queries that aren’t using indexes tightly enough or are doing large scan and sort operations. Large hash operations, and the memory on the server isn’t sufficient for these. These operations will “spill” to TempDB as worktables behind the scenes. Sometimes this can be avoided with looking at your query plans and indexing or query tuning. Sometimes it happens (more so on warehouse workloads, I find). If you have enough memory, this can help, but these queries can still spill at times. Look a this as well.
- Are you using Read Committed Snapshot Isolation level with a fair number of updates in your system? This can also result in increased TempDB activity.
The point is – TempDB is used in a lot of ways, and it doesn’t surprise me at all to see it as one of your busiest, if not the busiest, database. It also doesn’t surprise me when I see it as having the highest number of and highest average stalls of all databases at a client’s site. It is the nature of its workload sometimes. Looking at some of the things I’ve mentioned here can certainly help you determine if these numbers indicate a problem and if so, how to go deeper in solving it.
TempDB is shared amongst all the databases on the instance. So there can sometimes be contention within TempDB for certain pages: SGAM, GAM, and PFS. In a nutshell, these pages keep track of what’s been used in TempDB so far, and where space is available for new use.
Typically, this is dealt with by adding multiple data files to TempDB. There are a few different philosophies as to the correct number, but all agree you should have more than one.
Here are a few queries to run…
This one will show you how many files TempDB has and where they are located.
-- tempdb layout use tempdb go exec sp_helpfile go
This one will show you how many CPUs and cores you have.
-- cores and hyperthreading select cpu_count, hyperthread_ratio from sys.dm_os_sys_info go
This one will show you how many NUMA nodes and cores per NUMA node you have.
-- numa nodes and schedulers select node_id, online_scheduler_count from sys.dm_os_nodes order by node_id go
This one will show you which pages are experiencing waits in TempDB.
-- see if anything is waiting on tempdb select * from sys.dm_os_waiting_tasks where resource_description like '2:%' go
Here is an article which goes into a bit more depth on the page contention issue.
OK, so now the philosophy part… 🙂
For myself, if I am on an SMP system, I only want as many files as half the total cores.
If I am on a NUMA system, then I only want as many files as cores per NUMA node.
However, I rarely see any improvement for having more that four files for TempDB. So I usually start with four and monitor contention as explained in the article I linked to.
If I continue to see problems, then I would add two more. Check again, add more, and repeat until the contention disappears.