To open PerfMon, just go to the Start Menu, choose Run and type perfmon.
Bottleneck analysis
The most common use of PerfMon is to answer the burning question: why is my system running slow?
With the five performance counters listed below, you can quickly get an overall impression of how healthy a system is - and where the problems are, if they exist. The idea here is to pick counters that will be at low or zero values when the system is healthy, and at high values when something is overloaded. A 'perfectly healthy' system would show all counters flatlined at zero. (Perfection is unattainable, so you'll probably never see all of these counters flatlined at zero in real life. The CPU will almost always have a few items in queue.)
- Processor utilization
  - System\Processor Queue Length - the number of threads queued and waiting for time on the CPU. Divide this by the number of CPUs in the system. If the answer is less than 10, the system is most likely running well.
- Memory utilization
  - Memory\Pages Input/Sec - the best indicator of whether you are memory-bound. This counter shows the rate at which pages are read from disk to resolve hard page faults. In other words, it counts how often the system was forced to retrieve something from disk that should have been in RAM. Occasional spikes are fine, but this should generally flatline at zero.
- Disk utilization
  - PhysicalDisk\Current Disk Queue Length\driveletter - this is probably the single most valuable counter to watch. It shows how many read or write requests are waiting to execute against the disk. For single disks, it should idle at 2-3 or lower, with occasional spikes being okay. For RAID arrays, divide by the number of active spindles in the array; again, try for 2-3 or lower. Because a shortage of RAM will tend to beat on the disk, look closely at the Memory\Pages Input/Sec counter if disk queue lengths are high.
- Network utilization
  - Network Interface\Output Queue Length\nic name - the number of packets in queue waiting to be sent. If there is a sustained average of more than two packets in queue, you should be looking to resolve a network bottleneck.
  - Network Interface\Packets Received Errors\nic name - packet errors that kept the TCP/IP stack from delivering packets to higher layers. This value should stay low.
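The rules of thumb in the list above are simple enough to write down as a check. Here is a minimal Python sketch, not anything PerfMon itself provides: the function name `find_bottlenecks` and the dictionary keys are made up for illustration, the inputs are assumed to be sustained averages you have already read off the graph, and the limits for page input and packet errors are placeholders, since the list only says those values should stay at or near zero.

```python
def find_bottlenecks(avg, cpus=1, spindles=1,
                     pages_input_limit=4.0, packet_error_limit=4.0):
    """Return the subsystems whose sustained averages look overloaded.

    `avg` maps hypothetical short keys to sustained average counter values.
    The two *_limit defaults are assumptions, not figures from PerfMon.
    """
    flagged = []

    # Processor: queue length per CPU should stay below 10.
    if avg["processor_queue_length"] / cpus >= 10:
        flagged.append("processor")

    # Memory: Pages Input/Sec should mostly flatline near zero.
    if avg["pages_input_per_sec"] > pages_input_limit:
        flagged.append("memory")

    # Disk: queue length per active spindle should idle at 2-3 or lower.
    if avg["disk_queue_length"] / spindles > 3:
        flagged.append("disk")

    # Network: more than two queued packets, or a non-trivial error count.
    if (avg["output_queue_length"] > 2
            or avg["packets_received_errors"] > packet_error_limit):
        flagged.append("network")

    return flagged


healthy = {
    "processor_queue_length": 3,
    "pages_input_per_sec": 0.2,
    "disk_queue_length": 0.5,
    "output_queue_length": 0,
    "packets_received_errors": 0,
}
print(find_bottlenecks(healthy, cpus=2))  # prints []

starved = dict(healthy, pages_input_per_sec=50, disk_queue_length=24)
print(find_bottlenecks(starved, cpus=2, spindles=4))  # prints ['memory', 'disk']
```

The second example shows the pattern mentioned under disk utilization: when memory and disk are flagged together, a shortage of RAM is the first thing to suspect.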
Pay close attention to the scale column! PerfMon attempts to automatically pick a scale that will magnify or reduce the counter enough to produce a meaningful line on the graph ... but it doesn't always get it right. As an example, PerfMon often chooses to multiply Disk Queue Length by 100. So, you might think the disk queue length is sustained at 10 (bad!) when in fact it's really at 0.1 (good). If you're not sure, highlight the counter in the lower pane, and watch the Last and Average values just below the graph. In the screenshot below, I modified all of the counters to a scale value of 1.0, then changed the graph's vertical axis to go from 0-10.
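If you want to sanity-check a reading by hand, the arithmetic is just a division: the graph plots the real value multiplied by the scale. A tiny Python sketch (the function name `actual_value` is made up for illustration):

```python
def actual_value(plotted, scale):
    """Undo the scale multiplier: the graph plots actual * scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return plotted / scale


# A line sitting at 50 on the graph, with a scale of 100, is really 0.5.
print(actual_value(50, 100))  # prints 0.5

# With the scale set to 1.0, the graph shows the real value.
print(actual_value(2, 1.0))   # prints 2.0
```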
To change graph properties (like scale and vertical axis as discussed above), right-click the graph and choose Properties. There are a number of things to customize here ... fiddle with it until you have a graph that looks good to you.
To get a more detailed explanation of any counter, right-click anywhere in the PerfMon graph and choose Add Counters. Select the object and counter that you are curious about, and click the Explain button.
This screenshot shows a very lightly-loaded XP system, with the Memory\Pages Input/Sec counter highlighted:
All we see here is the Processor Queue Length hovering between 1 and 4, and two short spikes of Pages Input/Sec. All other counters are flatlined at zero, which is easy to check by highlighting each of them and watching the values bar underneath the graph. This is a happy system - no problems here!
But if we saw any of the above counters averaging more than 2-4 for long periods of time (except Processor Queue Length: don't worry unless it stays above 10 for long periods), we'd be able to conclude that there was a problem with that subsystem. We could then drill down using more detailed counters to see exactly what was causing that subsystem to be overloaded. More detailed analysis is beyond the scope of this article, but if there's enough interest I could do a second article on that. Leave a comment if you're interested!
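The difference between a harmless spike and a sustained problem can be sketched with a rolling average. This is only an illustration in Python, assuming you have a list of samples for one counter; the function name `sustained_over`, the window of 5 samples, and the sample data are all made up, and the limit of 4 is simply the top of the 2-4 range mentioned above.

```python
def sustained_over(samples, limit, window):
    """True if any run of `window` consecutive samples averages above `limit`."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return False  # not enough data to call anything "sustained"

    # Keep a running sum so each step only adds one sample and drops one.
    total = sum(samples[:window])
    if total / window > limit:
        return True
    for i in range(window, len(samples)):
        total += samples[i] - samples[i - window]
        if total / window > limit:
            return True
    return False


spiky = [0, 0, 12, 0, 0, 0, 6, 0, 0, 0]   # two short spikes, otherwise idle
busy = [5, 6, 7, 5, 6, 8, 7, 6, 5, 7]     # consistently high

print(sustained_over(spiky, limit=4, window=5))  # prints False
print(sustained_over(busy, limit=4, window=5))   # prints True
```

A longer window is more forgiving of spikes; how long counts as "long periods" is a judgment call for your own system.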
General activity counters
Well, the system is healthy - and that's good ... but how hard is it working? Is the processor workin' hard, or hardly workin'? How much RAM is in use, how many bytes are being written to or read from the disk or network? The following counters are a good overview of general activity of the system.
- Processor utilization
  - Processor\% Processor Time\_Total - just a handy idea of how 'loaded' the CPU is at any given time. Don't confuse 100% processor utilization with a slow system though - processor queue length, mentioned above, is much better at determining this.
- Memory utilization
  - Process\Working Set\_Total (or per specific process) - shows how much physical memory (RAM) is currently in the working set of all processes, or of one specific process.
  - Memory\Available MBytes - the amount of free RAM available to be used by new processes.
- Disk utilization
  - PhysicalDisk\Disk Bytes/sec\_Total (or per disk) - shows the number of bytes per second being written to or read from the disk.
- Network utilization
  - Network Interface\Bytes Total/Sec\nic name - measures the number of bytes sent or received per second.
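Raw bytes per second are easier to judge as a percentage of what the link can carry. The sketch below is one way to do that in Python, under an assumption that goes beyond the list above: it uses the link speed in bits per second, which you can read from the Network Interface\Current Bandwidth counter or from the adapter's properties. The function name `link_utilization_percent` is made up for illustration.

```python
def link_utilization_percent(bytes_total_per_sec, bandwidth_bits_per_sec):
    """Express Bytes Total/Sec as a percentage of the link speed.

    bandwidth_bits_per_sec is assumed to be the nominal link speed in bits
    per second (for example 1_000_000_000 for a gigabit NIC).
    """
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    # Convert bytes to bits before comparing against the link speed.
    return bytes_total_per_sec * 8 * 100 / bandwidth_bits_per_sec


# 12.5 MB/s on a gigabit link is 10% of the nominal speed.
print(link_utilization_percent(12_500_000, 1_000_000_000))  # prints 10.0
```

Keep in mind that Bytes Total/Sec adds sent and received traffic together, so on a full-duplex link the result can legitimately exceed what either direction alone could carry.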
And ... that's all for now. Hopefully this quick show-and-tell has given you enough information to use PerfMon more usefully in the future!
32 comments:
First of all, it's great to have you back. I really enjoy coming to the site. If you would be up to writing a second more detailed article on this topic, I for one would appreciate the effort. Thanks.
Thanks for your time spent on writing this good piece!
Cheers from Kristian
Umm, are you okay? I keep on coming back thinking my rss reader is bad, but nope, still no updates :-(
quxxo what happen???
Good stuff. One thing this is really useful for is determining if the server can be a virtual machine. I know disk I/O will kill a VM host, but I'm not too sure on what exact numbers would constitute a good candidate for a VM.
Hi,
I'd like to also see a followup to this article. It was really good stuff. Thank you
Great article. Just like Jimmy, I'd love to see a follow up article.
Very helpful article, thanks a lot.
This is the best article for perfmon I've seen yet! I would be very interested in seeing the second article. When can you have it posted??? ;-)
Great article, I'd love to see a second!
This is very interesting. Would love to see more details into this..
Got here via your post in Ars. Thanks much for the article; I maintain a Windows network using a finicky custom program, and this is absolute gold.
This is a great article. Thanks! I too would be interested in more detail.
Introducing a new public forum for Perfmon.
http://social.technet.microsoft.com/Forums/en-US/perfmon/threads/
thanks.
Any idea why I see max values for something like (java)\% Processor Time at 159.688? Average is 35.668 with a stddev of 29.234 but anyone looking at the max will be like "whaa??"
Regards.
this is wicked man, thank you very much for your information.
thank you very informative...
Take a look at SmartMon (www.perfmonanalysis.com) when it's time to analyze the Perfmon data that has been collected.
I had no clue this existed thanks!
This is the best article I've read in a week of searching. Thanks!!!!
Thanks for such a great article. It is very helpful....
Thanks for the detailed info. It really helps.
Regards,
Sohail Chaudhry
It really helps..Thank you
Would like to see more articles like this...
A timeless classic.
Thank you.
Great advice - Thanks
Still valid and still useful, thanks for taking the time to write this.
James
I use it every time I generate performance reports. Thank you very much for such a nice article.!!
Great Article! Very Useful....Thank you!
Brilliant, To the point and simple to understand, just how I need it. Thanks you this was very helpful to me .
Excellent article.
I wonder, is it useful to run more than one instance of perfmon to split up the counters that are percentages from those that are discrete numbers?
This might make it easier to visualize the two types of value, so that large discrete values (say, the number of MBytes of HDD available) don't obscure percentage values due to perfmon automatically picking the scale.
Wow, the Current Disk Queue Length really is the most valuable counter.
For anyone who really wants their socks blown off, download the DiskLED utility and plug in Current Disk Queue Length, and get a REAL load meter for your I/O-bound system!!! Wow!