Friday 12 June 2009

Diagnosing ASP.NET 100% CPU Utilisation Issues

So you built a web application, tested it and deployed it. People are using it and everything is fine, but then one day your site suddenly becomes totally unavailable. You look at the web server and the w3wp.exe process is using up 100% of the cpu time. Well, here's a way to figure out what page is causing the jam without having to pay for any fancy diagnostic tools.

Log Parser

Firstly get and install the latest version of Log Parser from Microsoft. Currently that would be version 2.2.

http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en


Finding Out Which Site is the Problem
Are you running multiple sites and aren't sure which site is causing the problem? Well if you aren't already try running each site on a separate application pool. This is also a hint at another way to prevent these kind of performance issues which is managing a web garden in IIS. Hopefully I'll cover that another time. This will mean there is a w3wp process for each site. When one of these processes hits 100% you can use process explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) to find out which pool is causing the issue. The name of the pool will be at the end of the target field when you click on the properties of the task though I'm sure there's a better way of finding that out.

Configuring IIS Logging

To use log parser effectively you need to ensure that you are logging the right things. Go into IIS manager, open the website tab, click on the logging properties button. Now click on the Extended Properties tab and check Time Taken. This is going to allow you to find pages that take a long time to execute in your logs.

Wait

Now you need to wait for it to happen again. Why not have a cup of tea or dig a hole to nowhere? One thing you could do to speed up this occurring again is to use a site crawler to hammer your application. There are plenty of free trials for these kinds of apps floating about the Internet, though it does assume your problem page is causing it's havoc in something executed during page load and not after user interaction.

Using Log Parser

Huzzah, it's at 100% again! Put down that spade and get back to your server. Open up a command prompt in the directory you installed logparser, copy your most recent logfile for the offending site into the logparser directory and paste this into the command window:

logparser "SELECT TOP 10 cs-uri-stem AS Url, MIN(time-taken) as [Min], AVG(time-taken) AS [Avg], max(time-taken) AS [Max], count(time-taken) AS Hits FROM ex*.log WHERE time-taken < class="blsp-spelling-error" id="SPELLING_ERROR_17">Url ORDER BY [Avg] DESC" -rtp:-1

This will give you the top 50 longest taking pages on your site. Next comes a bit of a tedious task. One or more of those pages is likely to be the page causing your problem. The rest are going to be pages that were held up because of that page that was causing the issue, or for some other erroneous reason like the user is trying to load a page with a dying Internet connection their end. In any case copy the results out of the command prompt and into note pad. Now copy each url, starting from the top, into your browser, maybe fiddle a bit with the ui, until you hit the one that maxes out the CPU of your server. This bit would probably be best done on a test server if you have one. When I last had this issue I didn't have to go any further than the second url to find my problem (though I did check them all for prosperity). It'll likely be something simple and stupid that you didn't expect to happen. Mine was a malformed url that was being generated on the site somewhere that omitted a certain parameter I expected to always be there.

Now, back to that hole to nowhere....