Windows server crash using HP NC373i network adapter

Posted on April 8, 2010
Filed Under Performance Engineering



 

We were recently performance testing an Internet banking web site for a major financial institution and encountered a blue screen crash on several Windows 2003 servers. The solution took some investigation, so I’m posting this information to help others who may encounter the same issue.

The server configuration was

The platform had seven BL460c servers running custom application server software, seven IIS web servers running on VMware, and a SQL Server cluster. After running under high load for a period of time, some of the application servers would hit a blue screen error and reboot. The incidents were spread randomly among the servers, but seemed to occur only at high load levels. The error recorded in the event viewer after the reboot is listed below. Nothing was recorded in the event log prior to the reboot, but there was a mini dump file.

Event Viewer Message

Event Type:       Warning
Event Source:     USER32
Event Category:   None
Event ID:         1076
Date:             2/9/2010
Time:             4:14:06 PM
User:             abc\xxxxxx
Computer:         server4
Description:
The reason supplied by user abc\xxxxx for the last unexpected shutdown of this computer is: System Failure: Stop error
Reason Code: 0x805000f
Bug ID:
Bugcheck String: 0x100000d1 (0x00000000, 0xd0000002, 0x00000000, 0xb94dba6c)
Comment: 0x100000d1 (0x00000000, 0xd0000002, 0x00000000, 0xb94dba6c)
Data:
0000: 0f 00 05 08

Mini Dump

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000000, memory referenced
Arg2: d0000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: b957da6c, address which referenced memory
Debugging Details:
...
SYMBOL_STACK_INDEX:  4
SYMBOL_NAME:  bxnd52x+9aaa
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: bxnd52x
IMAGE_NAME:  bxnd52x.sys
DEBUG_FLR_IMAGE_TIMESTAMP:  4b390d70
...

The key mini-dump information above is the bug check name and the module and image names. They indicated that the issue was driver related, and that the specific driver was bxnd52x.sys, the Broadcom driver for the HP NC373i NIC. So naturally the first thing we did was upgrade the driver to the latest release.
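When several servers each produce a dump, it can help to pull those few fields out of the debugger output automatically. The following is only a minimal sketch: the `summarize_dump` function and its regular expressions are my own illustration, not part of any debugger tooling, and the sample text is copied from the analysis quoted above.

```python
import re

# Sample text copied from the mini-dump analysis quoted in this post.
ANALYSIS = """\
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
Arg1: 00000000, memory referenced
Arg2: d0000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: b957da6c, address which referenced memory
SYMBOL_NAME:  bxnd52x+9aaa
MODULE_NAME: bxnd52x
IMAGE_NAME:  bxnd52x.sys
"""


def summarize_dump(text):
    """Return the bug check name/code, its ArgN values, and KEY: value fields."""
    summary = {"bugcheck": None, "args": {}, "fields": {}}
    for line in text.splitlines():
        line = line.strip()
        # First line shaped like "NAME_IN_CAPS (hexcode)" is the bug check.
        m = re.match(r"^([A-Z0-9_]+) \(([0-9a-fA-F]+)\)$", line)
        if m and summary["bugcheck"] is None:
            summary["bugcheck"] = (m.group(1), int(m.group(2), 16))
            continue
        # "Arg1: 00000000, ..." lines carry the four bug check parameters.
        m = re.match(r"^Arg(\d): ([0-9a-fA-F]+),", line)
        if m:
            summary["args"][int(m.group(1))] = int(m.group(2), 16)
            continue
        # Remaining "UPPER_CASE_KEY: value" lines, e.g. IMAGE_NAME.
        m = re.match(r"^([A-Z_]+):\s+(\S.*)$", line)
        if m:
            summary["fields"][m.group(1)] = m.group(2)
    return summary


summary = summarize_dump(ANALYSIS)
print(summary["bugcheck"][0], summary["fields"]["IMAGE_NAME"])
```

Running the same function over the analysis text from each crashed server makes it easy to confirm that every dump points at the same driver image.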

Fortunately we were able to replicate the crash by running a high load, but we suspected that the issue wasn’t really load related. Searching the Internet seemed to support this. There were several posts on the HP support web site about conflicts between the HP Insight management software and this particular NIC. The client had other servers with the same configuration, but with a different NIC, and was not seeing the reboot issue on those.

Upgrading the NIC driver to the latest release, which was only a week old, did not fix the issue. We sent the mini-dump file to Microsoft for analysis. Microsoft responded with the following KB article, which fixed our issue: http://support.microsoft.com/kb/949234. The reboot is caused by a race condition in the TCP/IP module. Microsoft can provide a hotfix, but the workaround is to turn off the TCP Chimney feature. The TCP Chimney feature actually conflicts with our client’s network monitoring software, so they wanted to disable that feature anyway. This was a convenient and easy solution that resolved the reboot issue.

Below are the instructions from the Microsoft site for turning off the TCP Chimney feature.

  1. Click Start, click Run, type Regedit, and then click OK.
  2. Locate the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  3. Double-click the EnableTCPChimney registry entry.
  4. In the Edit DWORD Value dialog box, type 0 in the Value data box, and then click OK.
  5. Double-click the EnableRSS registry entry.
  6. In the Edit DWORD Value dialog box, type 0 in the Value data box, and then click OK.
  7. Double-click the EnableTCPA registry entry.
  8. In the Edit DWORD Value dialog box, type 0 in the Value data box, and then click OK.
  9. Restart the computer.
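
With seven application servers to change, the same registry edits could be scripted instead of clicked through. The sketch below is an assumption on my part rather than anything from the Microsoft article: it builds one `reg add` command per value using the standard Windows `reg` tool, and the helper names `build_commands` and `apply_commands` are made up for illustration. Note that `reg add` creates a value if it does not already exist, whereas the manual steps edit existing entries, and a restart is still required afterwards.

```python
import subprocess
import sys

# Registry key and value names taken from the manual steps above.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUES = ("EnableTCPChimney", "EnableRSS", "EnableTCPA")


def build_commands(key=KEY, values=VALUES):
    """Build one `reg add` argument list per value, setting each to DWORD 0."""
    return [
        ["reg", "add", key, "/v", name, "/t", "REG_DWORD", "/d", "0", "/f"]
        for name in values
    ]


def apply_commands(commands, dry_run=True):
    """Print each command; execute it only on Windows when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run and sys.platform == "win32":
            subprocess.run(cmd, check=True)


# Dry run by default: shows what would be executed without changing anything.
apply_commands(build_commands())
```

Calling `apply_commands(build_commands(), dry_run=False)` from an elevated prompt on the server would make the changes; test it on one machine and export the key first so the original values can be restored.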

