NUMA is disabled

Background

While troubleshooting a completely unrelated issue recently, I discovered that one of our Hyper-V hosts was not like the others. There were actually two problems with this particular host. They turned out to be unrelated, but at first glance they looked like they might be connected.

  1. Even though the server in question has two sockets, Windows is only reporting one NUMA node.
  2. SCVMM is incorrectly reporting that there is no available memory on the host.

In scope for this post is problem number 1 – NUMA being reported incorrectly to the operating system.

In Windows Server 2012 R2, the issue shows up in two main places, as illustrated in the screenshot below (an example, not real data).

  • Task Manager > Performance tab > CPU: under “Change Graph To”, the NUMA nodes option is grayed out and cannot be selected
  • Get-VMHostNumaNode shows only one NUMA node (0)

[Screenshot: NUMA node problem manifestation]

Troubleshooting

As I was fairly certain that this was a BIOS or firmware problem and not an operating system problem, my first step was to run the vendor Update Manager software to verify that the host was running the latest BIOS and chipset driver versions. Once that was confirmed, I started digging around in the BIOS and came across a curious setting buried several layers deep called “Node Interleaving”, which had been set to Enabled. I recommend checking with your vendor or manual for the exact location of this setting in the BIOS. In my case, the path was:

System Utilities screen, select System Configuration > BIOS/Platform Configuration (RBSU) > Performance Options > Advanced Performance Tuning Options > Node Interleaving

Extensive documentation exists about this setting, but if you don’t know what you’re looking for in the first place, you won’t know where to start. Bottom line: with “Node Interleaving” set to Enabled, memory from both sockets is interleaved and presented to the operating system as a single uniform pool, so you are, in effect, disabling NUMA. Setting “Node Interleaving” to Disabled effectively enables NUMA.
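To make the effect of the setting a little more concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the page-sized interleave granularity and the tiny memory sizes are made up for readability, and real firmware behaves in hardware-specific ways. The function names are hypothetical and do not come from any vendor tool.

```python
# Toy model only: the granularity and sizes below are illustrative
# assumptions, not values taken from any vendor's firmware.

PAGE_SIZE = 4096                  # assumed interleave granularity (hypothetical)
SOCKETS = 2                       # two-socket host, as in this post
MEM_PER_SOCKET = 8 * PAGE_SIZE    # tiny per-socket memory so the output stays readable


def owning_socket(address, node_interleaving):
    """Return which socket's memory backs a physical address in this model."""
    if node_interleaving:
        # Enabled: consecutive pages alternate between sockets, so no
        # contiguous range is "local" to either socket.
        return (address // PAGE_SIZE) % SOCKETS
    # Disabled: each socket owns one contiguous range, which can be
    # described to the operating system as a separate NUMA node.
    return address // MEM_PER_SOCKET


def layout(node_interleaving):
    """Owning socket for every page in the toy address space."""
    total_pages = SOCKETS * MEM_PER_SOCKET // PAGE_SIZE
    return [owning_socket(page * PAGE_SIZE, node_interleaving)
            for page in range(total_pages)]


if __name__ == "__main__":
    print("Node Interleaving Enabled :", layout(True))
    print("Node Interleaving Disabled:", layout(False))
```

In the Enabled layout the two sockets alternate page by page, so there is nothing for the operating system to treat as "near" or "far" memory. In the Disabled layout each socket has its own block, which is what makes a per-socket NUMA node meaningful.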

Solution

Set Node Interleaving to Disabled, reboot the host, and verify via Task Manager and Get-VMHostNumaNode that the host parent partition sees the correct number of NUMA nodes.
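If you want to turn that verification into a repeatable check across several hosts, the comparison itself is simple. The Python sketch below assumes you have already collected two numbers per host by whatever means you prefer: the physical socket count and the number of nodes listed by Get-VMHostNumaNode. The function name and the returned messages are hypothetical. Note that some processors expose more than one NUMA node per socket, so the sketch only flags hosts that report fewer nodes than sockets.

```python
def numa_status(socket_count, numa_node_count):
    """Classify a host from its socket count and reported NUMA node count."""
    if socket_count < 1 or numa_node_count < 1:
        raise ValueError("both counts must be at least 1")
    if numa_node_count < socket_count:
        # The symptom from this post: two sockets, one NUMA node.
        return "suspect: fewer NUMA nodes than sockets, check Node Interleaving"
    # Equal counts are the expected case; more nodes than sockets can be
    # legitimate on processors that split a socket into several nodes.
    return "ok"


if __name__ == "__main__":
    # Hypothetical inventory: host name -> (sockets, NUMA nodes reported)
    hosts = {"host-a": (2, 2), "host-b": (2, 1)}
    for name, (sockets, nodes) in hosts.items():
        print(name, "->", numa_status(sockets, nodes))
```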

Lessons Learned

Never accept a BIOS setting at face value.
Never assume that enabling a feature equates to a performance enhancement.
Always standardize and document any BIOS settings in your hypervisor build guide, and never put a not-fresh-out-of-the-box server into production without first combing through all applicable settings.
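One lightweight way to act on that last point is to keep the documented baseline in a machine-readable form and compare each host against it. The following is a minimal sketch, assuming the actual settings have been recorded by hand or exported with your vendor's tooling into a simple name-to-value mapping. Only “Node Interleaving” comes from this post; the function name and the sample host data are hypothetical.

```python
# Baseline from the build guide. "Node Interleaving" is the setting from
# this post; extend the mapping with whatever your own guide standardizes.
BASELINE = {"Node Interleaving": "Disabled"}


def find_drift(baseline, actual):
    """Return {setting: (expected, found)} for settings that differ or are missing."""
    drift = {}
    for name, expected in baseline.items():
        found = actual.get(name, "<not recorded>")
        if found != expected:
            drift[name] = (expected, found)
    return drift


if __name__ == "__main__":
    # Hypothetical settings recorded from a host before it goes into production.
    recorded = {"Node Interleaving": "Enabled"}
    for name, (expected, found) in find_drift(BASELINE, recorded).items():
        print(f"{name}: expected {expected}, found {found}")
```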
