Monday, 30 September 2013

Debugging Stop 0xC1 - Sloppy Bytes and Special Pool

Another memory corruption related bugcheck, but this time it relates to the Special Pool option available within Driver Verifier. MSDN Blogs (NT Debugging) wrote an excellent article explaining Special Pool and how it works, which I've linked in a blog post for this month, so I would highly recommend checking that article out before reading this post to gain a full understanding of Special Pool and Slop Bytes.

The first parameter is what we are most interested in. This is the address which the driver was attempting to free, but the free was picked up by Driver Verifier due to the single-bit corruption within the Slop Bytes region.

Firstly, let's examine the pool page to which the address belonged with the !pool extension.

We can see the pool page is obviously corrupted, but let's investigate further with the !poolval extension on the suggested address provided by the dump file.

Let's look even further with the _POOL_HEADER data structure:

Okay, we can clearly see that the Previous Size field in the Pool Header is wrong; it should have been 0. The PoolType field indicates that the page was Non-Paged Pool (0 = Non-Paged Pool and 1 = Paged Pool).

So what happened and what do Slop Bytes and Special Pool have to do with this bugcheck?

This bugcheck indicates that a driver has caused a single-bit form of memory corruption within the Slop Bytes region of a pool page. The Special Pool option of Driver Verifier gives each pool allocation made by the driver its own individual pool page, which is then split into the Slop Bytes and the driver's pool allocation. The next virtual page is a Guard Page used to detect buffer overflows. Since the problem seems to have occurred within the Slop Bytes, the error isn't a buffer overflow into the Guard Page, but a different form of memory corruption. The Slop Bytes are a repeating pattern, and if this pattern is disrupted, the system will bugcheck with the error seen here.
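
To make the Slop Bytes check a little more concrete, here is a rough Python sketch of the idea. It is only a simplified model and not how the kernel actually implements Special Pool: the page layout (slop first, allocation at the end of the page), the 0xA5 pattern byte and the function names are all my own assumptions for illustration.

```python
def make_special_pool_page(page_size, alloc_size, pattern):
    """Model one page: slop bytes filled with a pattern, then the allocation."""
    slop_len = page_size - alloc_size
    return bytearray([pattern] * slop_len) + bytearray(alloc_size)


def find_slop_corruption(page, alloc_size, pattern):
    """Return the offsets in the slop region whose byte no longer matches."""
    slop_len = len(page) - alloc_size
    return [i for i in range(slop_len) if page[i] != pattern]


# Hypothetical values: a 4 KB page, a 64-byte allocation, pattern byte 0xA5.
page = make_special_pool_page(page_size=4096, alloc_size=64, pattern=0xA5)
assert find_slop_corruption(page, 64, 0xA5) == []

page[100] ^= 0x01  # flip a single bit inside the slop region
print(find_slop_corruption(page, 64, 0xA5))  # [100]
```

Writes inside the allocation itself are never flagged by this check; only a change to the pattern bytes is, which is why even a single flipped bit is enough to be noticed when the allocation is freed.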

Debugging Stop 0x4E - Bad Share Count

Okay, each time I come across a Stop 0x4E, it always seems to point to memory corruption, with the last entry in the call stack being MiBadShareCount. By the way, I've learned that the Mi prefix most likely means Internal Memory Management/Manager.

I'll assume you already know that there are different fields depending upon the PFN data structure being used by the operating system. From my understanding, the above example applies to PFNs used for a page that is part of a process's working set.

For those who do not know what a working set is, it is simply the set of virtual pages currently resident in physical memory for a given process. It is important to remember that software uses virtual memory addresses, and these are translated into physical addresses by the MMU.
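
As a rough illustration of those two ideas, here is a small Python sketch. The page table contents, the 4 KB page size and the function names are assumptions of mine; a real MMU walks multi-level page tables in hardware, so treat this purely as a toy model.

```python
PAGE_SIZE = 4096

# Hypothetical page table for one process:
# virtual page number -> physical frame number (None = not resident).
page_table = {0x10: 0x2A, 0x11: None, 0x12: 0x07}


def working_set(table):
    """The virtual pages of the process that are resident in physical memory."""
    return {vpn for vpn, pfn in table.items() if pfn is not None}


def translate(table, vaddr):
    """Translate a virtual address to a physical address, as the MMU would."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = table.get(vpn)
    if pfn is None:
        raise LookupError(f"page fault: virtual page {vpn:#x} is not resident")
    return pfn * PAGE_SIZE + offset


print(sorted(hex(v) for v in working_set(page_table)))  # ['0x10', '0x12']
print(hex(translate(page_table, 0x10123)))              # 0x2a123
```

Only the page number changes during translation; the offset within the page (0x123 here) is carried over unchanged.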

On a side note, I'm still trying to find out what page states the numbers reference, so if anyone knows then please post a comment on this post. Anyway, back to the point: we can see with this bugcheck that either a PFN or PTE data structure has become corrupt. The reason is a bad share count, and this is what I would like to briefly discuss in this blog post.

The Share Count is the number of PTEs which refer to that page (which the PFN data structure represents); for page table pages it is the number of valid or transition PTEs in the page table. If the Share Count is above 0, then the page is not considered eligible to be removed from memory, or from the working set for that matter.
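
The bookkeeping behind a share count can be sketched in a few lines of Python. This is my own toy model with made-up names, not the real PFN database code; the point is simply that the count must never be decremented more times than it was incremented.

```python
class Pfn:
    """Toy model of a PFN entry that only tracks its share count."""

    def __init__(self):
        self.share_count = 0

    def add_pte_reference(self):
        """A PTE now refers to this page."""
        self.share_count += 1

    def remove_pte_reference(self):
        """A PTE no longer refers to this page."""
        if self.share_count == 0:
            # The real system would bugcheck here rather than raise.
            raise RuntimeError("bad share count: would drop below 0")
        self.share_count -= 1

    def can_be_removed(self):
        """The page is only eligible for removal once no PTE refers to it."""
        return self.share_count == 0


pfn = Pfn()
pfn.add_pte_reference()
print(pfn.can_be_removed())  # False
pfn.remove_pte_reference()
print(pfn.can_be_removed())  # True
```

Calling remove_pte_reference() once more at this point raises the error, which is the toy equivalent of the situation this bugcheck reports.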

We could view the PFN with the !pfn extension, but unfortunately this wasn't a viable option for this dump file. Judging by the call stack, I'm guessing that the share count dropped below 0 and the page was removed from memory. We can see this with DeleteAddressesInWorkingSet and CleanProcessAddressSpace on the stack, both of which are Memory Manager routines. The data structure probably wasn't updated, or became corrupt so that it couldn't be updated to reflect this change.

Wednesday, 18 September 2013

Using .chain and .help

This is going to be a very simple blog post, showing how to view all the DLL extensions you have loaded and how to list all the extensions available from a DLL. Firstly, let's start with the .chain command, which is part of WinDbg.

The .chain command simply lists all the extension DLLs loaded for that dump file; when clicking a link (in blue), we can see all the available extensions from that DLL.

As an example, I've used the CMKD.dll:

We can gain further information about those listed commands by typing !

The .help command lists the help information for all the WinDbg commands; clicking a link will produce help specific to the commands listed under that letter.

Here's U as an example:

Thursday, 12 September 2013

Debugging Stop 0x74

A Stop 0x74 is usually related to registry corruption, especially of the System hive, where all the system configuration information is stored. However, this bugcheck can also be related to incorrect file permissions on a registry file, or to RAM corruption.

The only useful information I could gather from this dump file came from using the !error extension with the fourth parameter, which should produce something like this:

"Error code: (NTSTATUS) 0xc000014c (3221225804) - {The Registry Is Corrupt}  The structure of one of the files that contains Registry data is corrupt, or the image of the file in memory is corrupt, or the file could not be recovered because the alternate copy or log was absent or corrupt."

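The two numbers in that output are the same NTSTATUS value in hex and decimal, and the value itself is made up of bit fields. Here is a short Python sketch that splits it apart, assuming the standard NTSTATUS layout (2-bit severity, customer bit, reserved bit, 12-bit facility, 16-bit code); the function name is my own.

```python
SEVERITY_NAMES = {0: "Success", 1: "Informational", 2: "Warning", 3: "Error"}


def decode_ntstatus(status):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITY_NAMES[(status >> 30) & 0x3],  # bits 31-30
        "customer": (status >> 29) & 0x1,                  # bit 29
        "facility": (status >> 16) & 0xFFF,                # bits 27-16
        "code": status & 0xFFFF,                           # bits 15-0
    }


status = 0xC000014C
print(status)  # 3221225804
print(decode_ntstatus(status))
# {'severity': 'Error', 'customer': 0, 'facility': 0, 'code': 332}
```

The leading 0xC is what marks the value as an error status, which is a quick way to tell a failure code from a success or informational one when reading bugcheck parameters.
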
Debugging Stop 0xAB

Note: Before reading this article, please take into consideration that this is an old dump file, and the crash was caused by a bug in Windows 8, for which Microsoft have now released a fix, which can be downloaded here -

We can see this dump file originates from a Windows 8 system; by using the vertarget command we can gain some useful operating system information.

Another side note I would like to add for this bugcheck: make sure you also update your graphics card drivers. The list of installed Windows Updates/Hot Fixes can be found within the SystemInfo.txt file.

I would like to briefly explain what Session Space is, and how it relates to this bugcheck.

This address space is an area of memory assigned to one particular session, and is shared by all the processes within that session. A session is simply all the processes running within that user's logon. Each session is given a Session ID, which helps identify it. More Information -

Any Session ID above 0 is a user logon session. Session 0 is where all the NT processes and services are started. We can view the Session ID in Task Manager by opening the View menu and choosing the Select Columns option.

I can now see all the processes running under my logon session and user name.

Going back to our bugcheck, we now understand the purpose of the Session ID and which session the pool was leaking in. The fourth parameter shows how many pool allocations were leaking. The pool leak was caused by win32k.sys not releasing its pool allocations when logging off.
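
To picture what "leaking at logoff" means, here is a tiny Python model of per-session pool accounting. The class, its method names and the pool tags are all hypothetical; the real accounting is done inside the kernel, so this only shows the principle of counting allocations that were never freed.

```python
from collections import Counter


class SessionPool:
    """Toy model: counts outstanding pool allocations per tag for one session."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.outstanding = Counter()

    def allocate(self, tag):
        self.outstanding[tag] += 1

    def free(self, tag):
        if self.outstanding[tag] == 0:
            raise ValueError(f"free of tag {tag!r} with nothing allocated")
        self.outstanding[tag] -= 1

    def leaks_at_logoff(self):
        """Allocations still outstanding when the session is torn down."""
        return {tag: n for tag, n in self.outstanding.items() if n > 0}


pool = SessionPool(session_id=1)
pool.allocate("TagA")  # hypothetical pool tags
pool.allocate("TagA")
pool.allocate("TagB")
pool.free("TagB")
print(pool.leaks_at_logoff())                # {'TagA': 2}
print(sum(pool.leaks_at_logoff().values()))  # 2
```

The final total is the kind of figure the leaking allocation count in the bugcheck parameters represents: anything other than zero at logoff means a driver forgot to release something.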

The !pooltag extension can be used to tell us who the pool allocation belonged to:

We can see that it belonged to win32k.sys (part of the Windows subsystem), and that particular allocation was related to the Graphical Device Interface.

Tuesday, 10 September 2013

CMKD.DLL - !stack, !ptelist, !packet, !kvas,

Here's another custom-made .DLL. I've actually had this library for a while now, but I've only just remembered it, so you may see it used in the future too.

CodeMachine Debugger Extension DLL (CMKD.dll)

If you don't understand how to load a custom .DLL, please read this post here - Loading Custom Debugger Extensions - !load and !dpx

Loading Custom Debugger Extensions - !load and !dpx

Andrew Richards has developed a custom .DLL with some really nice debugger extensions for us debuggers to test out and use with dump files. The only one I've used so far is the !dpx extension, which dumps all the useful information from a raw stack. This extension is going to replace the !thread and then dps method.

Firstly, you need to download the .DLL from SkyDrive and unzip the folder. Once you have downloaded and unzipped the folder, navigate to the folder for your operating system architecture, either x86 or x64, and then copy the .DLL. I'm not sure what the other folders contain as I haven't watched the Defrag Tools video yet.

Once copied, you will need to paste the .DLL into this folder (follow the path below):

C:\ or the partition you have Windows installed on > Program Files > Windows Kits > 8.0 (dependent upon version) > Debuggers > x86 (or x64)

Paste the file into that folder, and then accept the UAC prompt.

You will then need to open a dump file and use the !load extension with the .DLL name (the file extension isn't required) to load the .DLL file. You will need to do this for each new dump file you open, but you shouldn't have to repeat it once the .DLL has been loaded for that dump file (well, I didn't have to anyway). The !unload extension will unload the .DLL.

! will list all the extensions contained within the .DLL, and explain what each extension does.

Once loaded, the DLL also slightly improves the other stack unwind commands; with k, for instance, the stack frame numbers are also listed.

Monday, 9 September 2013

[Link] Understanding Special Pool

There's a nice article explaining the arrangement of special pool on the NT Debugging website, and I thought I would share it with my readers - Understanding Pool Corruption Part 2 – Special Pool for Buffer Overruns

Debugging Stop 0x1A - Corrupt Image Relocation Table

This blog post is most likely going to be more of me attempting to explain relocation and the relocation fix-up table.

The first parameter indicates that the image relocation fix-up table has become corrupt. The image relocation table lists the locations in a program image which hold absolute memory addresses; these addresses are adjusted by the loader when the image is loaded at a different base address from the one it was built for. Each entry is called a fix-up. Pointers are basically used in programming to store and use memory addresses in programs.
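
Here is a minimal Python sketch of what applying fix-ups looks like. It is heavily simplified and is my own illustration: it assumes 32-bit little-endian pointers and a flat list of offsets, whereas a real PE relocation table is organised into typed blocks, and the base addresses used are made up.

```python
import struct


def apply_fixups(image, fixup_offsets, preferred_base, actual_base):
    """Add the load delta to every 32-bit pointer listed in the fix-up table."""
    delta = actual_base - preferred_base
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)
    return image


# Hypothetical image: one pointer at offset 4, built for base 0x00400000.
image = bytearray(16)
struct.pack_into("<I", image, 4, 0x00401000)

apply_fixups(image, [4], preferred_base=0x00400000, actual_base=0x00500000)
print(hex(struct.unpack_from("<I", image, 4)[0]))  # 0x501000
```

If the list of offsets is corrupt, the delta gets added to bytes that were never pointers (or real pointers get skipped), which is why a damaged fix-up table leads to invalid addresses inside the loaded image.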

The MSDN documentation points out that this issue is more likely to be hardware related, and therefore the only valid reasons I could think of are that corrupt memory addresses are being assigned, or maybe that the MMU wasn't translating virtual pages to physical pages correctly, resulting in invalid memory addresses.

Debugging Stop 0xFE

I thought I'd debug a Stop 0xFE tonight, which is usually related to problems with USB device drivers. I have tended to stay away from Stop 0xFEs as a result of my experience with them when I first started debugging, but I'm going to start taking them on again.

We first need to check what the parameters correspond to, by checking the MSDN documentation for this bugcheck.

Checking the documentation provided, we can see that the bugcheck happened as a result of the driver waiting for a suspend-port request to complete.

With this bugcheck, I would check the USB driver timestamps with the lmtsm command, and then update the drivers.

I would additionally recommend turning off USB Port Suspension.