Saturday, February 25, 2017

Security coding in Windows Kernel (Anti-malware) - 1

Hi folks,

Firstly I would like to thank all the viewers of my blog. I have 964 hits so far, and I have received 7 pieces of feedback via email (all productive). Thanks especially to the Sysinternals forum, Stack Overflow and LinkedIn. I will try to incorporate most of the things requested in the feedback. So keep sending your comments by email or comment on the posts.

Today's post is more about Windows security. It is especially useful for programmers who are beginning to write anti-malware applications, or who want to understand the internals of some of them. I found it really difficult to find an appropriate title but finally gathered a few words.

While I could have written the code in 'C' or 'C++', I wanted to thank Hutchison and Four-F and I didn't find a better way to do that than writing it in MASM. The "innovation" around reducing the size of the "development-kit" is quite appreciable. Secondly, the Four-F kit is simply incredible. It also includes undoc headers. Together, MASM (and assembly language know-how) and Four-F's KMD kit are a must for any security professional, especially for system hacking (coding). They would also be the first steps towards other versions of "ASM" that provide 64-bit support.

Although I haven't completely studied the differences between the "undocumented API" available in the Four-F kit and Alex's NDK, I still think it would be a value-add to convert the NDK to ".INC". Something I am working on... I know these are almost history, but isn't history also important for malware analysts? :) They will agree.

So today I want to talk about three very important and elegant SPIs (SPI = System Programming Interface; an SPI is not the same thing as an API) that Windows has been providing since Windows 2000. I will focus on Windows 2000 (like most other posts on this blog), but it all still applies to later versions, and I will try to add enough information about the changes in newer OS versions.

So the 3 functions that I would prefer to call "SPI", but which are categorized under "Driver support routines - process support", are...

PsSetCreateProcessNotifyRoutine - Monitors new process creation.
PsSetCreateThreadNotifyRoutine - Monitors new thread creation.
PsSetLoadImageNotifyRoutine - Monitors mapping of executable images (before execution) to virtual memory.

Here is my ASM code (skeleton process monitor)...

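.386
.model flat,stdcall
option casemap:none

include \masm32\include\w2k\
include \masm32\include\w2k\
include \masm32\include\w2k\

includelib \masm32\lib\w2k\ntoskrnl.lib

NotifyMe proto :HANDLE,:HANDLE,:BYTE

.const
pcreate db "Process is created",0
pexit db "Process is exiting",0
snotify db "Success: Notification routine registered successfully!",0
fnotify db "Error: Unable to register notification routine!",0
format db "%s:%d",0

.code

DriverEntry proc pDriverObject:DWORD, pusRegistryPath:DWORD
local status:DWORD

push 0                          ; Remove = FALSE, so the routine is registered
push offset NotifyMe
call PsSetCreateProcessNotifyRoutine
mov status, eax
test eax, eax                   ; NT_SUCCESS: sign bit clear
js failed

push offset snotify
call DbgPrint
add esp, 4                      ; DbgPrint is cdecl, the caller cleans the stack
jmp done
failed:
push offset fnotify
call DbgPrint
add esp, 4
done:
mov eax, status                 ; return the registration status
ret

DriverEntry ENDP

NotifyMe proc ParentId:DWORD,ProcessId:DWORD,Create:BYTE
cmp Create,FALSE
jz lexit                        ; Create == FALSE means the process is exiting
push ProcessId
push offset pcreate
push offset format
call DbgPrint
add esp, 12
jmp getout
lexit:
push ProcessId
push offset pexit
push offset format
call DbgPrint
add esp, 12
getout:
ret
NotifyMe endp

END DriverEntry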

Use the above code together with the MSDN documentation and you should be fine. Here is the gist anyway: just register a notification routine and you will be notified every time a process is created with at least 1 thread (at least one thread will be there in most cases), and again when it exits.

Later OS versions have routines that weren't there in Windows 2000. For example, in W2K the driver was required to remain loaded until the system shut down. Now we have options to unregister the notify routines for thread and image-load notifications. In later versions we are able to block process creation too (with documented functions). With the protection of critical kernel data structures in later Windows versions, such routines and facilities are important, because earlier (say in W2K times) we had to do other tricks to get the same task done.

There are some "C" samples available online and especially in malware-related books. I didn't find an ASM sample (maybe I didn't look hard enough). Anyway, I wrote a few samples on the above SPI, and it is quite simple. Below is the output of the above sample process notification code, captured with the Debug viewer utility.
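If you want to play with the register/notify idea without loading a driver, here is a small user-mode model in C. It is only a sketch under my own assumptions: the table size and the names `model_set_notify_routine`, `model_notify` and `count_events` are made up for illustration, they are not kernel functions, and the real kernel bookkeeping is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode model of a notify-routine table. Every name here is
   hypothetical; it only mirrors the register/remove/notify idea. */
#define MODEL_MAX_ROUTINES 8   /* assumed small fixed-size table */

typedef void (*model_notify_fn)(unsigned long parent_id,
                                unsigned long process_id,
                                int create);

static model_notify_fn g_routines[MODEL_MAX_ROUTINES];

/* remove == 0 registers the routine, remove != 0 unregisters it.
   Returns 0 on success, -1 on failure (duplicate, table full,
   or routine not found when removing). */
int model_set_notify_routine(model_notify_fn fn, int remove)
{
    size_t i, free_slot = MODEL_MAX_ROUTINES;

    if (fn == NULL)
        return -1;
    for (i = 0; i < MODEL_MAX_ROUTINES; i++) {
        if (g_routines[i] == fn) {
            if (!remove)
                return -1;              /* already registered */
            g_routines[i] = NULL;       /* unregister */
            return 0;
        }
        if (g_routines[i] == NULL && free_slot == MODEL_MAX_ROUTINES)
            free_slot = i;              /* remember the first free slot */
    }
    if (remove || free_slot == MODEL_MAX_ROUTINES)
        return -1;                      /* not found, or no room left */
    g_routines[free_slot] = fn;
    return 0;
}

/* The "system" side: call every registered routine.
   Returns how many routines were called. */
int model_notify(unsigned long parent_id, unsigned long process_id, int create)
{
    size_t i;
    int called = 0;

    for (i = 0; i < MODEL_MAX_ROUTINES; i++) {
        if (g_routines[i] != NULL) {
            g_routines[i](parent_id, process_id, create);
            called++;
        }
    }
    return called;
}

/* Sample routine in the spirit of NotifyMe: it counts events instead
   of printing them. */
static int g_created, g_exited;
static unsigned long g_last_pid;

void count_events(unsigned long parent_id, unsigned long process_id, int create)
{
    (void)parent_id;
    g_last_pid = process_id;
    if (create)
        g_created++;
    else
        g_exited++;
}
```

Register `count_events` with `model_set_notify_routine(count_events, 0)`, fire a couple of `model_notify` calls, and you see the same create/exit split that NotifyMe makes on its third parameter.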

That's it for today! :)

Sunday, May 15, 2016

Windows Internals [Process II - Code]

Before I begin, please comment on the posts, so I know what is wrong and what is right with the posts. Please consider that I am not charging you for this information and your comments will be a motivation.

If you haven't read part I of this, I recommend you do so now. If you have, let us go directly to the code.

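NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
        char *eproc;
        long cpid;

        /* Windows 2000 offsets: 156 = process ID, 160 = list entry, 508 = image name */
        eproc = (char*)PsGetCurrentProcess();
        cpid = *((long*)(eproc + 156));
        do {
                DbgPrint("%d : %s", *((long*)(eproc + 156)),eproc + 508);
                /* follow Flink, then step back 160 bytes to the start of the next EPROCESS */
                eproc =    (char*)(((char*)(((LIST_ENTRY*)(eproc + 160))->Flink)) - 160);
                if( cpid == *((long*)(eproc + 156)) )break;
        } while(1);
        return STATUS_SUCCESS;
}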

A small piece of code, isn't it? :) So what are we seeing here? First we use a documented function that is easily accessible to driver writers, PsGetCurrentProcess(), to get a pointer to the EPROCESS structure. Remember that I am using the Windows 2000 driver kit to compile, and every bit of compatibility that I explain and assure is tested. At this point I am keeping it bare metal, so I am only going to use pointers and simple data types to manipulate the kernel info in the algorithm, not high-level structures.

If you move 156 bytes from the start of EPROCESS you reach the process ID, which is declared long. A long takes 4 bytes. So when we dereference that memory location, we get the PID of a process. Since this is the driver entry routine, called in an arbitrary context, I assume it will mostly show the current process as "System". You can see that in part I of this article. System is the first process in the list.

Next is the grand loop. Inside it, I use the existing pointer to EPROCESS and go to offset 508 to get the process name. OK, what about the next line, the one that is a mess? A couple of things are done here. After the PID, the next information stored in EPROCESS is a LIST_ENTRY. As driver writers would know, this is used for managing a doubly linked list, and driver writers use it often. It is a self-referential structure with a forward and a backward link. What do they store? As you can imagine, they store pointers to the next and the previous LIST_ENTRY. The next important question is not what the next entry points to but where it is contained. It is contained in the next EPROCESS structure, that is, the structure created to describe the next process. So if we jump 160 bytes from the start of EPROCESS, take the value stored at that address, go to that address and subtract 160 from it, we get the starting address of the next EPROCESS.

I have done a lot of casting to char*. We need to cast the extracted LIST_ENTRY address to char* before subtracting 160. Otherwise the subtraction moves back 160 LIST_ENTRY-sized elements rather than 160 bytes (pointer arithmetic). A lot of that code can be simplified if we introduce the EPROCESS structure itself and cast some of these addresses to that type.
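To see the pointer arithmetic point in isolation, here is a tiny user-mode sketch in C. The struct `pa_list_entry` is my stand-in for LIST_ENTRY and both function names are made up for this illustration; it only measures how far a pointer moves with and without the char* cast.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for LIST_ENTRY: two pointers, nothing else. */
struct pa_list_entry {
    struct pa_list_entry *flink;
    struct pa_list_entry *blink;
};

static struct pa_list_entry pa_entries[256];

/* Bytes moved when n is subtracted from a struct-typed pointer:
   the compiler scales n by sizeof(struct pa_list_entry). */
size_t pa_bytes_moved_typed(size_t n)
{
    struct pa_list_entry *p = &pa_entries[255];

    assert(n <= 255);                          /* stay inside the array */
    return (size_t)((char *)p - (char *)(p - n));
}

/* Bytes moved when n is subtracted after casting to char*:
   exactly n bytes, which is what the offset arithmetic needs. */
size_t pa_bytes_moved_char(size_t n)
{
    char *p = (char *)&pa_entries[255];

    assert(n <= 255 * sizeof pa_entries[0]);   /* stay inside the array */
    return (size_t)(p - (p - n));
}
```

On a 32-bit build such as Windows 2000 a LIST_ENTRY is 8 bytes, so the uncast subtraction would land 1280 bytes back instead of 160.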

Therefore, once the necessary structure is declared, you can rewrite that as below...

typedef struct EPROC
{
        char blah[156]; //At this point we are not worried about what is in these 156 bytes
        long pid;
        LIST_ENTRY *Flink,*Blink;
} *__PEPROCESS;

        __PEPROCESS eproc;
        long cpid;
        eproc = (__PEPROCESS)PsGetCurrentProcess();
        cpid = eproc->pid;

        do {
              DbgPrint("%d:%s",eproc->pid,(char*)eproc + 508);

              //Flink points at the list entry inside the next EPROCESS, 160 bytes past its start
              eproc = (__PEPROCESS) ((char*)eproc->Flink - 160);
              if(eproc->pid == cpid) break;
        } while(1);
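The same walk can be tried in user mode if we let the compiler compute the offset instead of hard-coding 160. What follows is a minimal sketch with made-up structures (`walk_entry`, `walk_process`) and a macro written in the spirit of the kernel's CONTAINING_RECORD; the layout has nothing to do with the real EPROCESS. Note also that this model is a plain ring of processes, while the real list has a separate list head that is not inside any EPROCESS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for LIST_ENTRY and EPROCESS; layout is illustrative only. */
struct walk_entry {
    struct walk_entry *flink;
    struct walk_entry *blink;
};

struct walk_process {
    long pid;
    struct walk_entry links;    /* embedded list entry */
    char name[16];
};

/* From a pointer to an embedded member, get back to the enclosing
   structure (same idea as CONTAINING_RECORD). */
#define WALK_CONTAINER(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Link an array of processes into one circular doubly linked list. */
void walk_link_ring(struct walk_process *procs, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        procs[i].links.flink = &procs[(i + 1) % count].links;
        procs[i].links.blink = &procs[(i + count - 1) % count].links;
    }
}

/* Walk forward from 'start' until we are back at 'start'.
   Stores up to 'max' PIDs in 'out' and returns the number of
   processes visited. */
size_t walk_ring(struct walk_process *start, long *out, size_t max)
{
    struct walk_process *p = start;
    size_t n = 0;

    assert(start != NULL);
    do {
        if (n < max)
            out[n] = p->pid;
        n++;
        p = WALK_CONTAINER(p->links.flink, struct walk_process, links);
    } while (p != start);
    return n;
}
```

Starting the walk at any element visits every process once and stops when it comes back around, which is exactly the loop condition used in the driver code.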

But would we ever stop with that? For starters, you could manipulate these pointers and change their values, causing havoc. This is what rootkits do, for example to hide a process. Similar structures exist for everything else managed by the kernel: security-related structures, networking, the file system... Imagine what power we have now, having understood how to manipulate the low-level structures! There are many websites and blogs that talk about these. One is a blog that I follow myself, Joanna's Invisible Things. She is definitely one of my favorite Gray Hats. :)
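For anyone analysing such tricks, it helps to see why a simple list walk cannot be trusted once the links have been tampered with. Here is a small user-mode sketch with made-up structures (`hide_entry`, `hide_proc`), not kernel code: after an entry is unlinked, the walk no longer reaches it even though the record itself is still sitting in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for LIST_ENTRY and a process record; illustrative only. */
struct hide_entry {
    struct hide_entry *flink;
    struct hide_entry *blink;
};

struct hide_proc {
    long pid;
    struct hide_entry links;
};

#define HIDE_CONTAINER(ptr) \
    ((struct hide_proc *)((char *)(ptr) - offsetof(struct hide_proc, links)))

/* Link an array of records into one circular doubly linked list. */
void hide_link_ring(struct hide_proc *procs, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        procs[i].links.flink = &procs[(i + 1) % count].links;
        procs[i].links.blink = &procs[(i + count - 1) % count].links;
    }
}

/* Take one record out of the ring: its neighbours now point at each
   other, and the record points at itself. */
void hide_unlink(struct hide_proc *p)
{
    assert(p != NULL);
    p->links.blink->flink = p->links.flink;
    p->links.flink->blink = p->links.blink;
    p->links.flink = &p->links;
    p->links.blink = &p->links;
}

/* Returns 1 if a forward walk from 'start' reaches 'pid', else 0. */
int hide_walk_finds(struct hide_proc *start, long pid)
{
    struct hide_proc *p = start;

    do {
        if (p->pid == pid)
            return 1;
        p = HIDE_CONTAINER(p->links.flink);
    } while (p != start);
    return 0;
}
```

This is one reason detection tools tend to cross-check the list against other sources of process information instead of relying on a single walk.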

Of course you cannot easily do that now, since Windows has Kernel Patch Protection (PatchGuard) and so on. And if you didn't know already, similar techniques are used on other operating systems, and no system is immune to these attacks. For there is always a way if there is the will... So what do you have in mind?

Windows Internals [Process I - Theory]

In this post I will show you a bit about Windows process internals. I am writing this with a few assumptions: you know "C" programming, you know the basics of Windows driver programming, and you have the book Inside Windows 2000 (3rd ed) by David Solomon and Mark Russinovich (or an equivalent; any Windows internals book would do). Yes, do not be surprised that I am discussing Windows 2000 here when Windows 10 is already on the market. Some great minds have said, "Keep it simple, but not simpler". So let us keep the problem simple, and I will make sure it is not simplified so far that the problem definition itself ends up confined by the wrong constraints. If you want to test this code you will also need Windows 2000 itself, or you will have to modify some values that I will tell you about later. You will need the driver kit of course. An understanding of the Windows API and C is a plus.

Most of you only need to know that, like any other operating system, Windows uses several structures to maintain processes. As I understand from the great minds who wrote the internals book (my role models, in fact), Windows does not have "Tasks", unlike *NIX.

However, the basic strategy is more or less the same for the algorithms used in these operating systems. In a nutshell (thinking as a developer), you need a structure that holds all the information about a process. This can be a very complex structure indeed. It can have a lot of nested structures within it, but let us keep it simple. It will contain information such as the name of the process and the ID assigned to the process (internally called the client ID). It will contain a few more details, like when the process started or how much time has elapsed since it started, information about the threads it contains, and so on. The moment we talk about a significant entity like a "thread", you realize that a thread itself carries some information of its own. This might live in another structure. Therefore you will have some nested structures within the process structure. And when you take the internals book I mentioned above and go to the page where the authors list the process structure, you may get bewildered :) There is a lot of information stored in the structure, and in that book it is printed over 3 - 4 pages. Imagine how much thought the kernel developers of Windows put into it. As the great Niklaus Wirth put it, Program = DataStructure + Algorithm.

I always like to see what it is that we are dealing with, so the above image shows the output of our effort. It shows the information that DbgPrint() prints from driver code. Of course you will recognize these as process names with their IDs.

OK, a little bit of theory on the basics. Processes in Windows are managed primarily with two important structures, EPROCESS and KPROCESS. A kernel debugger (along with symbols) can show you their contents. Now that these things are documented online, you can always look there, and you can always check the internals book for the information. KPROCESS keeps a lot of the very low-level aspects of a process, such as scheduling, whereas the executive structure (EPROCESS) keeps a high-level view of process information. Much of the operating system itself deals with and relies on EPROCESS for its programming chores. Usually (system) programmers won't need to access these structures directly. Even driver writers won't need to worry about them. But that is internals, right? So EPROCESS contains some information itself, and for the rest it "contains" a KPROCESS within.
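The "contains within" idea is plain structure embedding. Below is a minimal sketch in C with made-up fields (`emb_kprocess`, `emb_eprocess` and their members are illustrative, not the real layouts). It puts the kernel part first, as the real EPROCESS does with its embedded KPROCESS, so a pointer to the outer structure is also a pointer to the inner one.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative fields only; the real structures are far larger. */
struct emb_kprocess {
    long base_priority;     /* low-level, scheduling-style data */
    long quantum;
};

struct emb_eprocess {
    struct emb_kprocess pcb;    /* kernel part embedded as first member */
    long pid;                   /* higher-level, executive data */
    char image_name[16];
};

/* Executive structure -> embedded kernel structure. */
struct emb_kprocess *emb_kernel_part(struct emb_eprocess *e)
{
    assert(e != NULL);
    return &e->pcb;
}

/* Embedded kernel structure -> enclosing executive structure. */
struct emb_eprocess *emb_executive_part(struct emb_kprocess *k)
{
    assert(k != NULL);
    return (struct emb_eprocess *)((char *)k -
                                   offsetof(struct emb_eprocess, pcb));
}
```

Because the embedded structure sits at offset 0, both views share one address, so code that only cares about scheduling can work with the inner type while the rest of the system works with the outer one.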

Security professionals would know about this because these structures were exploited in the past to hide processes and keep other activities stealthy (rootkits). One of the areas I am trying to specialize in is rootkits. Today we don't get to see them as much because of the strict "code integrity" that operating systems like Windows have introduced (since Vista, actually), and with every version there is something new: digital signatures, PatchGuard and so on. Perhaps in another post I can show you some of that hackish code. It is indeed very interesting. You may also want to know (as I said in the native API post) that these structures were used by some software to protect its own code. Antivirus products, for example, used to overwrite some data in these structures with pointers that work like a "detour", a "trampoline" and so on.

So what is the task we have in hand, and what are we going to learn from it? We will access this EPROCESS structure in the kernel (not in user mode, so you need to write a driver) and use it to enumerate the processes as shown above. We thereby learn how process information is handled internally (to some extent).

Thursday, March 31, 2016

Introduction to Native (aka Undocumented) API

Hi all,

This is the second post on my new blog and I am even more excited than I was when I posted the first one. That is because this week I was looking at some really "hackish" stuff. Anyway, if you are here then perhaps you have already heard of the so-called "undocumented API" or "native API". If you are checking this post because of the updates I sent to forums and sites like LinkedIn, then perhaps you have not heard of it. So take a deep breath... this is going to be a long read...

This post is for beginners who are developers, who have already used C, C++ and the Windows API, and who want to dive into the "hackish" dimensions of Windows. Windows gives a mammoth API to programmers who develop with C/C++/assembly language. Windows also uses this API itself. However, the Windows OS also has some "hidden" interfaces that it uses apart from the (public) Windows API. I assume you are aware of this documented API, aka the Windows API.

So today it is a misnomer when we call it the "undocumented API". Most of these native APIs are documented now. From the Windows NT days until today, there have been some belligerent and talented hackers who did surgery on the Windows OS and studied the internal facilities (code) that the operating system uses and also exposes to the public, especially programmers. The native API had very advanced documentation on the Sysinternals website earlier, when it wasn't part of Microsoft and the articles were "visible". There are many books and MSDN articles written on it. For this article I would primarily thank Matt Pietrek, Mark Russinovich, Sven Schreiber, Gary Nebbett, Prasad Dabak, Milind Borate... I may have missed some names. I learned from many, so apologies to those who know I follow them but whose names I didn't mention here. I also referred to other sites for some code. I used exploit-monday for the AMAZING documentation of some "undocumented" data structures and functions put together by Matt Graeber. I referred to rohitab's forum only to check that my code is "really fine".

The code I have put here is definitely different from rohitab's because I use explicit linking. But before I talk about all that... what is this so-called "native" or "undocumented" API? Well, if you think of all the application programming interfaces that Windows provides as layers, then one layer (almost the top) consists of the Windows API, which is offered by a huge set of DLL files like KernelBase (kernel32), User32, Gdi32, advapi32, crypt32 etc. Now if that is the first layer, the interface you use as a programmer, then the next layer is the "NTDLL" layer (let us put it this way to simplify). For a clear understanding I would ask the reader to refer to the popular Windows Internals books.

These functions are called "undocumented" because Microsoft never officially documented the function calls and data structures (until now, when there is some partial documentation). NTDLL.DLL is like a layer by itself, and most of the API calls in the other DLLs call functions of NTDLL.DLL. This is one of the major "code paths", and by code path I mean the flow of code from user mode to kernel mode. NTDLL is like the last layer of user mode; after it the code switches to kernel mode and your code (actually in the form of system calls and requests to the system) continues executing.

The user-mode Windows API is quite well documented. The kernel-mode programming interface is provided by ntoskrnl, hal etc. and that is documented too. Driver developers will be, and should be, well aware of those interfaces. The native API sits somewhere in between but isn't well documented. Microsoft uses this layer for proprietary work: "chewing" user-mode requests and then pushing the "massaged" result to kernel mode for execution. You can think of it as an abstraction offered to the Windows API itself. You call the Windows API. The Windows API calls NTDLL functions (the native API), and then there is a mode switch to the kernel, and so on.

Did that simplify it better than other documentation about the native API? I don't know... if you think so, please comment on this article. I can edit it and make it better if needed.

Okay, so why do you need to know about the "native API"? You can easily achieve almost all tasks with the Windows API. Even the higher-level frameworks offered to programmers, like .NET and Java, use the Windows API. Even the OS uses some of the functions offered by this "Windows API" layer. You would still use the native API to learn more about what happens inside. You use it because it can give more information, perhaps all in a single function call. You use it because you skip a layer (WinAPI), so your code saves several CPU cycles by skipping many of the instructions that make up the Windows API. Hooks and undocumented APIs have long been favorites of both hackers and anti-hackers. If you are lucky you will find an article about how Symantec anti-virus got into trouble for using undocumented interfaces (not necessarily the native API) after the introduction of Windows Vista. Microsoft warns against using these APIs because they can change. For most tasks programmers SHOULD use the Windows API.

OK, enough theory, let us put it to practical use... What I will show in this article is a famous piece of code that people usually look for: "How to list processes using the native API", or "NtQuerySystemInformation".

"NtQuerySystemInformation" is a very powerful function that resides in ntdll.dll. As the name says, it queries a lot of system information. You choose the "information" that you want from an "enumeration". Matt has re-documented it (his seems to be the latest version). It was indeed documented well back in 2000 by Gary Nebbett. As said earlier, Microsoft can change these structures, enumerations and functions at any time. I haven't really done a diff, but I could see a few more "information types" added to this enumeration in Matt's version.

I assume the reader knows how to use LoadLibrary() and do explicit linking. I won't explain all that elementary stuff here. So here is what we need to do to enumerate the processes currently managed by the operating system... (Experts, see how careful I was when I said that (lol): processes don't run, threads do, and there can be terminated-but-hanging-around processes (zombies). Refer to the Mark/David/Alex discussion in their internals book as well as some forums.)

You may use winternl.h to get the limited Microsoft documentation of (and access to) the native API and its structs. I prefer to have my own list, and I used Matt's documentation mentioned earlier. The code shown on rohitab uses winternl.h and links the ntdll library implicitly, if you want to do it that way... We will do explicit linking here, which is pretty much what people are mostly looking for...

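#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include "undoc.h" //uses - NtQuerySystemInformation.h

#define BLOB_SIZE (1024 * 1024) //Allocate a really large pool

int _tmain(int argc,PTSTR *argv)
{
    SYSTEM_PROCESS_INFORMATION *pspi_next,*pspi;
    NTSTATUS ns;
    pspi = (PSYSTEM_PROCESS_INFORMATION)HeapAlloc(GetProcessHeap(),0,BLOB_SIZE);
    if(pspi == NULL || NtQuerySystemInformation == NULL)
        return 1;

    //Fill the blob with one SYSTEM_PROCESS_INFORMATION entry per process
    ns = NtQuerySystemInformation(SystemProcessInformation,pspi,BLOB_SIZE,NULL);
    if(ns != 0){ //0 is STATUS_SUCCESS
        HeapFree(GetProcessHeap(),0,pspi);
        return 1;
    }

    pspi_next = pspi;
    do{ //Well, if this code is even running then there are at least 8 - 10 process in the system
        //depending upon the OS version (even WinPE!)

        //Field names follow winternl.h; adjust them if your own header differs
        wprintf(L"%lu : %.*ls\n",(unsigned long)(ULONG_PTR)pspi_next->UniqueProcessId,
            (int)(pspi_next->ImageName.Length / sizeof(WCHAR)),
            pspi_next->ImageName.Buffer ? pspi_next->ImageName.Buffer : L"");

        if(pspi_next->NextEntryOffset == 0) break; //last entry in the blob
        pspi_next = (PSYSTEM_PROCESS_INFORMATION)(((PBYTE)pspi_next) + pspi_next->NextEntryOffset);
    }while(1);

    HeapFree(GetProcessHeap(),0,pspi);
    return 0;
}

//undoc.h header [excerpted - rest you can refer exploit-monday and also give credit to the
//author Matt for his valuable contribution]

typedef NTSTATUS (NTAPI *_NtQuerySystemInformation)
    (IN SYSTEM_INFORMATION_CLASS SystemInformationClass,
    OUT PVOID SystemInformation,
    IN ULONG SystemInformationLength,
    OUT PULONG ReturnLength OPTIONAL);

//Explicit linking. A call in a global initializer needs a C++ compiler;
//in plain C do this assignment at the start of _tmain instead.
_NtQuerySystemInformation NtQuerySystemInformation = (_NtQuerySystemInformation)
    GetProcAddress(LoadLibrary(L"ntdll.dll"),"NtQuerySystemInformation");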

We are using the undocumented NtQuerySystemInformation function to get some process information stored in memory that is managed by the operating system. This information comes as SYSTEM_PROCESS_INFORMATION structures, one per process. We do not know how many processes there are, so we allocate a large blob. Thankfully the Heap functions do support the amount I asked for. If not, use VirtualAlloc. Each chunk of this information holds an offset to the next chunk of the same "information set". Use that "link" to traverse all the "chunks" until you reach the end of the blob. We don't check for the end of the blob directly; instead we check whether "NextEntryOffset" is 0. The rest is basics.
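The offset-chained layout is easy to try out in isolation. Here is a minimal portable sketch in C: `blob_record`, `blob_append` and `blob_count` are names I made up, and the record is a much simplified stand-in for SYSTEM_PROCESS_INFORMATION. It only shows the "follow the offset until it is 0" traversal over variable-length records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one variable-length record in the blob. */
struct blob_record {
    unsigned long next_entry_offset;   /* bytes to the next record, 0 = last */
    unsigned long pid;
    unsigned long name_len;            /* bytes of name stored after the header */
};

/* Keep every record start on an 8-byte boundary. */
static size_t blob_round_up(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Write one record (header + name) at 'offset'.
   Returns the offset at which the next record would start. */
size_t blob_append(unsigned char *blob, size_t offset,
                   unsigned long pid, const char *name, int is_last)
{
    struct blob_record rec;
    size_t name_len = strlen(name) + 1;
    size_t total = blob_round_up(sizeof rec + name_len);

    assert(blob != NULL);
    rec.next_entry_offset = is_last ? 0 : (unsigned long)total;
    rec.pid = pid;
    rec.name_len = (unsigned long)name_len;
    memcpy(blob + offset, &rec, sizeof rec);
    memcpy(blob + offset + sizeof rec, name, name_len);
    return offset + total;
}

/* Walk the chain from the start of the blob. Returns the number of
   records and adds up their PIDs in *pid_sum. */
size_t blob_count(const unsigned char *blob, unsigned long *pid_sum)
{
    struct blob_record rec;
    size_t offset = 0, count = 0;

    *pid_sum = 0;
    for (;;) {
        memcpy(&rec, blob + offset, sizeof rec);   /* copy out the header */
        count++;
        *pid_sum += rec.pid;
        if (rec.next_entry_offset == 0)
            break;                                 /* last record */
        offset += rec.next_entry_offset;
    }
    return count;
}
```

Notice that the walker never needs to know the record count or the blob size up front, which is why the real code can get away with one large allocation and a check for a zero offset.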


Thursday, March 10, 2016

How to verify a PE digital signature (Extended version)

A very good morning/afternoon/evening.. I am writing this at 3:32 AM... so I may say Good Early Morning as well...  I can stay deprived of sleep but not knowledge... :)

Well, if you are checking this topic I assume you already know about digital signatures and especially how they are used on PE (Portable Executable) images. Anyway, I got a project where I had to scan some cabinet files that contain digital signatures. And as of this writing, the world (matrix) is going through a significant change. SHA1 is being deprecated and SHA2 is being implemented all over the (corporate) world.

So the best tool as of now is "signtool", which is provided with the driver development kit, the SDK or Visual Studio. Numerous sites have been published in the past couple of months, and are being published now, on what a digital signature is, how to have multiple signatures and what not. I was reading them and I learned a lot. Anyway, I primarily started with two sources that I want the reader to go through before reading further...

Most of the information you see there is already documented. What is not documented is how to check a signature via catalog files, and the Sysinternals link provides code for exactly that problem. But it was not adequate, so I added two calls to make it better.

(Look for Karthik - emm so many pseudonyms!!!)

But some good things have happened. Microsoft has documented a few more functions that I am sure will be fully used for coding in the coming months, and there will be many websites, perhaps MSDN itself, showing code that uses them. I didn't see anyone using them as of this writing, and of course I am using them for my project. I tried writing to a well-known forum and they rejected my article for aesthetics, so here comes my first page! OK, enough of stories and history...

So Microsoft has documented some functions that are in WinTrust.dll, a sort of WT helper functions (I assume WinTrust helper functions). "Signtool" (WDK 8) apparently does not make use of these calls; I found only one of these functions in it. Anyway, what we can do now that we weren't able to do a year or so back is check a signature via the venerable WinVerifyTrust and then use some of the information that it "stores" to get additional details.

All the functions including data structures used by them are documented. So let me just give the code here... (Again I assume you already know well about basic Crypto API usage and also I assume you went through the above forums)...

BOOL VerifyEmbeddedSignature2(HANDLE _h_verify_state)
{
    CRYPT_PROVIDER_DATA *pCPD;
    CRYPT_PROVIDER_SGNR *pCPS;
    CMSG_SIGNER_INFO *pcmsgsi;

    _WTHelperProvDataFromStateData WTHelperProvDataFromStateData;
    WTHelperProvDataFromStateData = (_WTHelperProvDataFromStateData)
        GetProcAddress(LoadLibrary(L"wintrust.dll"),"WTHelperProvDataFromStateData");

    if(WTHelperProvDataFromStateData == NULL)
        return FALSE;
    pCPD = WTHelperProvDataFromStateData(_h_verify_state);

    if(pCPD == NULL)
        return FALSE;

    _WTHelperGetProvSignerFromChain WTHelperGetProvSignerFromChain;
    WTHelperGetProvSignerFromChain = (_WTHelperGetProvSignerFromChain)
        GetProcAddress(LoadLibrary(L"wintrust.dll"),"WTHelperGetProvSignerFromChain");

    if(WTHelperGetProvSignerFromChain == NULL)
        return FALSE;
    pCPS = WTHelperGetProvSignerFromChain(pCPD,0,FALSE,0); //signer at index 0, not a counter signer
    if(pCPS == NULL) //check the signer we just asked for
        return FALSE;
    pcmsgsi = pCPS->psSigner;
    if(pcmsgsi == NULL)
        return FALSE;
    //GetAlgorithmName is my own helper (not shown here); this call is reconstructed
    printf("Hash Algorithm identifier (OID) - %s : Description - %s",pcmsgsi->HashAlgorithm.pszObjId,
        GetAlgorithmName(pcmsgsi->HashAlgorithm.pszObjId));
    return TRUE;
}

OK, so the only parameter I pass to this function is the "state data" that is obtained by calling WinVerifyTrust. How to do that is clearly shown in the MSDN code sample as well as in the other sample on the Sysinternals forum.

Well, the only "extension" that I am providing here is getting the hash algorithm. This code returns the "SHA 2" OID or the "SHA 1" OID. If a file is dual signed, the signature at index 0 is pulled. I tested two cases, one with a dual signature (SHA 1 + SHA 256) and the other with just SHA 256. GetAlgorithmName is a helper function I wrote to get a friendly name for the OID.
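Since the helper itself is not listed here, below is a minimal sketch in C of what such an OID lookup could look like. The function name `oid_to_hash_name` and the choice of entries are my assumptions for illustration, not the actual GetAlgorithmName; the OID strings are the standard identifiers for these digest algorithms.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a GetAlgorithmName-style helper:
   maps a digest algorithm OID string to a friendly name. */
const char *oid_to_hash_name(const char *oid)
{
    static const struct {
        const char *oid;
        const char *name;
    } table[] = {
        { "1.2.840.113549.2.5",     "MD5"    },
        { "1.3.14.3.2.26",          "SHA1"   },
        { "2.16.840.1.101.3.4.2.1", "SHA256" },
        { "2.16.840.1.101.3.4.2.2", "SHA384" },
        { "2.16.840.1.101.3.4.2.3", "SHA512" },
    };
    size_t i;

    if (oid == NULL)
        return "Unknown";
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(oid, table[i].oid) == 0)
            return table[i].name;
    }
    return "Unknown";   /* OID not in this small table */
}
```

A fixed table keeps the sketch portable. On Windows you could instead ask the system with CryptFindOIDInfo, which knows far more OIDs than this list.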

I am sure there is much more we can do now, like connecting the certificate information we get from these APIs with the "CERT functions" to get complete chaining info!!!