Update (2017-04-02):
I have continued development of Heapy on GitHub.
The command line usage has been tweaked. Please see the GitHub page for more up-to-date usage docs.
This post remains the best (and only) documentation of Heapy's inner workings, which remain the same – but the code shown here is now slightly out of date.

I’ve created a simple but hopefully effective heap profiler for windows C/C++ applications called Heapy.

Heapy requires no modifications to the program to be profiled. With a very quick setup it can profile 32- or 64-bit Windows C/C++ applications. Heapy will list the top allocation sites of your application every few seconds – helping you track down memory leaks and giving you a better insight into what parts of your program are using memory.

The readme in that zip should contain enough to get you started – there's more information on the GitHub page and in the rest of this blog post.

If you want to build Heapy yourself you just need to clone it from GitHub and build with Visual Studio 2012 (the Express edition should work).

## Example

If we compile the following test application:

```cpp
// Code for TestApplication.exe

#include <windows.h>
#include <iostream>

void LeakyFunction(){
	malloc(1024*1024*5); // leak 5Mb
}

void NonLeakyFunction(){
	auto p = malloc(1024*1024); // allocate 1Mb
	std::cout << "TestApplication: Sleeping..." << std::endl;
	Sleep(15000);
	free(p); // free the Mb
}

int main()
{
	std::cout << "TestApplication: Creating some leaks..." << std::endl;
	for(int i = 0; i < 5; ++i){
		LeakyFunction();
	}
	NonLeakyFunction();
	std::cout << "TestApplication: Exiting..." << std::endl;
	return 0;
}
```

We can run Heapy with the command line:

> Heapy TestApplication.exe

This will generate the following two reports in “Heapy_Profile.txt”:

```
=======================================

Printing top allocation points.

	< Trimmed out very small allocations from std::streams >

Alloc size 1Mb, stack trace:
	NonLeakyFunction e:\sourcedirectory\heapy\testapplication\main.cpp:9 (000000013FEC1D7E)
	main e:\sourcedirectory\heapy\testapplication\main.cpp:22 (000000013FEC1E0D)
	__tmainCRTStartup f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c:241 (000000013FEC67FC)
	BaseThreadInitThunk (00000000779A652D)
	RtlUserThreadStart (0000000077ADC541)

Alloc size 25Mb, stack trace:
	LeakyFunction e:\sourcedirectory\heapy\testapplication\main.cpp:6 (000000013FEC1D5E)
	main e:\sourcedirectory\heapy\testapplication\main.cpp:20 (000000013FEC1E06)
	__tmainCRTStartup f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c:241 (000000013FEC67FC)
	BaseThreadInitThunk (00000000779A652D)
	RtlUserThreadStart (0000000077ADC541)

Top 13 allocations: 26.005Mb
Total allocations: 26.005Mb (difference between total and top 13 allocations : 0Mb)

=======================================

Printing top allocation points.

	< Trimmed out very small allocations from std::streams >

Alloc size 25Mb, stack trace:
	LeakyFunction e:\sourcedirectory\heapy\testapplication\main.cpp:6 (000000013FEC1D5E)
	main e:\sourcedirectory\heapy\testapplication\main.cpp:20 (000000013FEC1E06)
	__tmainCRTStartup f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c:241 (000000013FEC67FC)
	BaseThreadInitThunk (00000000779A652D)
	RtlUserThreadStart (0000000077ADC541)

Top 5 allocations: 25.005Mb
Total allocations: 25.005Mb (difference between total and top 5 allocations : 0Mb)
```

The rest of this post is focused on why and how I constructed Heapy.

## Why Profile the Heap?

Occasionally when developing a piece of software one has a desire to know what parts of a program are using up memory. Sometimes there’s a tricky resource leak, or a need to understand which areas of code legitimately (but perhaps unpredictably) allocate a lot of memory. In Java we have the wonderful VisualVM, which can inspect a dump of the entire heap of an application and do memory profiling as an application runs – I expect similar tools exist for other interpreted or JITted languages. The situation is not as nice for C/C++: you simply can’t walk through the heap, and profiling tools are limited. On Linux we can do pretty nice memory profiling with Gperftools or Valgrind’s Massif. There didn’t seem to be a free, easy-to-use equivalent to Gperftools or Massif for Windows.

I knew that creating a heap profiler for Windows wouldn’t be too tricky so I decided to give it a go myself! Due to its small size I also think Heapy serves as a decent introduction to DLL injection and function hooking, so I’ve used the rest of this blog post to describe it in some detail.

## The Plan

The first decision I made was that the target application should not have to be modified in order to be profiled. This meant that the only way to go about this would be to inject the profiling code into the application.

After a fair amount of research I settled on DLL (Dynamic-link library) injection and function hooking as the best way to pull this off. DLL injection involves using an “injector” application to “inject” a thread running code from a DLL into a process. Once the DLL code is running it can do anything – it turns out that it’s possible to “hook” functions in a program so that they will call code from our DLL instead of (or in addition to) the original function.

I’ll call our injector application “Heapy.exe” and our injected DLL “HeapyInject.dll”. Here’s a step by step description of how Heapy works:

1. Launch Heapy.exe, specifying the target executable to profile (and its working directory).
2. Heapy.exe spawns a process running the target executable in a suspended state.
3. Heapy.exe injects a thread into the target application which runs code from HeapyInject.dll.
4. The remote thread hooks calls to malloc and free in every module it can find. The hooks delegate to the original malloc/free functions but record a stack trace and size for every allocation.
5. The remote thread spawns an additional thread which emits a heap profiling report to a file every 10 seconds.
6. The original remote thread terminates, which signals to Heapy.exe that it should un-suspend the target application's main thread.
7. The target application executes normally, with heap profile reports being written to a file every 10 seconds.
8. As the target application exits, after all normal code has been executed, we write a final heap report to a file. (This can be an aid to hunting for memory leaks, provided the target application at least attempts to shut down cleanly.)
9. Heapy.exe detects that the target application has shut down and terminates itself.

I expect fairly curious people would not be fully satisfied with the above description.

• The DLL injection stage and the “hooking” might sound a little magical. The next section explains DLL injection in more detail.
• For the function hooking I copped out a little: I used the fantastic MinHook library to do the heavy lifting/magic part. I won’t do MinHook the injustice of trying to explain it myself right now. Still, there are some interesting details to elaborate on: finding the mallocs/frees in every module (statically linked exes, linked-in runtime DLLs, etc.) and snippets showing how to use MinHook.
• What to do in our hooked functions in order to “profile” and report memory allocations. The final section expands on this a little.

## DLL Injection Details

I was surprised at how easy and “well supported” DLL injection is in Windows. The key things the Win32 API lets us do are create a thread in a different process (using CreateRemoteThread) and allocate and set memory in the virtual address space of a remote process (using VirtualAllocEx and WriteProcessMemory).

To call CreateRemoteThread we have to supply the address of a function which takes a single pointer parameter and returns a DWORD (a.k.a. a THREAD_START_ROUTINE). The magic is that this THREAD_START_ROUTINE is compatible (enough) with the type of the Win32 function LoadLibrary! Piecing all this together we can create a thread in the target process that runs our DLL's DllMain. Here’s how:

1. Allocate some memory in the target process using VirtualAllocEx.
2. Set that memory to be a string containing the path to the DLL we want to inject using WriteProcessMemory.
3. Call CreateRemoteThread on the target process with the address of LoadLibrary as the thread start routine and our path string as the parameter.
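
The three steps above can be sketched roughly as follows. This is a simplified illustration rather than Heapy's exact code (the `InjectDll` helper name is mine, and error handling is minimal):

```cpp
// Sketch of classic DLL injection, assuming we already have a HANDLE to
// the (suspended) target process and the full path to HeapyInject.dll.
#include <windows.h>
#include <string>

bool InjectDll(HANDLE process, const std::string &dllPath){
	// 1. Allocate memory in the target process to hold the DLL path string.
	SIZE_T size = dllPath.size() + 1;
	LPVOID remoteString = VirtualAllocEx(process, NULL, size,
	                                     MEM_COMMIT, PAGE_READWRITE);
	if(!remoteString) return false;

	// 2. Write the DLL path into that memory.
	if(!WriteProcessMemory(process, remoteString, dllPath.c_str(), size, NULL))
		return false;

	// 3. Start a remote thread whose start routine is LoadLibraryA and whose
	//    single argument is the path we just wrote. LoadLibraryA's signature
	//    is compatible enough with THREAD_START_ROUTINE for this to work.
	HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
	LPTHREAD_START_ROUTINE loadLibrary =
		(LPTHREAD_START_ROUTINE)GetProcAddress(kernel32, "LoadLibraryA");
	HANDLE thread = CreateRemoteThread(process, NULL, 0,
	                                   loadLibrary, remoteString, 0, NULL);
	if(!thread) return false;

	// When this wait returns, the DLL has been loaded and its DllMain has run.
	WaitForSingleObject(thread, INFINITE);
	CloseHandle(thread);
	return true;
}
```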

Take a look at Heapy.cpp for the full process spawning and DLL injection code. There is also a great deal of information about DLL injection elsewhere online.

## Function Hooking Details

Function hooking means replacing a function, at runtime, with a different function. To be really useful a function hooking technique needs to provide a way to call the original function. In Heapy the injected thread needs to hook the malloc and free functions in order to profile them. I should say now that hooking malloc and free also catches calls to new and delete (at least in all the target applications compiled with Visual Studio that I tried).

I let MinHook do the function hooking heavy lifting. EasyHook also does hooking and injection – but its hooking of C/C++ functions didn’t seem as good as MinHook’s (I think the core focus of EasyHook is hooking C# or CLR applications, which I’m not yet interested in).
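
To give a flavour of what using MinHook looks like, here is a hedged sketch of hooking malloc (not Heapy's exact code – the hook body and `HookMalloc` name are illustrative, and error handling is minimal):

```cpp
// Sketch of hooking the CRT's malloc with MinHook.
#include <MinHook.h>
#include <cstdlib>

typedef void* (*MallocType)(size_t);
static MallocType originalMalloc = nullptr;

static void* MallocHook(size_t size){
	void *p = originalMalloc(size); // delegate to the real malloc
	// ...record the stack trace and (pointer, size) pair here...
	return p;
}

bool HookMalloc(){
	if(MH_Initialize() != MH_OK) return false;
	// Redirect calls to malloc into MallocHook; MinHook stores a trampoline
	// to the original implementation in originalMalloc.
	if(MH_CreateHook(reinterpret_cast<void*>(&malloc),
	                 reinterpret_cast<void*>(&MallocHook),
	                 reinterpret_cast<void**>(&originalMalloc)) != MH_OK)
		return false;
	return MH_EnableHook(reinterpret_cast<void*>(&malloc)) == MH_OK;
}
```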

#### Lots of mallocs

Even with MinHook doing the heavy lifting there is still a little work to be done. I wanted to be able to target any C/C++ application created with pretty much any version of Visual Studio. This means that the malloc and free functions inside my MinHook DLL would often not be the same malloc and free that are actually being used in the target application! Aside: this happens to be one of the main reasons why we can’t blindly use a DLL built with one version of Visual Studio in an application built with another – if you want to mix and match compiler versions you have to make sure that memory allocated by malloc in one DLL is never freed in another.

The problem is not insurmountable. We can use the DbgHelp library to enumerate all loaded modules, find the malloc and free functions in those modules, and hook them with our profiled malloc and free functions. The gory details are in HeapyInject.cpp.

## Profiling and Reports

Once we’ve hooked the allocation functions we need to figure out what information to collect and how to report it. The approach that Gperftools and Massif take is to group all allocations by the call stack at the allocation site. This makes a lot of sense: it’s a nice way to show “where” in your program memory is being allocated. If we ignored the full call stack we would probably get useless information such as “90% of your allocations are in some standard container allocation function”. By grouping by stack traces we can get more useful information such as: “this chain of function calls allocated hundreds of megabytes in the form of some standard container”.

With this goal in mind here’s what happens when we hit a hooked malloc function in Heapy:

1. Record a stack trace with CaptureStackBackTrace. This function gives us a hash of the stack trace which we’ll use as a stack ID.
2. Call the original malloc function.
3. Update (or create) this stack trace's allocation map with this allocation. That map is the list of (pointer, size) pairs of allocations made by this stack trace.
4. Update the map of pointers with this stack trace ID, so that we can find the stack trace that allocated a pointer if/when “free” is called on it.

When we hit a hooked free call:

1. Remove the allocation from this stack trace's allocation list.
2. Remove this pointer from the map of pointers.

With the data maintained above we can get a list of active (that is, not yet freed) allocations at any time to generate reports. With careful use of hash maps (std::unordered_map) our profiling functions are not too costly. Even with locks for thread safety, the cost of maintaining this information is tiny compared to the cost of capturing the stack traces.

For the reports I went for something very simple: just printing the top 25 allocation points and the amount allocated, every few seconds and once at application exit. I used the DbgHelp library again to print nice symbols for the stack traces (as long as a .pdb file can be found).

This simple reporting is enough for 90% of use cases. We can see which parts of our application allocate the largest amounts of memory. It lets us catch leaks on exit, and leaks at runtime if they grow large. Extending the reporting to help in particular cases would be very easy. Ideally one day I would like to add a full-featured user interface, but for now this simple reporting has proved useful enough.

## Wrapping Up

Well, that was a lot of writing about a few hundred lines of code. Hopefully someone will find all of these details interesting! Even if that doesn’t happen, I have already found Heapy to be a useful tool – perhaps other people will too.

### One Response to “Heapy: A Simple Heap Profiler for Windows C/C++ Applications”

1. Heapy is a great thing!

I have tried it on a relatively big project and it just works!
The most interesting thing about Heapy is that its source code is very small. Everyone can download its source and tune it to their needs in little time. Also, I think that I can load the DLL into my project manually in order to provide some custom statistics (e.g. how much memory is allocated for each 3D solid in the model), which would be great even though MSVC2015 has a memory profiler.

I wonder if you (or any other person) maintains the project on GitHub.
Right now I see at least two issues with the code:
1. Stupid bug: command line argument in CreateProcess is not null-terminated.
2. Performance issue: using std::unordered_map means wasting a lot of time and TONS of memory in a heavy application.
The first one is already fixed in both forks.