I’ve created a simple but hopefully effective heap profiler for windows C/C++ applications called Heapy.

Heapy requires no modifications to the program to be profiled. With a very quick setup it can profile 32 or 64 bit windows C/C++ applications. Heapy will list the top allocation sites of your application every few seconds – helping you track down memory leaks and giving you a better insight into what parts of you program are using memory.

The readme in that zip should contain enough to get you started – there’s more information on the Github Page and in the rest of this blog post.

If you want to build Heapy yourself you just need to clone it on GitHub and build with Visual Studio 2012 (the express edition should work.)

## Example

If we compile the following test application:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 // Code for TestApplication.exe   #include <windows .h> #include <iostream>   void LeakyFunction(){ malloc(1024*1024*5); // leak 5Mb }   void NonLeakyFunction(){ auto p = malloc(1024*1024); // allocate 1Mb std::cout < < "TestApplication: Sleeping..." << std::endl; Sleep(15000); free(p); // free the Mb }   int main() { std::cout << "TestApplication: Creating some leaks..." << std::endl; for(int i = 0; i < 5; ++i){ LeakyFunction(); } NonLeakyFunction(); std::cout << "TestApplication: Exiting..." << std::endl; return 0; }

We can run Heapy with the command line:

> Heapy TestApplication.exe

Which will generate the following two reports in “Heapy_Profile.txt”

=======================================   Printing top allocation points.   < Trimmed out very small allocations from std::streams >   Alloc size 1Mb, stack trace: NonLeakyFunction e:\sourcedirectory\heapy\testapplication\main.cpp:9 (000000013FEC1D7E) main e:\sourcedirectory\heapy\testapplication\main.cpp:22 (000000013FEC1E0D) __tmainCRTStartup f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c:241 (000000013FEC67FC) BaseThreadInitThunk (00000000779A652D) RtlUserThreadStart (0000000077ADC541)   Alloc size 25Mb, stack trace: LeakyFunction e:\sourcedirectory\heapy\testapplication\main.cpp:6 (000000013FEC1D5E) main e:\sourcedirectory\heapy\testapplication\main.cpp:20 (000000013FEC1E06) __tmainCRTStartup f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c:241 (000000013FEC67FC) BaseThreadInitThunk (00000000779A652D) RtlUserThreadStart (0000000077ADC541)   Top 13 allocations: 26.005Mb Total allocations: 26.005Mb (difference between total and top 13 allocations : 0Mb)   =======================================   Printing top allocation points.   < Trimmed out very small allocations from std::streams >   Alloc size 25Mb, stack trace: LeakyFunction e:\sourcedirectory\heapy\testapplication\main.cpp:6 (000000013FEC1D5E) main e:\sourcedirectory\heapy\testapplication\main.cpp:20 (000000013FEC1E06) __tmainCRTStartup f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c:241 (000000013FEC67FC) BaseThreadInitThunk (00000000779A652D) RtlUserThreadStart (0000000077ADC541)   Top 5 allocations: 25.005Mb Total allocations: 25.005Mb (difference between total and top 5 allocations : 0Mb)

The rest of this post is focused on why and how I constructed Heapy.

My last post was about creating a tiny just in time (JIT) compiler for mathematical expressions. This new project builds on that work. JITted mathematical expressions were the first problem to solve on the way to making a toy programming language for image processing. I have called this new project Pixslam, the sourcode and a decent amount of documentation are on GitHub.

Edge detection with Pixslam

Recently I have been interested in code generation. Specifically generating native x86-64 code on the fly.

In the future I hope to present useful and novel applications of code generation – but for now I just want to demonstrate the simplest thing I think you can call just-in-time (JIT) code generation without blushing.

Skip right to the code on GitHub or continue reading for a full description.

Just a quick one (mostly to remind myself that I have a blog which needs updating!)

One of my courseworks at university was to implement ant colony optimization for the traveling salesman problem.

My solution was one of the more self contained and interesting pieces of my work at uni. So I put it up on GitHub.

I was amazed at how much faster this algorithm was compared to the earlier genetic algorithm based approaches I tried during the course. The only non vanilla part of the implementation is an approximation for the Math.pow function in java which sped up the whole program up by a decent amount.

// Approximate power function, Math.pow is quite slow and we don't need accuracy. // See: // http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/ // Important facts: // - >25 times faster // - Extreme cases can lead to error of 25% - but usually less. // - Does not harm results -- not surprising for a stochastic algorithm. public static double pow(final double a, final double b) { final int x = (int) (Double.doubleToLongBits(a) >> 32); final int y = (int) (b * (x - 1072632447) + 1072632447); return Double.longBitsToDouble(((long) y) << 32); }

It’s been a while since my last post. I would have probably had more time for extra curricular projects if I didn’t spend so much of my life waiting for C++ applications to compile! In fact I’ve created a tool to help with exactly that…

### The problem

C/C++ projects often compile painfully slowly. A large cause of this problem is “#include” statements. One included headers drags in others which drag in yet more – one combinatorial explosion later you’re left twiddling your thumbs waiting a project to compile (since hundreds of headers can take a while to open and compile).

One way to avoid headers dragging into too many other headers with them is to use forward declaration and dynamic allocation. This allows us to remove some include directives from headers. Removing an unnecessary include from a popular header car really help compile times.

An odd creature

Here it is. A modern browser will be required.

I’m not sure what it is. It looks nice, but strange.

Technical details:

I used processing to create it and then processing.js to put it on a web page. It’s driven by a system of ODEs based on a particle/spring system with oscillating spring lengths and “magnetic walls”. A common fourth-order Runge–Kutta method is used to solve the ODEs.

This is a very quick post describing how to fix an annoyance with Matlab under Linux. I thought I’d share this with the internet to hopefully save the next person who has this trouble some time.

On Linux Matlab R2011b annoyingly requires the fairly old gcc 4.3.4 to compile C/C++ “mex” functions. This is a problem for recent versions of Ubuntu 64-bit. If you attempt to use a more recent version of gcc you get errors like “libstdc++.so.6: version `GLIBCXX_3.4.11′ not found” when trying to run the mex files. This is because matlab uses it’s own libc/stdc++, and only includes them for older compilers.

The fix is to install gcc 3.4 and point matlab to it by editing mexopts.sh. Annoyingly there is no package for gcc 3.4 on Ubuntu 11.10, so you have to build it yourself. The reason for this blog post is that the build isn’t quite “textbook” and it took me a while to get it to work. Hopefully this guide might save some googleing soul the hassle, and I’ll be able to point co-workers to it when they upgrade.

The other day I was playing around in Matlab, and although I can’t remember what I set out to do I did end up making a small lossy audio compression/decompression system! It seemed like a good topic for a blog post.

#### The discrete cosine transformation

Before I show the code I’ll have to very briefly introduce the discrete cosine transform (DCT). We should be able to ignore the maths and implementation of the DCT and treat it as a magic box which comes with Matlab or octave. If your interested in the details (and they are interesting) this book is a great place to start if you want more depth than wikipedia offers.

An audio sample is a sequence real numbers $$X = \{x_1, \ldots x_N\}$$. The DCT of this audio sample is the sequence, $$DCT(X) = Y = \{y_1, \ldots, y_N \}$$ such that

$$x_n = \sum_{k=1}^n y_k w(k) cos\left( \frac{\pi(2n-1)(k-1)}{2N} \right)$$

where

$$w(k) =\cases{\frac{1}{\sqrt{N}}, & k=1 \cr \sqrt{\frac{2}{N}}, & \text{otherwise}}.$$

Don’t worry too much about that expression. We just need note that the DCT represents the original signal as a sum of cosines, and that the coefficients specify the amplitude of these cosines.

If we have the DCT coefficients we can transform them back to the original sequence with the inverse discrete cosine transform (IDCT). This could be calculated with the above expression but more efficient algorithms exist for both the DCT and IDCT (these algorithms are based on the fast Fourier transform, which is again an interesting topic that I won’t get into).

One of my recent interests has been in solving problems with various kinds of satisfiability solvers. As an introduction to this idea I want to demonstrate how to create a sudoku solver with almost no effort by reduction to a common satisfiability problem (CNF-SAT) and using an existing solver (MiniSat).

Solving sudoku by reduction to CNF-sat is hardly a new idea, I’m sure a quick google would show various approaches. If the mathematical notation below looks a little scary perhaps try reading the code further down, it really is quite simple if you can read propositional formulas.

### A quick sudoku recap

A sudoku puzzle is a 9×9 grid of cells, split into 9 non overlapping 3×3 “boxes”. Some of these will be labelled with a digit from 1 to 9, others will be blank. The aim is to label the remaining cells so that every row, column and box contains the digits 1 through 9.

Unsolved sudoku grid

Solved sudoku gird

### CNF-SAT

Perhaps the most well known satisfiability problem is propositional satisfiability: given a formula in propositional logic can we find assignments for all the variables which makes the formula true? A subset of this problem is satisfiability on formulas in conjunctive normal form (CNF-SAT). A formula is in conjunctive normal form if it is a conjunction of clauses, where a clause is a disjunction of literals (a literal is just a variable or its negation), e.g

$(a \vee \neg b) \wedge (\neg a \vee c \vee d) \wedge (\neg b \vee \wedge \neg d)$ $a \wedge (\neg a \wedge \neg b)$