Mar 142013
 

Recently I have been interested in code generation. Specifically generating native x86-64 code on the fly.

In the future I hope to present useful and novel applications of code generation – but for now I just want to demonstrate the simplest thing I think you can call just-in-time (JIT) code generation without blushing.

Skip right to the code on GitHub or continue reading for a full description.

Update: You can see more exciting JIT code generation for image processing in this follow up post.

AsmJit

The key to making native code generation “fun” and simple is using a library to carry some of the load. We want to at least be working something like assembly – not worrying about generating the actual binary code for the CPU. AsmJit is an excellent C++ library which provides a “run-time assembler” for x86-64 code generation and just enough extra functionality to make code generation a breeze. Here’s a few things AsmJit helps with:

  • It lets us step above having to generate actual binary code for the CPU. We write assembly and AsmJit outputs code.
  • Handles dealing with platform ABI – you don’t have to worry about calling conventions and the stack – AsmJit will just give you arguments you can use and handle return values.
  • Deals with allocating space for code, making sure its executable and cleaning up – the “output” is just a C function pointer you can call like anything else.
  • Takes care of the differences between 32 and 64 bit – with a little care you can target both!
  • CPU detection. Good SSE support. And logging (“dissasembly” of exact code we generated.)

Above the assembler layer AsmJit offers a “Compiler.” The main thing the compiler does is let us allocate as many variables as we want and handle register allocation for us. Using the AsmJit compiler is like writing assembler for a CPU with infinite registers, I find it to be the perfect level at which to work. For the fastest code you may need to drop down to the lower level assembler – but most experiments can be started with the compiler. Check out the examples on the AsmJit wiki for very simple demonstrations of what AsmJit does and the difference between the low level assembler and higher level compiler.

There is not a huge amount written online about AsmJit. Hopefully the example in this blog post will help people get started! I use a recent SVN revision of AsmJit – it looks like the developer has made a fair few changes to the public interface since the last release and intends to make a new release soon.

JitCalc

I thought one of the simplest, just aboutuseful, things that could be made with AsmJit is a mathematical expression evaluator. By that I mean: at run time we can specify a mathematical function – like f(x,y) = x + y*2 and have our program call it with any arguments. Many applications call for such functionality.

Let’s save parsing for another blog post and just use LISP like S-expressions instead of standard mathematical notation. Here’s a few examples:

(+ x y) equivalent to: x + y
(+ (/ y 2) (+ x 1)) equivalent to: (y / 2) + (x +1)
(- (+ y (/ x 2)) 1) equivalent to: (y + (x / 2)) + 1

I looked at this wonderful blog post as a starting point for parsing and representing S-expressions in C++. With some of the code from there we can parse an S-expression from a string into the following C++ struct:

1
2
3
4
5
6
7
8
9
10
11
// S-Expression structure.
struct Cell{
    enum Type {Symbol, Number, List};
    typedef Cell (*proc_type)(const std::vector<Cell> &);
    typedef std::vector<Cell>::const_iterator iter;
    Type type;
    std::string val;
    std::vector<Cell> list;
    Cell(Type type = Symbol) : type(type) {}
    Cell(Type type, const std::string & val) : type(type), val(val) {}
};

We’re going to have to “visit” entire S-Expressions and apply certain operations when we encounter a number, symbol or function call. Here’s a helper class for that:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
// Generic templated visitor base class.
template <typename EvalReturn> class Visitor{
public:
    typedef std::map<std::string, std::function<EvalReturn (const std::vector<EvalReturn> &)>> 
        FunctionMap;
 
    typedef std::function<EvalReturn (const std::string &symbol)> SymbolHandler;
    typedef std::function<EvalReturn (const std::string &number)> NumberHandler;
 
protected:
    FunctionMap functionMap;
    NumberHandler numberHandler;
    SymbolHandler symbolHandler;
 
public:
    Visitor(){
    }
 
    EvalReturn eval(const Cell &c){
        switch(c.type){
            case Cell::Number:{
                return numberHandler(c.val.c_str());
            }case Cell::List:{
                std::vector<EvalReturn> evalArgs(c.list.size()-1);
 
                // eval each argument
                std::transform(c.list.begin()+1, c.list.end(), evalArgs.begin(), 
                    [=](const Cell &c) -> EvalReturn{
                    return this->eval(c);
                });
 
                if(functionMap.find(c.list[0].val) == functionMap.end())
                    throw std::runtime_error("Could not handle procedure: " + c.list[0].val);
 
                // call function specified by symbol map with evaled arguments
                return functionMap.at(c.list[0].val)(evalArgs);
          }case Cell::Symbol:{
              if(symbolHandler)
                  return symbolHandler(c.val);
              else
                  std::runtime_error("Cannot handle symbol: " + c.val);
          }
        }
      std::runtime_error("Should never get here.");
      return EvalReturn(); // quiet compiler warning.
    }
};

From there we can very quickly get to an interpreted calculator. We will do the JIT stuff after we “master” the interpreted approach!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Interpreted calculator without variables (no symbolHandler!)
class Calculator : public Visitor<double>{
public:
    Calculator(){
        // standard functions
        functionMap["+"] = [](const std::vector<double> &d){return d[0] + d[1];};
        functionMap["-"] = [](const std::vector<double> &d){return d[0] - d[1];};
        functionMap["/"] = [](const std::vector<double> &d){return d[0] / d[1];};
        functionMap["*"] = [](const std::vector<double> &d){return d[0] * d[1];};
 
        numberHandler = [](const std::string &number){
            return std::atof(number.c_str());
        };
    }
};

Calling the eval method of the above class with an expression (with just numbers, no symbols) will return the evaluation of the that expression. In other words we have a simple calculator.

Calculator().eval("(+ 1 (2 *3)"); // will return 7!

From there it’s very simple to extend this to a function object which will accept a LISP like representation of a mathematical function with an arbitrary number of parameters.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// Extend calculator above into function evaluator.
class CalculatorFunction : public Calculator{
private:
    std::map<std::string, int> argNameToIndex;
    Cell cell;
public:
    CalculatorFunction(const std::vector<std::string> &names, const Cell &c) : cell(c){
        for(size_t i = 0; i < names.size(); ++i)
            argNameToIndex[names[i]] = i;
    }
 
    double operator()(const std::vector<double> &args){
        symbolHandler = [&](const std::string &name) -> double{
            return args[this->argNameToIndex[name]];	
        };
        return eval(cell);
    }
};

We can now do the following:

1
2
3
4
5
6
7
    std::vector<double> argNames = {"x", "y"};
    CalculatorFunction f(argNames, "(+ x y")); // create the function f(x,y) = x + y
 
    // Call our function with x = 2 and y = 3
    std::vector<double> args = {2.0 3.0};
    double z = f(args); // z = f(x,y) = x + y = 2 + 3 = 5!
    std::cout << z; // prints 5

Now we have seen how to do things in the interpreted fashion lets create a JIT compiled function evaluator! We’ll use the AsmJit compiler to make our lives easy.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
// JIT version of CalculatorFunction class.
// Expressions return AsmJit SSE "registers"/variables.
class CodeGenCalculatorFunction : public Visitor<AsmJit::XmmVar>{
private:
    AsmJit::X86Compiler compiler;
    std::map<std::string, int> argNameToIndex;
 
    typedef double (*FuncPtrType)(const double * args);
    FuncPtrType generatedFunction;
public:
    CodeGenCalculatorFunction(const std::vector<std::string> &names, const Cell &cell){
        using namespace AsmJit;
 
        // Map operators to assembly instructions
        functionMap["+"] = [&](const std::vector<XmmVar> &args) -> XmmVar{
            compiler.addsd(args[0], args[1]);
            return args[0];
        };
 
        functionMap["-"] = [&](const std::vector<XmmVar> &args) -> XmmVar{
            compiler.subsd(args[0], args[1]);
            return args[0];
        };
 
        functionMap["*"] = [&](const std::vector<XmmVar> &args) -> XmmVar{
            compiler.mulsd(args[0], args[1]);
            return args[0];
        };
 
        functionMap["/"] = [&](const std::vector<XmmVar> &args) -> XmmVar{
            compiler.divsd(args[0], args[1]);
            return args[0];
        };
 
        // Convert numbers into AsmJit vars.
        numberHandler = [&](const std::string &number) -> XmmVar{
            double x = std::atof(number.c_str());
            XmmVar xVar(compiler.newXmmVar());
            SetXmmVar(compiler, xVar, x);
            return xVar;
        };
 
        for(size_t i = 0; i < names.size(); ++i)
            argNameToIndex[names[i]] = i;
 
        symbolHandler = [&](const std::string name) -> XmmVar{
            // Lookup name in args and return AsmJit variable
            // with the arg loaded in.
            // TODO: this could be more efficient - could
            // create one list of XmmVars and use that.
            GpVar ptr(compiler.getGpArg(0));
            XmmVar v(compiler.newXmmVar());
            int offset = argNameToIndex.at(name)*sizeof(double);
            compiler.movsd(v, Mem(ptr, offset));
            return v;
        };
 
        generatedFunction = generate(cell);
    }
 
    FuncPtrType generate(const Cell &c){
        compiler.newFunc(AsmJit::kX86FuncConvDefault, 
                AsmJit::FuncBuilder1<double, const double *>());
        AsmJit::XmmVar retVar = eval(c);
        compiler.ret(retVar);
        compiler.endFunc();
        return reinterpret_cast<FuncPtrType>(compiler.make());
 
    }
 
    double operator()(const std::vector<double> &args) const {
        return generatedFunction(&args[0]); 
    }
 
    ~CodeGenCalculatorFunction(){
        AsmJit::MemoryManager::getGlobal()->free((void*)generatedFunction);
    }
 
private:
    void SetXmmVar(AsmJit::X86Compiler &c, AsmJit::XmmVar &v, double d){
        using namespace AsmJit;
        // No immediates for SSE regs/doubles. So put into a general purpose reg
        // and then move into SSE - we could do better than this.
        GpVar gpreg(c.newGpVar());
        uint64_t *i = reinterpret_cast<uint64_t*>(&d);
        c.mov(gpreg, i[0]); 
        c.movq(v, gpreg); 
        c.unuse(gpreg);
    }
};

That wasn’t so hard. The key thing to notice is that when we “evaulate” now we are simply visiting the expression and returning AsmJit variables. As we evaluate we push the relevant instructions to perform the arithmetic operations on our operands (which are AsmJit variables) into the AsmJit compiler. SSE registers and instructions were used (all that XmmVar stuff) wince it turns out to be easier. Observe how AsmJit freed us from having to worry about register allocation. See also how we call the generated code just like any C++ function pointer: generatedFunction(&args[0]);.

We can now replace any instance of the CalculatorFunction class with an instance of CodeGenCalculatorFunction. Construction will be slower (it’s generating code!) but evaluation should be much faster. We have turned our S-expression into native code!

How does it perform?

Full code can be found on GitHub along with some documentation.

The project contains a simple command-line interface for testing and bench-marking our CalculatorFunction classes. The JIT code runs about 100 times faster than the interpreted equivalents – not bad for less than 100 extra lines!

$ ./jitcalc -benchmark "((x y) (+ (* (+ x 20) y) (/ x (+ y 1))))" 15.5 20 
Interpreted output: 710.738
Code gen output: 710.738

Benchmarking...
Duration for 10000000 repeated evaluations:

 - Interpreted: 5732ms
 - JIT: 52ms

Well that’s all for now. I intend to present slightly more useful examples of code generation in the future!

 Posted by at 9:38 pm

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>