open source · rust · x86-64 · mit

parse → bytecode → native.

a toy language with a bytecode interpreter and a one-pass x86-64 jit. fib(35) on the jit is about 12x faster than my own interpreter. the jit is roughly 800 lines, emits instructions straight into mmapped memory, no llvm, no cranelift. every optimization you enable, you wrote.

source                  bytecode            emitted x86-64
fn fib(n) {             0000 load_arg 0     mov    rdi, [rbp-8]
  if n < 2 { n }        0002 iconst   2     mov    rax, 2
  else {                0004 lt             cmp    rdi, rax
    fib(n-1)            0005 jz       000c  jge    .L1
    + fib(n-2)          0007 load_arg 0     mov    rax, rdi
  }                     0009 ret            ret
}                       000a jmp      0017  .L1:
                        000c load_arg 0       ...
                        000e iconst   1     sub    rdi, 1
                        0010 sub            call   fib
                        0011 call     fib     ...
three views of the same function
a session

open the repl. type a function. watch it compile.

$ jitvm
jitvm 0.3.1  ·  bytecode vm + x86-64 jit  ·  :help for commands

jv> fn fib(n) { if n < 2 { n } else { fib(n-1) + fib(n-2) } }
  defined fib/1

jv> :disasm fib
  bytecode (14 ops, 28 bytes):
  0000 load_arg 0
  0002 iconst   2
  0004 lt
  0005 jz       000c
  0007 load_arg 0
  0009 ret
  000a jmp      0017
  000c load_arg 0   ...

jv> :jit fib
  compiled fib → 0x7f4c8a001000 (312 bytes)

jv> :time fib(35)
=> 9227465
  interp: 2.23s   ·   jit: 181ms   ·   (12.3x on my box)
under the hood

the moving parts.

pratt parser

operator precedence by binding power, left and right bp per token, recursive descent for the rest. around 250 lines, handles every expression shape the language has.
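the binding-power loop is small enough to show whole. a minimal sketch of the idea, not jitvm's actual parser: token names and bp values here are made up, and it folds to an integer instead of building an ast.

```rust
// minimal pratt loop over a pre-lexed token stream: each infix operator
// carries a (left, right) binding power; a higher right bp binds tighter,
// and left-associativity falls out of making right bp one above left.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Tok { Num(i64), Plus, Star, Eof }

fn infix_bp(t: Tok) -> Option<(u8, u8)> {
    match t {
        Tok::Plus => Some((1, 2)),
        Tok::Star => Some((3, 4)),
        _ => None,
    }
}

// parse and fold in one pass; a real parser would build an ast node here.
fn expr(toks: &[Tok], pos: &mut usize, min_bp: u8) -> i64 {
    let mut lhs = match toks[*pos] {
        Tok::Num(n) => { *pos += 1; n }
        t => panic!("expected number, got {:?}", t),
    };
    loop {
        let op = toks[*pos];
        let Some((l_bp, r_bp)) = infix_bp(op) else { break };
        if l_bp < min_bp { break; }  // operator binds looser than our caller
        *pos += 1;
        let rhs = expr(toks, pos, r_bp);
        lhs = match op { Tok::Plus => lhs + rhs, _ => lhs * rhs };
    }
    lhs
}

fn parse(toks: &[Tok]) -> i64 { expr(toks, &mut 0, 0) }
```

the whole precedence table is `infix_bp`; adding an operator is one match arm.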

stack-based bytecode

48-ish opcodes: a one-byte opcode followed by variable-width operands. inline caching on method dispatch, so the second call on the same shape hits a direct offset.
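the dispatch loop is the usual shape. a sketch with three invented opcodes (the numbers and names are not jitvm's): one byte of opcode, operands inline in the byte stream, operand stack on the side.

```rust
// toy stack machine: one-byte opcodes, operands inline in the code stream.
const ICONST: u8 = 0x01; // next byte is a small immediate
const ADD: u8    = 0x02; // pop two, push sum
const RET: u8    = 0x03; // pop and return

fn run(code: &[u8]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        let op = code[pc];
        pc += 1;
        match op {
            ICONST => { stack.push(code[pc] as i64); pc += 1; }
            ADD => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            RET => return stack.pop().unwrap(),
            _ => panic!("bad opcode {op:#04x}"),
        }
    }
}
```

`run(&[ICONST, 2, ICONST, 3, ADD, RET])` gives 5. the jit templates below map one-to-one onto these match arms.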

one-pass template jit

walks the bytecode once and emits x86-64 straight into mmap(PROT_EXEC). no ir, no register allocator. templates per opcode, stitched with short jumps. i kept patching the wrong offset for about two days before this worked.
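"template per opcode" means appending a fixed machine-code byte pattern for each bytecode op. a sketch of the emission side only: the real thing writes into writable+executable mmap'd memory and jumps to it, while this just builds the bytes so the encoding is checkable. function names are illustrative.

```rust
// template for iconst: REX.W + C7 /0 imm32 encodes `mov rax, imm32`
// (sign-extended to 64 bits), so the constant lands in the return register.
fn emit_iconst(buf: &mut Vec<u8>, n: i32) {
    buf.extend_from_slice(&[0x48, 0xC7, 0xC0]);
    buf.extend_from_slice(&n.to_le_bytes());
}

// template for ret: a single C3 byte.
fn emit_ret(buf: &mut Vec<u8>) {
    buf.push(0xC3);
}
```

compiling `fn f() { 42 }` is then `emit_iconst` + `emit_ret`, eight bytes total; the one pass over the bytecode is just a match statement choosing which emit_* to call, plus a fixup list for forward jump offsets.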

tagged nan-boxing

all values fit in one 64-bit register. an ieee 754 quiet nan pins down only the exponent and top mantissa bits, leaving 51 payload bits plus the sign bit, so pointers, ints, bools and nil all ride inside doubles. no heap word for small values.
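the trick in miniature, with an invented tag layout (jitvm's actual bit assignments may differ): any bit pattern that isn't a tagged quiet nan is read as a plain double, and ints hide in the nan payload. a real version also has to canonicalize genuine nan results from arithmetic so they don't alias a tag.

```rust
// quiet-nan prefix: exponent all ones + top mantissa bit set.
const QNAN: u64    = 0x7FF8_0000_0000_0000;
// one spare payload bit used as the "this is an int" tag (illustrative).
const TAG_INT: u64 = 0x0002_0000_0000_0000;

fn box_f64(x: f64) -> u64 { x.to_bits() }
fn box_int(n: i32) -> u64 { QNAN | TAG_INT | (n as u32 as u64) }

fn is_int(v: u64) -> bool { v & (QNAN | TAG_INT) == QNAN | TAG_INT }
fn unbox_int(v: u64) -> i32 { v as u32 as i32 }
fn unbox_f64(v: u64) -> f64 { f64::from_bits(v) }
```

every value is a `u64`; the interpreter's stack slots and the jit's registers carry the same representation, which is what lets values cross between them for free.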

mark-sweep gc

tri-color mark-sweep with a write barrier, tuned for generational-ish workloads. around 600 allocs/ms sustained on an m1, tracked with a rolling allocation budget.
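the tri-color invariant fits in a page. a sketch over an index-based heap, not jitvm's collector: white = unmarked, gray = discovered but not yet scanned (the worklist), black = scanned. field names are made up, and the write barrier and real freeing are left out.

```rust
struct Obj {
    children: Vec<usize>, // outgoing references, as heap indices
    marked: bool,
    live: bool,
}

fn collect(heap: &mut Vec<Obj>, roots: &[usize]) {
    // everything starts white.
    for o in heap.iter_mut() { o.marked = false; }
    // roots go gray: reachable, not yet scanned.
    let mut gray: Vec<usize> = roots.to_vec();
    while let Some(i) = gray.pop() {
        if heap[i].marked { continue; }
        heap[i].marked = true; // i turns black
        let kids = heap[i].children.clone();
        for c in kids {
            if !heap[c].marked { gray.push(c); } // child goes gray
        }
    }
    // sweep: whatever stayed white is garbage.
    for o in heap.iter_mut() {
        if !o.marked { o.live = false; }
    }
}
```

the write barrier exists to protect exactly this invariant when mutation runs concurrently with marking: storing a white object into a black one must re-gray something, or the white object gets swept while still reachable.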

a decent repl

line editing, history, multiline input that knows when a block is unclosed, in-terminal syntax highlighting. :disasm, :jit, :time built in.
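the "knows when a block is unclosed" part can be as dumb as a depth counter. a sketch of that check, not the repl's actual logic: a real version lexes first so braces inside strings and comments don't count.

```rust
// returns true if the input so far has unclosed braces/parens,
// i.e. the repl should prompt for another line instead of evaluating.
fn needs_more(src: &str) -> bool {
    let mut depth: i32 = 0;
    for ch in src.chars() {
        match ch {
            '{' | '(' => depth += 1,
            '}' | ')' => depth -= 1,
            _ => {}
        }
    }
    depth > 0
}
```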

why bother

why build a whole language when you could just write python?

every layer hides assumptions. a python function call looks cheap until you watch the interpreter build a frame object, resolve globals through a dict, and trampoline through eval. writing a jit teaches you what the cpu actually wants. registers in known places, branches it can predict, calls whose targets resolve before dispatch. once you emit a call yourself, the cost of an indirect one stops being a story someone told you.

writing a gc teaches you why rc plus cycle-detection isn't free, why a write barrier is a tax you pay on every mutation, why generational collectors exist at all. you don't need any of this to ship anything. pick cranelift, skip the whole exercise. but the pleasure of jitvm is having the full stack in your hands, source text on one side and bytes executing on a cpu on the other, with nothing you didn't write in between.

install

not public yet.

source drops on github soon, i'm still polishing the repo. rust 1.75+; the jit backend is x86-64 only when it ships. ping bennett@frkhd.com for early access.