parse → bytecode → native.
a toy language with a bytecode interpreter and a one-pass x86-64 jit. fib(35) runs about 12x faster under the jit than under my own interpreter. the jit is roughly 800 lines and emits instructions straight into mmapped memory: no llvm, no cranelift. every optimization you enable, you wrote.
source:

    fn fib(n) {
      if n < 2 { n }
      else {
        fib(n-1)
        + fib(n-2)
      }
    }

bytecode emitted:

    0000 load_arg 0
    0002 iconst 2
    0004 lt
    0005 jz 12
    0007 load_arg 0
    0009 ret
    000a jmp 23
    000c load_arg 0
    000e iconst 1
    0010 sub
    0011 call fib
    ...

x86-64:

    mov rdi, [rbp-8]
    mov rax, 2
    cmp rdi, rax
    jge .L1
    mov rax, rdi
    ret
    .L1:
    ...
    sub rdi, 1
    call fib
    ...
open the repl. type a function. watch it compile.
    $ jitvm
    jitvm 0.3.1 · bytecode vm + x86-64 jit · :help for commands
    jv> fn fib(n) { if n < 2 { n } else { fib(n-1) + fib(n-2) } }
    defined fib/1
    jv> :disasm fib
    bytecode (14 ops, 28 bytes):
      0000 load_arg 0
      0002 iconst 2
      0004 lt
      0005 jz 000c
      0007 load_arg 0
      0009 ret
      000a jmp 0017
      000c load_arg 0
      ...
    jv> :jit fib
    compiled fib → 0x7f4c8a001000 (312 bytes)
    jv> :time fib(35)
    => 9227465
    interp: 2.23s · jit: 181ms · (12.3x on my box)
the moving parts.
pratt parser
operator precedence via binding power: a left and right bp per token, recursive descent for the rest. around 250 lines, and it handles every expression shape the language has.
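the core loop is small enough to show whole. this is a sketch of the idea, not jitvm's parser: it evaluates directly instead of building an ast, the token set is single characters, and names like `parse_eval` are made up.

```c
#include <assert.h>
#include <ctype.h>

/* minimal pratt loop over a char* expression. the real parser emits
   ast nodes; evaluating in place keeps the sketch short. */

static const char *src;

static int peek(void) { while (*src == ' ') src++; return *src; }
static int next(void) { int c = peek(); if (c) src++; return c; }

/* left binding power per operator; 0 means "not an operator" */
static int lbp(int op) {
    switch (op) {
    case '+': case '-': return 10;
    case '*': case '/': return 20;
    case '^':           return 30;  /* right-assoc: rhs parsed at bp-1 */
    default:            return 0;
    }
}

static long expr(int min_bp);

/* prefix position: literals, unary minus, parens */
static long atom(void) {
    int c = next();
    if (c == '-') return -atom();
    if (c == '(') { long v = expr(0); next(); /* eat ')' */ return v; }
    long v = c - '0';
    while (isdigit(peek())) v = v * 10 + (next() - '0');
    return v;
}

static long ipow(long b, long e) { long r = 1; while (e-- > 0) r *= b; return r; }

/* the pratt loop: keep consuming operators while their binding
   power beats the caller's minimum */
static long expr(int min_bp) {
    long lhs = atom();
    for (;;) {
        int op = peek(), bp = lbp(op);
        if (bp == 0 || bp <= min_bp) return lhs;
        next();
        long rhs = expr(op == '^' ? bp - 1 : bp);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        case '^': lhs = ipow(lhs, rhs); break;
        }
    }
}

long parse_eval(const char *s) { src = s; return expr(0); }
```

precedence falls out of the loop condition: `2+3*4` parses the `*` inside the recursive call because 20 > 10, and `8-3-2` stays left-associative because equal bp returns to the caller.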
stack-based bytecode
48-ish opcodes, each a one-byte opcode followed by variable-width operands. inline caching on method dispatch, so the second call on the same shape hits a direct offset instead of a name lookup.
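the cache itself is two words per call site. a sketch of the idea with made-up names (`ic_load`, `Shape`, `Cache`) and a toy fixed-size field table, not jitvm's layout:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* shape = hidden class: maps field names to slot indices */
typedef struct Shape {
    int id;
    const char *names[4];
    int nfields;
} Shape;

typedef struct Obj {
    const Shape *shape;
    long slots[4];
} Obj;

/* per-call-site cache; in a real vm this lives in the bytecode stream */
typedef struct Cache {
    const Shape *shape;   /* shape seen last time, NULL when cold */
    int slot;             /* resolved slot for that shape */
} Cache;

/* slow path: linear scan of the shape's field table */
static int lookup_slot(const Shape *s, const char *name) {
    for (int i = 0; i < s->nfields; i++)
        if (strcmp(s->names[i], name) == 0) return i;
    return -1;
}

/* fast path: one pointer compare, then a direct indexed load */
long ic_load(Obj *o, const char *name, Cache *c) {
    if (c->shape == o->shape)
        return o->slots[c->slot];             /* hit */
    int slot = lookup_slot(o->shape, name);   /* miss: real lookup */
    c->shape = o->shape;                      /* ...then remember it */
    c->slot = slot;
    return o->slots[slot];
}

/* demo: two loads through one site; the second is a hit */
long ic_demo(void) {
    static const Shape point = { 1, { "x", "y" }, 2 };
    Obj p = { &point, { 7, 9 } };
    Cache site = { NULL, 0 };
    long a = ic_load(&p, "y", &site);   /* miss fills the cache */
    long b = ic_load(&p, "y", &site);   /* hit skips lookup_slot */
    return a + b;
}
```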
one-pass template jit
walks the bytecode once and emits x86-64 straight into mmap(PROT_EXEC). no ir, no register allocator. templates per opcode, stitched with short jumps. i kept patching the wrong offset for about two days before this worked.
tagged nan-boxing
all values fit in one 64-bit register. an ieee 754 nan only pins the exponent bits and one mantissa bit, leaving 51 spare payload bits (52 counting the sign), so pointers, ints, bools and nil all ride inside doubles. no heap word for small values.
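the whole trick is bit masks. a sketch with a made-up tag layout (jitvm's tags differ): real doubles are stored as their own bits, everything else lives in the quiet-nan space above `0x7ffc…`, which no hardware-produced nan (`0x7ff8…`) ever occupies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;

/* quiet nan with two extra mantissa bits set: our "boxed" space */
#define QNAN     0x7ffc000000000000ull
#define TAG_INT  0x0001000000000000ull   /* hypothetical tag bits */
#define TAG_NIL  0x0002000000000000ull

static Value box_double(double d) {
    Value v; memcpy(&v, &d, sizeof v);   /* bit-cast, no conversion */
    return v;
}
static double unbox_double(Value v) {
    double d; memcpy(&d, &v, sizeof d);
    return d;
}
static int is_double(Value v) {
    /* anything outside the boxed-nan space is a plain double */
    return (v & QNAN) != QNAN;
}
static Value box_int(int32_t i) {
    return QNAN | TAG_INT | (uint32_t)i;  /* payload in the low 32 bits */
}
static int32_t unbox_int(Value v) {
    return (int32_t)(uint32_t)v;
}
static Value nil_value(void) { return QNAN | TAG_NIL; }
```

the payoff is that arithmetic on doubles needs no unboxing at all: the bits are already a valid ieee double, and the tag check is one mask-and-compare.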
mark-sweep gc
tri-color mark-sweep with a write barrier, tuned for generational-ish workloads. around 600 allocs/ms sustained on an m1, tracked with a rolling allocation budget.
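the invariant the barrier protects: no black (fully scanned) object may point at a white (unvisited) one, or sweep would free a live object. a dijkstra-style sketch with made-up names and a single-child object, not jitvm's collector:

```c
#include <assert.h>
#include <stddef.h>

enum Color { WHITE, GREY, BLACK };

typedef struct Node {
    enum Color color;
    struct Node *child;       /* one outgoing edge keeps it tiny */
    struct Node *grey_next;   /* intrusive grey worklist link */
} Node;

static Node *grey_list;

static void shade(Node *n) {  /* white → grey, push on worklist */
    if (n && n->color == WHITE) {
        n->color = GREY;
        n->grey_next = grey_list;
        grey_list = n;
    }
}

/* mutator store with the write barrier: if a black object gains a
   reference, re-grey the target so the marker revisits it */
void write_field(Node *obj, Node *target) {
    obj->child = target;
    if (obj->color == BLACK) shade(target);
}

/* drain the worklist: blacken each grey node, shade its children */
void mark(Node *root) {
    shade(root);
    while (grey_list) {
        Node *n = grey_list;
        grey_list = n->grey_next;
        n->color = BLACK;
        shade(n->child);
    }
}
/* sweep (not shown) frees whatever is still WHITE, resets BLACK → WHITE */

/* demo: mark a→b, then store a→c while a is black; barrier greys c */
int gc_demo(void) {
    Node a = { WHITE, NULL, NULL }, b = { WHITE, NULL, NULL };
    Node c = { WHITE, NULL, NULL };
    a.child = &b;
    mark(&a);                 /* a and b end up BLACK */
    write_field(&a, &c);      /* without the barrier, c stays WHITE */
    return a.color == BLACK && b.color == BLACK && c.color == GREY;
}
```

this is the "tax on every mutation" mentioned below: every pointer store runs that color check, hit or not.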
a decent repl
line editing, history, multiline input that knows when a block is unclosed, in-terminal syntax highlighting. :disasm, :jit, :time built in.
why build a whole language when you could just write python?
every layer hides assumptions. a python function call looks cheap until you watch the interpreter build a frame object, resolve globals through a dict, and trampoline through eval. writing a jit teaches you what the cpu actually wants. registers in known places, branches it can predict, calls whose targets resolve before dispatch. once you emit a call yourself, the cost of an indirect one stops being a story someone told you.
writing a gc teaches you why rc plus cycle-detection isn't free, why a write barrier is a tax you pay on every mutation, why generational collectors exist at all. you don't need any of this to ship anything. pick cranelift, skip the whole exercise. but the pleasure of jitvm is having the full stack in your hands, source text on one side and bytes executing on a cpu on the other, with nothing you didn't write in between.
not public yet.
source drops on github soon; i'm still polishing the repo. rust 1.75+, x86-64 jit backend only at launch. ping bennett@frkhd.com for early access.