minilang
A Kaleidoscope-lineage C++17 compiler, extended with three hand-written LLVM optimization passes and a green CI matrix.
The problem
The LLVM Kaleidoscope tutorial teaches you to lex, parse and codegen a toy language to IR, then hand it to LLVM's stock JIT — but it stops before the part that actually demonstrates compiler-backend competence: writing your own optimization passes on the modern pass manager. minilang takes the tutorial's front end as a known-good baseline and uses it as scaffolding to build and measure original middle-end work.
The approach
minilang lowers a .ml source file (single f64 type, functions, if/else, loops) through a hand-written lexer, a recursive-descent parser with operator-precedence climbing, an AST, and LLVM IRBuilder codegen, then runs it on an ORC v2 JIT or emits text IR via -emit-llvm. The parsing and basic codegen follow the Kaleidoscope chapters and the JIT header is vendored from llvm-project; the original contribution is three passes on the new pass manager (PassInfoMixin): constant folding, dead-code elimination, and loop-invariant code motion.
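To make the "operator-precedence climbing" step concrete, here is a minimal sketch of the technique. It is not minilang's parser: the real front end builds an AST and hands it to IRBuilder, whereas this hypothetical ExprEval class evaluates directly to a double so that it stays self-contained. The operator table (<, +, -, *, /) and its binding powers are assumptions borrowed from the Kaleidoscope tutorial's defaults.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical mini-evaluator illustrating precedence climbing over f64
// expressions. It evaluates on the fly instead of building an AST.
class ExprEval {
public:
    explicit ExprEval(std::string src) : src_(std::move(src)) {}

    double run() {
        double v = parseExpr(1);
        skipSpaces();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skipSpaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Binding power of a binary operator, or -1 if c is not an operator.
    static int precedence(char c) {
        switch (c) {
            case '<': return 10;
            case '+': case '-': return 20;
            case '*': case '/': return 40;
            default: return -1;
        }
    }

    static double apply(char op, double a, double b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default:  return a < b ? 1.0 : 0.0;  // '<' yields 0.0 or 1.0
        }
    }

    // primary := number | '(' expr ')'
    double parsePrimary() {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == '(') {
            ++pos_;
            double v = parseExpr(1);
            skipSpaces();
            if (pos_ >= src_.size() || src_[pos_] != ')')
                throw std::runtime_error("expected ')'");
            ++pos_;
            return v;
        }
        const char* begin = src_.c_str() + pos_;
        char* end = nullptr;
        double v = std::strtod(begin, &end);
        if (end == begin) throw std::runtime_error("expected number");
        pos_ += static_cast<std::size_t>(end - begin);
        return v;
    }

    // Consume operators whose precedence is at least minPrec. Recursing with
    // prec + 1 for the right-hand side makes every operator left-associative.
    double parseExpr(int minPrec) {
        double lhs = parsePrimary();
        for (;;) {
            skipSpaces();
            if (pos_ >= src_.size()) return lhs;
            char op = src_[pos_];
            int prec = precedence(op);
            if (prec < minPrec) return lhs;
            ++pos_;
            double rhs = parseExpr(prec + 1);
            lhs = apply(op, lhs, rhs);
        }
    }
};
```

For instance, ExprEval("1 + 2 * 3").run() gives 7 because * binds tighter than +, and ExprEval("10 - 4 - 3").run() gives 3 because the prec + 1 recursion groups the subtractions left to right.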
Key decisions
The deliberate tradeoff was to keep everything f64 and skip a real type-checker — the README lists 'a real type-checker' as an explicit non-goal, with type errors surfacing as verifyFunction failures rather than diagnostics. That single-type choice removes semantic-analysis complexity entirely and lets the project spend its budget on the three optimization passes — the actual learning goal — instead of a type system. Scope is honestly bounded elsewhere too: JIT or text-IR only, no AOT/object output, no integer/string types, no GC, modules or generics.
Outcome
The compiler builds cleanly against LLVM 18.1.8 (CMake + Ninja) into a static binary and JIT-executes correctly: fib(10)=55, square(7)=49, and all three end-to-end tests match their .out files. The three custom passes produce measured IR line-count reductions that reproduce BENCHMARKS.md exactly: 01_arith 18→14, 02_fib 37→31, 03_loop 44→30, 04_dead 21→14, 05_invariant 49→30 (−39%). A GitHub Actions matrix (Linux + macOS × Debug + Release) has five consecutive green runs.
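As a quick check on the headline figure, the relative reductions can be recomputed from the line counts quoted above. The sketch below is illustrative arithmetic only: the Bench struct and both helper functions are hypothetical names, not part of the minilang repository, and the numbers are copied from the Outcome paragraph.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One benchmark row: IR line count before and after optimization.
struct Bench {
    std::string name;
    int before;
    int after;
};

// Line counts as quoted in the Outcome paragraph (hypothetical helper).
inline std::vector<Bench> benchmarks() {
    return {
        {"01_arith", 18, 14},
        {"02_fib", 37, 31},
        {"03_loop", 44, 30},
        {"04_dead", 21, 14},
        {"05_invariant", 49, 30},
    };
}

// Relative reduction in IR lines, rounded to the nearest whole percent.
inline int reductionPercent(const Bench& b) {
    return static_cast<int>(
        std::lround(100.0 * (b.before - b.after) / b.before));
}
```

Applied to the five rows, this gives roughly 22%, 16%, 32%, 33% and 39%, so the −39% figure corresponds to 05_invariant (19 of 49 lines removed).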
The front end is the Kaleidoscope baseline; the three IR passes are the original contribution. IR shrinks up to −39%. CI: Linux + macOS × Debug + Release, green.
The real IR, before and after my three optimization passes.
Every line below is verbatim output from the minilang binary (LLVM 18, -emit-llvm); nothing is hand-edited.
def square(x) x * x;
square(7); // expected: 49
mem2reg promotes the stack slot for x to an SSA value, so the store and both reloads vanish and square collapses to a single multiply on the argument. The unoptimized module, with the alloca, store and loads still present:
; ModuleID = 'minilang.module'
source_filename = "minilang.module"

define double @square(double %x) {
entry:
  %x1 = alloca double, align 8
  store double %x, ptr %x1, align 8
  %x2 = load double, ptr %x1, align 8
  %x3 = load double, ptr %x1, align 8
  %multmp = fmul double %x2, %x3
  ret double %multmp
}

define double @__anon_expr_0() {
entry:
  %calltmp = call double @square(double 7.000000e+00)
  ret double %calltmp
}
Pipeline: mem2reg + LoopSimplify (LLVM standard), then my three passes: ConstFold, DCE, LICM. IR line counts are wc -l of the emitted module, matching the repo's BENCHMARKS.md.
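To illustrate what the ConstFold and DCE stages of that pipeline do in principle, here is a toy sketch over a made-up straight-line SSA form. Everything in it (Inst, Block, constFold, dce, sample) is hypothetical and independent of LLVM: the real passes operate on LLVM IR through the new pass manager, and their actual implementation is not shown in this write-up. The sketch assumes operands are always defined before their uses, so one forward sweep suffices for folding and one backward sweep for liveness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy SSA instruction: operands are indices of earlier instructions.
struct Inst {
    enum Kind { Arg, Const, Add, Mul, Ret } kind;
    double value;  // payload for Const, unused otherwise
    int lhs;       // first operand (Add, Mul, Ret), else -1
    int rhs;       // second operand (Add, Mul), else -1
};
using Block = std::vector<Inst>;

// Constant folding: rewrite Add/Mul of two constants into a single Const.
// Returns the number of instructions folded.
int constFold(Block& b) {
    int folded = 0;
    for (Inst& i : b) {
        if (i.kind != Inst::Add && i.kind != Inst::Mul) continue;
        const Inst& l = b[i.lhs];
        const Inst& r = b[i.rhs];
        if (l.kind != Inst::Const || r.kind != Inst::Const) continue;
        double v = (i.kind == Inst::Add) ? l.value + r.value
                                         : l.value * r.value;
        i = Inst{Inst::Const, v, -1, -1};
        ++folded;
    }
    return folded;
}

// Dead-code elimination: keep only what a Ret transitively depends on,
// then compact the block and renumber the operand indices.
Block dce(const Block& b) {
    std::vector<bool> live(b.size(), false);
    for (std::size_t n = b.size(); n-- > 0;) {
        const Inst& i = b[n];
        if (i.kind == Inst::Ret) live[n] = true;  // Ret is a liveness root
        if (!live[n]) continue;
        if (i.lhs >= 0) live[i.lhs] = true;
        if (i.rhs >= 0) live[i.rhs] = true;
    }
    std::vector<int> remap(b.size(), -1);
    Block out;
    for (std::size_t n = 0; n < b.size(); ++n) {
        if (!live[n]) continue;
        Inst i = b[n];
        if (i.lhs >= 0) i.lhs = remap[i.lhs];
        if (i.rhs >= 0) i.rhs = remap[i.rhs];
        remap[n] = static_cast<int>(out.size());
        out.push_back(i);
    }
    return out;
}

// Sample input: returns x * (2 + 3) and also computes an unused 4 * 5.
Block sample() {
    return {
        {Inst::Arg,   0.0, -1, -1},  // %0 = x
        {Inst::Const, 2.0, -1, -1},  // %1 = 2
        {Inst::Const, 3.0, -1, -1},  // %2 = 3
        {Inst::Add,   0.0,  1,  2},  // %3 = %1 + %2
        {Inst::Mul,   0.0,  0,  3},  // %4 = %0 * %3
        {Inst::Const, 4.0, -1, -1},  // %5 = 4
        {Inst::Const, 5.0, -1, -1},  // %6 = 5
        {Inst::Mul,   0.0,  5,  6},  // %7 = %5 * %6, never used
        {Inst::Ret,   0.0,  4, -1},  // ret %4
    };
}
```

On sample(), folding rewrites two instructions (2 + 3 and 4 * 5) and DCE then shrinks the block from nine instructions to four: the argument, the constant 5, the multiply, and the return. Running folding before DCE matters here, since folding is what leaves the original 2 and 3 constants without users.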