one prompt. two models. one diff.
see which llm does your task better. tokens, latency, cost, all lined up. run both, eyeball it, pick one. lol that's it.
      prompt                     prompt
        ↓                          ↓
┌─────────────────┐      ┌─────────────────┐
│ openai/gpt-4o   │      │ anthropic/claude│
│                 │      │                 │
│ response ≈≈≈≈≈≈ │      │ response ≈≈≈≈≈≈ │
│ ≈≈≈≈≈≈≈≈≈≈≈≈≈≈≈ │      │ ≈≈≈≈≈≈≈≈≈≈      │
│                 │      │                 │
│ 84t · 2.1s      │      │ 67t · 1.4s      │
│ $0.00021        │      │ $0.00018        │
└─────────────────┘      └─────────────────┘
          ▲ same prompt
what it actually prints
real output from llmdiff "explain tcp vs udp in 3 sentences". two columns, aligned, metrics at the bottom.
$ llmdiff "explain tcp vs udp in 3 sentences" openai/gpt-4o-mini anthropic/claude-3-5-haiku

─────────────────────────────────   ─────────────────────────────────
TCP is connection-oriented.         TCP establishes a reliable,
Guarantees ordered, reliable        ordered, connection-based link
delivery. UDP is connectionless,    between hosts. UDP is connection-
faster, no guarantees.              less and best-effort — fire and
                                    forget.

tokens in:  14                      tokens in:  14
tokens out: 84                      tokens out: 67
latency:    2.1s                    latency:    1.4s
cost (est): $0.00021                cost (est): $0.00018
openai + anthropic out of the box
no config gymnastics. drop in your keys, pick a pair of models, run. default is gpt-4o-mini vs claude-3-5-haiku.
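"drop in your keys" means the usual env vars, OPENAI_API_KEY and ANTHROPIC_API_KEY. a sketch of the lookup; the helper name here is made up, not the tool's actual code:

```python
import os

def load_keys() -> dict:
    """pick up both provider keys from the environment, fail loudly if
    either is missing (hypothetical helper, illustrative only)."""
    keys = {
        "openai": os.environ.get("OPENAI_API_KEY"),
        "anthropic": os.environ.get("ANTHROPIC_API_KEY"),
    }
    missing = [name for name, key in keys.items() if not key]
    if missing:
        raise SystemExit("missing api key(s) for: " + ", ".join(missing))
    return keys
```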
side-by-side output with real metrics
two columns, same width, aligned. tokens in, tokens out, wall-clock latency. no scrolling back and forth between runs.
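roughly how the alignment works: wrap each response to a fixed column width, zip the lines, pad the shorter side so the metric rows line up. a hypothetical helper, not the tool's source:

```python
import textwrap
from itertools import zip_longest

def side_by_side(left: str, right: str, width: int = 33, gap: str = "   ") -> str:
    """wrap two texts to equal-width columns and pair their lines,
    padding the shorter column with blanks so nothing drifts."""
    lcol = textwrap.wrap(left, width)
    rcol = textwrap.wrap(right, width)
    rows = zip_longest(lcol, rcol, fillvalue="")
    return "\n".join(f"{l:<{width}}{gap}{r}" for l, r in rows)
```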
cost + concurrency, by default
both providers are called in parallel, so wall-clock time is max(a, b), not a + b. the cost estimate comes from a small price table in the source, so you stop blowing $30 on evals without noticing. prices go stale sometimes; a PR fixes that.
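both halves of that, sketched with stdlib pieces. the price numbers are illustrative (per-million-token rates as of this writing, they do drift), and the sleep stands in for a real http call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# per-million-token prices. illustrative numbers, not the tool's actual
# table; real prices go stale, which is the PR-shaped problem
PRICES = {
    "openai/gpt-4o-mini":         {"in": 0.15, "out": 0.60},
    "anthropic/claude-3-5-haiku": {"in": 0.80, "out": 4.00},
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """dollars for one call: tokens x per-million rate."""
    p = PRICES[model]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000

def call_provider(name: str, delay: float) -> dict:
    """stand-in for a real provider call: sleeps instead of doing http."""
    time.sleep(delay)
    return {"model": name}

def run_both(a, b):
    # fire both requests at once so total wall-clock is ~max(a, b), not a + b
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa, fb = pool.submit(call_provider, *a), pool.submit(call_provider, *b)
        return fa.result(), fb.result()
```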
no sdks, --json if you want it
stdlib http.client only. no openai package, no anthropic package, no httpx. two small provider adapters you can read in a coffee break. --json pipes straight into jq, a spreadsheet, whatever eval harness you have.
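the shape of one of those adapters, hedged: a plausible stdlib-only openai chat-completions call, not the tool's actual source. the endpoint and payload shape are openai's documented api; the function names are mine:

```python
import http.client
import json

def build_payload(prompt: str, model: str) -> str:
    """request body for openai's chat completions endpoint."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

def ask_openai(prompt: str, model: str, api_key: str) -> str:
    """one request over http.client, no sdk in sight (needs a live key)."""
    conn = http.client.HTTPSConnection("api.openai.com")
    conn.request("POST", "/v1/chat/completions", build_payload(prompt, model),
                 headers={"Authorization": f"Bearer {api_key}",
                          "Content-Type": "application/json"})
    data = json.loads(conn.getresponse().read())
    conn.close()
    return data["choices"][0]["message"]["content"]
```

the anthropic adapter is the same idea against api.anthropic.com with its own headers and response shape.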
yeah, you could curl both yourself. two requests, two jq incantations, a stopwatch, a calculator for the cost. i wrote it so i'd stop doing that.
install
not public yet
python 3.9+, macos and linux when it ships. repo isn't up yet. email bennett@frkhd.com if you want early access.