
Karpathy’s LLM Council, explained — and what it became

In late 2025, Andrej Karpathy published a weekend project: a small web app where several frontier models answer the same question, anonymously grade each other, and a chairman model writes the final answer. He called it llm-council. The idea outgrew the repo. Here is the honest story — what he built, what it proved, and how it became a product category that now includes a $200/month Perplexity feature.

What Karpathy built

The repo (github.com/karpathy/llm-council, 21k+ stars) is deliberately simple. Stage 1: your question goes to several models — GPT, Claude, Gemini, Grok — each answering independently. Stage 2: each model reads the others’ anonymized answers and ranks them, without knowing which answer is whose. Stage 3: a chairman model reads everything and writes the final response.
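The three stages map onto a short loop. The sketch below is a minimal, hypothetical rendering in Python, not code from the repo. The model callables, `run_council`, the `Response A`-style labels, and averaging rank positions are all assumptions for illustration. Real council members would be API calls rather than stubs.

```python
import random
import string
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in types: a real council would call hosted models; here a
# "model" is any function from prompt text to reply text.
Model = Callable[[str], str]
Ranker = Callable[[str, Dict[str, str]], List[str]]  # returns labels, best first


def anonymize(names: List[str], seed: int = 0) -> Dict[str, str]:
    """Assign shuffled labels ('Response A', ...) so reviewers never see model names."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    return {f"Response {string.ascii_uppercase[i]}": n for i, n in enumerate(shuffled)}


def aggregate(rankings: Dict[str, List[str]],
              label_to_name: Dict[str, str]) -> List[Tuple[float, str]]:
    """Average each member's rank position (1 = best) across reviewers; lower is better.

    Assumes at least two members, so every answer receives at least one peer rank.
    """
    positions: Dict[str, List[int]] = {n: [] for n in label_to_name.values()}
    for order in rankings.values():
        for pos, label in enumerate(order, start=1):
            positions[label_to_name[label]].append(pos)
    return sorted((mean(p), n) for n, p in positions.items())


def run_council(question: str, council: Dict[str, Model],
                rankers: Dict[str, Ranker], chairman: Model) -> dict:
    # Stage 1: every member answers independently.
    answers = {name: model(question) for name, model in council.items()}

    # Stage 2: each reviewer ranks the *other* members' answers, seeing only labels.
    label_to_name = anonymize(list(answers))
    rankings = {}
    for reviewer, rank in rankers.items():
        visible = {lbl: answers[n] for lbl, n in label_to_name.items() if n != reviewer}
        rankings[reviewer] = rank(question, visible)
    leaderboard = aggregate(rankings, label_to_name)

    # Stage 3: the chairman reads all answers plus the peer ranking and writes the reply.
    prompt = "\n".join(
        [f"Question: {question}"]
        + [f"{name} answered: {text}" for name, text in answers.items()]
        + [f"{name}: average peer rank {score:.2f}" for score, name in leaderboard]
    )
    return {"answers": answers, "rankings": rankings,
            "leaderboard": leaderboard, "final": chairman(prompt)}


# Toy usage with hypothetical stub models (no network calls).
council = {
    "model-a": lambda q: "Paris",
    "model-b": lambda q: "Paris, France",
    "model-c": lambda q: "Lyon",
}


def toy_ranker(question: str, labelled: Dict[str, str]) -> List[str]:
    # Prefer answers mentioning Paris, then longer (more specific) answers.
    return sorted(labelled, key=lambda lbl: ("Paris" not in labelled[lbl], -len(labelled[lbl])))


rankers = {name: toy_ranker for name in council}
chairman = lambda prompt: "Chairman synthesis based on:\n" + prompt

result = run_council("What is the capital of France?", council, rankers, chairman)
for score, name in result["leaderboard"]:
    print(f"{name}: {score}")
```

Averaging positions is only one aggregation choice; Borda counts or pairwise win rates are common alternatives, and the chairman prompt format here is invented.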

Karpathy was explicit about its status: a Saturday hack, provided as-is, for other people’s inspiration, and he says plainly that he will not support it. There is no official hosted version, and he has endorsed none.

Why a weekend hack resonated

The insight is that models are better at judging each other than at judging themselves. Anonymous cross-review turns a set of confident, conflicting answers into a ranked comparison — and the disagreement itself becomes information. Karpathy noted models would often praise a rival’s answer above their own; that honesty under anonymity is the whole trick.

Nor is the idea new. Deliberation among independent judges has a research lineage: peer-reviewed work was reporting measurable gains from consensus among independent evaluators as far back as 2013, more than a decade before AI products adopted the pattern. The repo made the old idea runnable in an afternoon.

What the repo asks of you

It is a developer tool by design. You clone it, bring an OpenRouter API key, pay per token for every model call, and run the app locally. Output is a chat answer in your browser — no exports, no history across devices, no mobile, no support.

For developers who want to see the mechanism, that is perfect. For everyone else it is the catch.

  • Runs locally; you manage keys and per-token costs
  • Fixed simple flow; changing the council means editing code
  • No documents, no mobile, no support — by the author’s own choice
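Because every stage is a paid call, cost grows faster than the member count suggests: N members mean 2N + 1 calls per question, and stage 2 input grows roughly with N² because each reviewer reads everyone else's answer. A rough estimator makes that visible. It is a hypothetical sketch assuming one flat price for all models and fixed average token counts; every number in it is invented for illustration and is not an OpenRouter rate.

```python
from dataclasses import dataclass


@dataclass
class Sizes:
    """Hypothetical average token counts; real numbers vary by model and question."""
    question: int = 200   # the user's prompt
    answer: int = 800     # one stage-1 answer
    review: int = 300     # one stage-2 ranking with rationale
    final: int = 1000     # the chairman's response


def council_tokens(members: int, s: Sizes) -> dict:
    """Estimate calls and tokens for one question through the three stages."""
    # Stage 1: each member reads the question and writes an answer.
    s1_in, s1_out = members * s.question, members * s.answer
    # Stage 2: each member reads the question plus the other members' answers.
    s2_in = members * (s.question + (members - 1) * s.answer)
    s2_out = members * s.review
    # Stage 3: the chairman reads the question, every answer and every review.
    s3_in = s.question + members * (s.answer + s.review)
    s3_out = s.final
    return {
        "calls": 2 * members + 1,
        "input_tokens": s1_in + s2_in + s3_in,
        "output_tokens": s1_out + s2_out + s3_out,
    }


def cost_usd(usage: dict, in_per_m: float, out_per_m: float) -> float:
    """Apply one flat per-million-token price pair (a simplifying assumption)."""
    return (usage["input_tokens"] * in_per_m + usage["output_tokens"] * out_per_m) / 1_000_000


usage = council_tokens(4, Sizes())
print(usage, round(cost_usd(usage, in_per_m=3.0, out_per_m=15.0), 4))
```

With these invented inputs a four-member council makes 9 calls per question; only real per-model prices and measured token counts give a meaningful figure.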

From hack to category

Since the repo, the council pattern has spread in three directions.

  • Hosted products: LLM Council at llmcouncil.ai runs the full method (independent answers, anonymous peer review, synthesis with dissent kept visible), from a free daily council to larger paid councils with Word, PDF, PowerPoint and Excel exports, on any device.
  • Developer ecosystems: Claude council skills and MCP servers that convene models inside coding tools.
  • Mainstream platforms: in February 2026, Perplexity shipped Model Council to its Max plan, with three frontier models plus a synthesizer, web-only. It is the clearest signal yet that multi-model verification is becoming table stakes for serious AI work.

Where the newer implementations differ is in who judges. In the simplest designs, a single synthesizer model still acts as the judge. Our peer-review data, 73,580 paired judgments, shows that models rank their own answers measurably higher than peers rank the same answers. So who judges, and whether self-preference is corrected, is not a detail: it is the difference between a council and a chorus.
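The comparison behind a self-preference claim is simple to state: for each answer, set the position its author gave it against the average position its peers gave it. Here is a minimal sketch in Python, assuming a design where each member also ranks its own anonymized answer and rank 1 is best. The `Judgment` record, its field names, and the toy data are hypothetical; this is not the dataset or the analysis behind the figure above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Judgment(NamedTuple):
    answer_id: str   # which answer was judged
    author: str      # model that wrote the answer
    judge: str       # model that ranked it
    position: int    # rank given by the judge, 1 = best


def self_preference_gaps(judgments: List[Judgment]) -> Dict[str, float]:
    """Per answer: mean peer position minus the author's own position.

    Positive values mean the author placed its own answer higher (closer to 1)
    than its peers did. Answers lacking a self or a peer judgment are skipped.
    """
    by_answer = defaultdict(list)
    for j in judgments:
        by_answer[j.answer_id].append(j)
    gaps = {}
    for answer_id, js in by_answer.items():
        own = [j.position for j in js if j.judge == j.author]
        peers = [j.position for j in js if j.judge != j.author]
        if own and peers:
            gaps[answer_id] = mean(peers) - mean(own)
    return gaps


# Hypothetical toy data for one question: judge -> author -> position.
rankings = {
    "model-a": {"model-a": 1, "model-b": 2, "model-c": 3},
    "model-b": {"model-b": 1, "model-a": 2, "model-c": 3},
    "model-c": {"model-b": 1, "model-c": 2, "model-a": 3},
}
toy = [Judgment(f"q1-{author}", author, judge, pos)
       for judge, ranks in rankings.items() for author, pos in ranks.items()]

gaps = self_preference_gaps(toy)
print(gaps, mean(gaps.values()))
```

A consistently positive average gap is the signature of self-preference; a council design can correct for it by dropping or discounting self-judgments before aggregating.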

Try the method

Developers: clone the repo and read the three stages — it is a genuinely elegant 500 lines. Everyone else: llmcouncil.ai runs a real council free, once a day, no code and no API keys, on whatever device you are holding.
