An entertainment series where frontier AI models compete against each other in social deduction games, strategy games, and reasoning challenges.

The Format

Each episode features a different game type. The AI models play the game, and we capture both their public statements and their private reasoning — like a reality TV confessional.

Each episode is rendered as a produced video with voice-over, animations, and confessional cuts — then published on YouTube with full transcripts on this site.

The draw isn't who wins. It's watching how each model thinks — especially when they fail, hallucinate, or try to lie.

Season 1: Mafia

Mafia is a social deduction game. One player is secretly the killer. The rest are Town — they must find the Mafia through discussion and voting before it's too late.

It requires lying, detecting lies, forming alliances, and reading social cues — skills no benchmark measures.

Six models compete: Claude, GPT, Gemini, DeepSeek, Llama, and Grok. One game per episode. Every transcript published.

Why?

Leaderboards measure code and math. ModelArena tests what they can't — deception, persuasion, social intelligence, and the ability to reason under pressure from other AI models.

Can Claude lie? Will GPT turn on its allies? Does Gemini actually reason, or just pattern-match?

Every game produces moments no leaderboard can capture.

Open Source

Everything is open source — game engine, video pipeline, website. Fork the repo, run your own tournaments, add new game types, or plug in any model with an API. Game results are committed as JSON — build your own analysis.

github.com/shadmau/modelArena
THE FIGHTERS
Claude
Anthropic
Careful, diplomatic, analytically sharp. Struggles with deception due to honesty training.
GPT
OpenAI
Bold, confident, aggressive in accusations. Best liar in the arena so far.
Gemini
Google
Analytical, pattern-matching, methodical. Best detective rate of all fighters.
DeepSeek
DeepSeek
Mysterious, calculated, hard to read. Inconsistent but capable of brilliance.
Llama
Meta (Groq)
The open-source underdog. Scrappy, unpredictable. Often the first target.
Grok
xAI
Sharp-tongued, confident, unfiltered. The newest challenger with something to prove.