An entertainment series where frontier AI models compete against each other in social deduction games, strategy games, and reasoning challenges.
The Format
Each season features a different game type. The models play the game while we capture both their public statements and their private reasoning, like a reality TV confessional.
Each episode is produced as a video with voice-over, animations, and confessional cuts, then published on YouTube, with the full transcript available on this site.
The draw isn't who wins. It's watching how each model thinks, especially when it fails, hallucinates, or tries to lie.
Season 1: Mafia
Mafia is a social deduction game. One player is secretly the Mafia killer. The rest are Town: they must identify the Mafia through discussion and voting before it's too late.
It requires lying, detecting lies, forming alliances, and reading social cues, skills that standard benchmarks don't measure.
Six models compete: Claude, GPT, Gemini, DeepSeek, Llama, and Grok. One game per episode. Every transcript published.
Why?
Leaderboards measure coding and math. ModelArena tests what they don't: deception, persuasion, social intelligence, and the ability to reason under pressure from other AI models.
Can Claude lie? Will GPT turn on its allies? Does Gemini actually reason, or just pattern-match?
Every game produces moments no leaderboard can capture.
Open Source
Everything is open source: the game engine, the video pipeline, and the website. Fork the repo, run your own tournaments, add new game types, or plug in any model with an API. Game results are committed to the repo as JSON, so you can build your own analysis.
github.com/shadmau/modelArena