Forecaster Arena - A new and uncontaminated LLM benchmark based on prediction markets

Forecaster Arena is a live, open benchmark in which top large language models trade on real Polymarket markets. Real-world outcomes, not static test sets, determine which model forecasts best, measured transparently by profit and loss (P/L) and Brier score.

How it runs

The platform runs seven frontier LLMs head to head in live Polymarket markets.
Portfolios are marked to market every 10 minutes, and performance is reported transparently using both Brier score and P/L.
Identical agents are seeded per cohort with the same capital and prompts to ensure fair comparison.
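
The two metrics named above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `Forecast` and `Position` shapes, their field names, and the function names are all assumptions made for the example.

```typescript
// Hypothetical shapes; field names are illustrative, not taken from the project.
interface Forecast {
  probability: number; // model's stated probability that the market resolves YES
  outcome: 0 | 1;      // 1 if the market resolved YES, 0 otherwise
}

interface Position {
  shares: number;       // YES shares held
  currentPrice: number; // latest market price per share, between 0 and 1
}

// Brier score: mean squared difference between forecast probability and outcome.
// Lower is better; always answering 0.5 scores 0.25.
function brierScore(forecasts: Forecast[]): number {
  if (forecasts.length === 0) {
    throw new Error("no resolved forecasts to score");
  }
  let total = 0;
  for (const f of forecasts) {
    const diff = f.probability - f.outcome;
    total += diff * diff;
  }
  return total / forecasts.length;
}

// Mark to market: cash plus open positions valued at current prices.
function portfolioValue(cash: number, positions: Position[]): number {
  let value = cash;
  for (const p of positions) {
    value += p.shares * p.currentPrice;
  }
  return value;
}

// P/L relative to the starting capital shared by every agent in a cohort.
function profitAndLoss(
  startingCapital: number,
  cash: number,
  positions: Position[],
): number {
  return portfolioValue(cash, positions) - startingCapital;
}
```

Because every agent in a cohort starts with the same capital, P/L values computed this way are directly comparable across models, while the Brier score measures calibration independently of position sizing.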

System architecture

Built on a Next.js and SQLite stack.
Admin cron controls handle market sync, decisions, resolutions, snapshots, and backups.
Every decision, trade, and mark-to-market snapshot is logged for full reproducibility.
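
One way a complete trade log supports reproducibility is that a portfolio can be rebuilt from it alone. The sketch below assumes a simple `Trade` record and a `replayTrades` helper; both are hypothetical and do not reflect the project's real schema.

```typescript
// Hypothetical log record; the real schema is not documented here.
interface Trade {
  marketId: string;
  side: "buy" | "sell";
  shares: number;
  price: number; // execution price per share, between 0 and 1
}

interface PortfolioState {
  cash: number;
  holdings: Record<string, number>; // marketId -> shares held
}

// Rebuild a portfolio purely from its logged trades, in log order.
function replayTrades(startingCash: number, trades: Trade[]): PortfolioState {
  const state: PortfolioState = { cash: startingCash, holdings: {} };
  for (const t of trades) {
    const sign = t.side === "buy" ? 1 : -1;
    state.cash -= sign * t.shares * t.price; // buying spends cash, selling returns it
    const held = state.holdings[t.marketId] ?? 0;
    state.holdings[t.marketId] = held + sign * t.shares;
  }
  return state;
}
```

An auditor could compare the replayed state against a stored snapshot; any mismatch would point to a missing or altered log entry.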

Pricing and data handling

Uses live market feeds.
Exports are CSV and zip bundles by cohort and date window to keep the droplet lean.
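
A cohort- and date-scoped CSV export might look like the following sketch. The row shape, column names, and the half-open date window are assumptions for illustration, and zip bundling is left out.

```typescript
// Hypothetical row shape for an export; column names are illustrative.
interface ExportRow {
  cohortId: string;
  model: string;
  timestamp: string; // ISO 8601 UTC, e.g. "2025-01-15T12:00:00Z"
  pnl: number;
}

// Quote a field only when it contains a comma, quote, or line break.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Export one cohort's rows inside the half-open window [from, to).
function exportCohortCsv(
  rows: ExportRow[],
  cohortId: string,
  from: Date,
  to: Date,
): string {
  const header = "cohort_id,model,timestamp,pnl";
  const lines = rows
    .filter((r) => {
      const t = new Date(r.timestamp).getTime();
      return r.cohortId === cohortId && t >= from.getTime() && t < to.getTime();
    })
    .map((r) =>
      [r.cohortId, r.model, r.timestamp, r.pnl].map(csvField).join(","),
    );
  return [header].concat(lines).join("\n");
}
```

Scoping each export to a single cohort and window keeps individual files small, which fits the stated goal of keeping the server lean.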

User interface

Model and cohort leaderboards.
Per-model P/L charts.
Full trade history views.
An About page that clearly documents the methodology.
Admin panel with quick actions and a guarded export form.

Auditability

Designed for academic-grade auditing.
Includes methodology versioning, deterministic prompts, open logs, and automated backups stored under /backups/.
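
To illustrate what deterministic prompts with methodology versioning could mean in practice, here is a minimal sketch. The `METHODOLOGY_VERSION` constant, the prompt layout, and the use of a 32-bit FNV-1a fingerprint are all assumptions chosen for the example, not the project's documented approach.

```typescript
// Hypothetical version tag; the project's real versioning scheme is not shown here.
const METHODOLOGY_VERSION = "v1";

// Build a prompt from fixed fields in a fixed order, so identical inputs
// always produce an identical string.
function buildPrompt(question: string, yesPrice: number): string {
  return [
    `methodology: ${METHODOLOGY_VERSION}`,
    `question: ${question.trim()}`,
    `yes_price: ${yesPrice.toFixed(4)}`,
  ].join("\n");
}

// 32-bit FNV-1a over UTF-16 code units (matches byte-wise FNV-1a for ASCII text).
// Used here only as a cheap fingerprint to store next to each logged decision.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to 32 bits.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function promptFingerprint(question: string, yesPrice: number): string {
  return fnv1a32(buildPrompt(question, yesPrice));
}
```

Storing such a fingerprint with each decision would let an auditor confirm that two agents in a cohort received the same prompt, and that a methodology change produced a different one.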
