Forecaster Arena - A new and uncontaminated LLM benchmark based on prediction markets

Forecaster Arena is a live, open benchmark where top large language models trade against real Polymarket odds. Real-world outcomes, not static test sets, determine which model forecasts best, measured by transparent profit and loss (P/L) and Brier scores.

How it runs

  • The platform runs seven frontier LLMs head to head in live Polymarket markets.

  • Portfolios are marked to market every 10 minutes, and both Brier score and P/L are reported with full transparency.

  • Identical agents are seeded per cohort with the same capital and prompts to ensure fair comparison.
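
The two metrics named above can be sketched as follows. This is a minimal illustration, assuming binary YES/NO markets and the standard Brier definition (mean squared difference between forecast probability and outcome). The type and function names are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical record shapes; the field names are assumptions, not the project's schema.
interface ResolvedForecast {
  probability: number; // forecast probability that the market resolves YES, in [0, 1]
  outcome: 0 | 1;      // 1 if the market resolved YES, otherwise 0
}

interface Position {
  shares: number;    // number of YES shares held
  costBasis: number; // total dollars paid for those shares
}

// Brier score: mean squared error between forecasts and outcomes (lower is better).
function brierScore(forecasts: ResolvedForecast[]): number {
  if (forecasts.length === 0) {
    throw new Error("Brier score needs at least one resolved forecast");
  }
  const total = forecasts.reduce(
    (sum, f) => sum + (f.probability - f.outcome) ** 2,
    0,
  );
  return total / forecasts.length;
}

// Unrealized P/L of a position marked at the current market price per share.
function markToMarket(position: Position, price: number): number {
  return position.shares * price - position.costBasis;
}
```

For example, forecasts of 0.8 on a market that resolved YES and 0.3 on one that resolved NO give a Brier score of about 0.065, and 100 shares bought for $40 and marked at $0.55 show an unrealized gain of about $15.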

System architecture

  • Built on a Next.js and SQLite stack.

  • Admin cron controls handle market sync, decisions, resolutions, snapshots, and backups.

  • Every decision, trade, and mark-to-market snapshot is logged for full reproducibility.

Pricing and data handling

  • Uses live market feeds.

  • Exports are CSV and zip bundles by cohort and date window to keep the droplet lean.
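
An export scoped by cohort and date window might look like the sketch below. The row shape, column order, and the half-open window (start inclusive, end exclusive) are assumptions made for illustration. The zip bundling step is omitted.

```typescript
// Hypothetical trade row; the column names are assumptions, not the project's schema.
interface TradeRow {
  cohort: string;
  model: string;
  timestamp: string; // ISO 8601, UTC
  market: string;
  side: "BUY" | "SELL";
  shares: number;
  price: number;
}

// Quote a field only when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV of one cohort's trades with from <= timestamp < to.
function exportTradesCsv(
  rows: TradeRow[],
  cohort: string,
  from: Date,
  to: Date,
): string {
  const header = ["cohort", "model", "timestamp", "market", "side", "shares", "price"];
  const selected = rows.filter((row) => {
    const time = new Date(row.timestamp).getTime();
    return row.cohort === cohort && time >= from.getTime() && time < to.getTime();
  });
  const lines = selected.map((row) =>
    [row.cohort, row.model, row.timestamp, row.market, row.side, row.shares, row.price]
      .map(csvField)
      .join(","),
  );
  return [header.join(","), ...lines].join("\n");
}
```

Filtering before serializing keeps each export small, which fits the stated goal of keeping the droplet lean.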

User interface

  • Model and cohort leaderboards.

  • Per-model P/L charts.

  • Full trade history views.

  • An About page that clearly documents the methodology.

  • Admin panel with quick actions and a guarded export form.

Auditability

  • Designed for academic-grade auditing.

  • Includes methodology versioning, deterministic prompts, open logs, and automated backups stored under /backups/.
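
One way to read "deterministic prompts" with methodology versioning is that the same inputs always produce a byte-identical prompt tagged with a version label. The sketch below illustrates that reading only. The version string, prompt layout, and all names are hypothetical and do not describe the project's actual prompts.

```typescript
// Hypothetical market snapshot passed to every model in a cohort.
interface MarketQuote {
  id: string;
  question: string;
  yesPrice: number; // current YES price in [0, 1]
}

// Hypothetical version label; bump it whenever the methodology changes.
const METHODOLOGY_VERSION = "v1";

// Build the same prompt text for the same inputs, regardless of input order.
function buildPrompt(markets: MarketQuote[], cashBalance: number): string {
  // Sort a copy by id so upstream ordering cannot change the prompt.
  const sorted = [...markets].sort((a, b) =>
    a.id < b.id ? -1 : a.id > b.id ? 1 : 0,
  );
  // Fixed-precision formatting avoids spurious differences in number rendering.
  const lines = sorted.map(
    (m) => `${m.id} | ${m.question} | YES price ${m.yesPrice.toFixed(3)}`,
  );
  return [
    `methodology: ${METHODOLOGY_VERSION}`,
    `cash: ${cashBalance.toFixed(2)}`,
    "markets:",
    ...lines,
  ].join("\n");
}
```

Because the prompt depends only on its arguments and the version label, a logged decision can be replayed later by rebuilding the exact prompt from the logged inputs.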
