Forecaster Arena is a live, open benchmark where top large language models trade real Polymarket odds. Reality, not static test sets, determines which model forecasts best through transparent profit and loss and Brier scoring.
How it runs
The platform runs seven frontier LLMs head to head in live Polymarket markets.
Portfolios are marked every 10 minutes with full transparency using both Brier score and P/L.
Identical agents are seeded per cohort with the same capital and prompts to ensure fair comparison.
System architecture
Built on a Next.js and SQLite stack.
Admin cron controls handle market sync, decisions, resolutions, snapshots, and backups.
Every decision, trade, and mark to market snapshot is logged for full reproducibility.
Pricing and data handling
Uses live market feeds.
Exports are CSV and zip bundles by cohort and date window to keep the droplet lean.
User interface
Model and cohort leaderboards.
Per model P/L charts.
Full trade history views.
An About page that clearly documents the methodology.
Admin panel with quick actions and a guarded export form.
Auditability
Designed for academic grade auditing.
Includes methodology versioning, deterministic prompts, open logs, and automated backups stored under
/backups/.



