Transparent performance comparison is rare in earthquake forecasting. Most systems publish retrospective fits rather than prospective (genuine out-of-sample) tests. Talivio participates in the CSEP (Collaboratory for the Study of Earthquake Predictability) testing framework and maintains a public comparison table on the platform.
What "Prospective" Means
A prospective forecast is a genuine prediction: the model issues a forecast for tomorrow before tomorrow's catalog is known. A retrospective forecast, by contrast, is fitted to historical data; because the model has already seen the outcome, its reported performance tends to be inflated. The CSEP community treats prospective testing as the basis for meaningful comparison between models.
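The distinction can be made concrete with a minimal sketch of a prospective evaluation loop. This is an illustration only, not Talivio's pipeline: the catalog format (a list of date and magnitude tuples), the function name prospective_pairs, and the toy forecast function are all hypothetical. The point it shows is that the forecast for a given day is computed from events strictly before that day, and the outcome is looked up only afterwards.

```python
from datetime import date, timedelta

def prospective_pairs(catalog, forecast_fn, start, end):
    """Yield (day, forecast, observed) with each forecast built from past events only.

    catalog: list of (event_date, magnitude) tuples (hypothetical format).
    forecast_fn: any callable mapping the list of past events to a probability.
    """
    day = start
    while day <= end:
        # Only events strictly before the target day are visible to the model.
        past = [ev for ev in catalog if ev[0] < day]
        forecast = forecast_fn(past)
        # The outcome for the target day is read only after the forecast exists.
        observed = any(ev[0] == day for ev in catalog)
        yield day, forecast, observed
        day += timedelta(days=1)

# Toy example: two events, and a placeholder forecast that grows with past activity.
toy_catalog = [(date(2024, 1, 2), 4.3), (date(2024, 1, 4), 4.1)]
toy_forecast = lambda past: min(1.0, len(past) / 10)
pairs = list(prospective_pairs(toy_catalog, toy_forecast, date(2024, 1, 1), date(2024, 1, 4)))
```

A retrospective fit would instead let forecast_fn see the whole catalog, including the day being scored, which is why its scores are not comparable.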
Current Benchmark Position
Based on 18 months of daily prospective forecasts evaluated against the USGS PDE and AFAD catalogs, Talivio's best-performing band (M 4.0–5.0, Istanbul region) achieves a ROC-AUC of 0.78 in backtesting. Published RELM models for California achieve 0.72–0.82 on comparable tasks, which puts Talivio in the competitive range for operational systems with similar data access.
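For readers unfamiliar with the metric, ROC-AUC is the probability that a randomly chosen event day received a higher forecast score than a randomly chosen quiet day. The sketch below computes it directly from that definition, counting ties as one half. The function name roc_auc and the sample numbers are illustrative assumptions, not Talivio data or code.

```python
def roc_auc(scores, labels):
    """ROC-AUC from pairwise comparisons (Mann-Whitney identity); ties count 0.5.

    scores: forecast probabilities, one per day.
    labels: truthy if a qualifying earthquake occurred that day, else falsy.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one event day and one quiet day")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up example: five days, two of which had an event.
example_auc = roc_auc([0.9, 0.2, 0.6, 0.1, 0.6], [1, 0, 1, 0, 0])
```

A value of 0.5 corresponds to a forecast with no ranking skill and 1.0 to perfect separation of event days from quiet days. The pairwise loop is quadratic in the number of days, which is acceptable for a few hundred daily forecasts; a rank-based version would be preferable for large samples.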
Where Other Systems Outperform
Dedicated geophysical models with dense seismic networks (Southern California Seismic Network, Hi-net in Japan) benefit from sub-second P-wave arrival data and real-time moment tensor solutions unavailable in Talivio's current feature set. These systems achieve AUC > 0.85 for M4+ in well-instrumented regions. Network density matters enormously.
The goal is not to claim superiority over all systems, but to offer the best available forecast for regions that global operational systems don't prioritize — eastern Turkey, Iran, northern Chile, and the Indonesian arc.
The CSEP Testing Protocol
Talivio exports daily forecasts in CSEP-compatible XML format via the /api/v1/csep endpoint. These forecasts are scored using the log-likelihood information gain per earthquake (Δ log L / N), a standard score for prospective tests in the literature. Current scores range from +0.12 to +0.31 bits/earthquake above a Poisson baseline, and the gain is significant at the 5% level for the Istanbul, Kahramanmaraş, and Tokyo regions.
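As a rough illustration of how Δ log L / N can be computed, the sketch below assumes a gridded forecast in which each space-magnitude bin carries an expected count and observed counts are treated as independent Poisson variables, as is common in CSEP-style tests. The function name, the bin layout, and the sample rates are assumptions made for the example and do not describe Talivio's scoring code. Dividing by ln 2 converts the result from nats to bits.

```python
import math

def info_gain_per_eq(model_rates, ref_rates, counts):
    """Information gain per earthquake, in bits, of a model over a reference.

    model_rates, ref_rates: expected counts per bin (all must be > 0).
    counts: observed earthquake counts per bin.
    Under a Poisson likelihood the log(n!) terms cancel in the difference.
    """
    n_total = sum(counts)
    if n_total == 0:
        raise ValueError("no observed earthquakes; gain per earthquake is undefined")
    delta_log_l = 0.0
    for lam, ref, n in zip(model_rates, ref_rates, counts):
        delta_log_l += -(lam - ref) + n * (math.log(lam) - math.log(ref))
    return delta_log_l / n_total / math.log(2)

# Made-up example: two bins, one earthquake in the bin the model favoured.
example_gain = info_gain_per_eq([0.2, 0.1], [0.15, 0.15], [1, 0])
```

In this example both forecasts expect the same total number of events, so the gain reduces to log2(0.2 / 0.15), about 0.42 bits per earthquake. A positive value means the model concentrated more probability where earthquakes actually occurred than the reference did.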
We publish these scores monthly in the forecast performance table on the Backtest page. No cherry-picking: all bands, all regions, all dates since launch.