Scientific Methodology

How does the multi-band earthquake
probability forecast system work?

Talivio uses per-region band ML models trained via an expanding-window backtest (2000–present). For each seismic region and magnitude band, a 4-algorithm competition (LightGBM, RandomForest, ExtraTrees, CalibratedLR) selects the best model using walk-forward cross-validation with hard same-region temporal negatives — preventing geographic classification leakage. Regional ROC-AUC values range from 0.62 to 0.99 across 10 global regions. CSEP-compatible forecast export for independent validation. Weekly automatic retraining.

Loading model metrics…

1. System Overview

The regional band model architecture trains a separate ML model for each (region, magnitude band) pair using an expanding-window backtest. Hard same-region negatives (temporal: same coordinates, different time; spatial: offset within the same seismic zone) are used to prevent geographic leakage. A 4-algorithm competition (LightGBM, RandomForest, ExtraTrees, CalibratedLR) selects the winning algorithm per band via cross-validated AUC. The global prediction endpoint retains the 102-feature ensemble (Coulomb stress, ETAS, GR b-value, multi-scale SRA, H3 spatial connectivity) for backwards-compatible API access.

5
Magnitude Bands
(M3–4, M4–5, M5–6, M6–7, M7+)
102
Features (94 core + 8 spatial:
SRA, Moment, ETAS, H3…)
4-Ensemble
LightGBM + RF + ET + GB
(Soft Voting Classifier)
8
Data Sources
(USGS, AFAD, NGL, GEM, ISC…)

2. Magnitude Bands

Each band has a different feature dimensionality, data density, and dominant physical mechanism. In sparse bands (M6+) the seismic cycle and physical parameters take precedence, while in dense bands (M4–5) statistical precursors carry more weight.

M 3.0 – 4.0

Micro Earthquakes

Highest event rate; serves as a precursor indicator for larger events. Micro-cluster density and foreshock patterns are key signals.

115 featuresLightGBM ensembleClass: very high positive rate
M 4.0 – 5.0

Small–Moderate Earthquakes

Most frequent damaging class; high training data availability. Foreshock density and quiescence anomaly are the dominant signals in this band.

112 featuresLightGBM-weightedClass: high positive rate
M 5.0 – 6.0

Moderate Earthquakes

Magnitude range capable of causing damage. The 24-month linear regression of the b-value trend is the leading precursor signal in this band.

112 featuresb-value trend weightedModerate frequency
M 6.0 – 7.0

Large Earthquakes

Potential for serious structural damage and casualties. Seismic cycle analysis with inter-event normalisation plays the decisive role.

112 featuresSeismic cycle weightedRare class
M 7.0 +

Destructive Earthquakes

Least training data; hardest prediction class. Physical parameters (slip rate, last rupture date, fault length) gain greater weight.

13 featuresPhysical parameters weightedVery rare

3. Feature Engineering

Features are constructed as a total of 29 core features across nine categories. Coulomb stress transfer, the ETAS statistical model and SRA were added in v4.0; IERS 2010 solid earth tides and FFT spectral features were added in v5.0.

3.1 Fault Mechanics and Geometry (6 features)
FeaturePhysical MeaningSource
Fault distanceDistance to nearest active fault segment (km)GEM Global Active Faults
Coupling ratioFault coupling ratio (0=creeping, 1=fully locked); elastic energy accumulation indicatorFault DB + GNSS
Slip rateLong-term slip rate of the fault (mm/yr); rate of energy accumulationGEM / AFAD Fault DB
Cross-fault velocityGNSS cross-fault differential velocity (mm/yr); plate motion difference across faultGNSS microservice
Seismic gapTime elapsed since last major earthquake (years); identifies mature segmentsHistorical catalogue
Last ruptureTime since last surface rupture; basis of seismic cycle calculationHistorical catalogue
3.2 Coulomb Stress Transfer (2 features — new in v4)
FeaturePhysical MeaningSource
Cumulative ΔCFSCumulative Coulomb stress effect of historical M≥5.5 earthquakes (bar) — Okada (1992) dislocation modelComputation + USGS/AFAD
Recent ΔCFSLargest individual Coulomb stress change in the recent period (bar) — triggering potentialComputation
Coulomb Failure Stress: ΔCFS = Δτ + μ'·Δσn — shear stress change + effective friction × normal stress change. Positive ΔCFS (>0.1 bar) brings the fault closer to failure (King et al., 1994; Stein, 1999). Calculated using the Okada (1992) semi-infinite elastic medium solution.
3.3 ETAS Statistical Model (3 features — new in v4)
FeaturePhysical MeaningSource
ETAS densityETAS model estimated instantaneous seismicity rate (events/day/km²) — Ogata (1988) MLEComputation
Background rateBackground seismicity rate μ (excluding clustering); tectonic loading indicatorOgata (1988)
Triggering ratioTriggered/total ratio; high value = active aftershock sequenceComputation
ETAS Model: λ(t,x,y) = μ(x,y) + Σ K·eα(M-Mc) / (t-tᵢ+c)p · f(x-xᵢ,y-yᵢ) — the statistical model used by USGS, Italy INGV and Japan JMA for operational earthquake forecasting (Ogata, 1988).
3.4 Seismicity Rate Analysis — SRA (5 features — new in v4)
FeaturePhysical MeaningSource
Short-term rateRegional earthquake count over the short term; short-term activityUSGS + AFAD
Medium-term rateEarthquake count over the medium term; medium-term activityUSGS + AFAD
Long-term rateEarthquake count over the long term; long-term backgroundUSGS + AFAD
Activity ratioShort/long-term activity ratio (SRA). Detection of uplift or quiescenceComputation
Quiescence Z-scoreDeviation of activity from historical mean. Anomaly detection (Habermann, 1988)Computation
3.5 Gutenberg-Richter and Precursor Signals (4 features)
FeaturePhysical MeaningSource
b-valueGutenberg-Richter b-value (Aki 1965 MLE); low b = high stressSeismic catalogue
b-value trendTime-series slope of b-value; decreasing trend signals stress increaseComputation
Precursor densityDensity of small nearby earthquakes (foreshock density)USGS + AFAD
Moment deficitAccumulated/expected seismic moment ratio; ≥1 indicates energy surplusFault DB + Catalogue
3.6 Geophysical and Celestial (2 features)
FeaturePhysical MeaningSource
GNSS deformationCrustal deformation rate computed via spatial interpolation from GPS stations (nstrain/yr)NGL Nevada Geodetic Lab
Tidal indexCombined Moon+Sun tidal stress (normalised 0–1)Ephemeris computation
3.7 Solid Earth Tides (2 features — new in v5)
FeaturePhysical MeaningSource
Volumetric strainVolumetric strain computation based on IERS 2010 conventions (nanostrain). Typical amplitude: 50–120 nsComputation (IERS 2010)
Tidal CoulombCoulomb projection of tidal stress onto the fault plane (kPa). Depends on fault geometryComputation
Solid Earth Tides (IERS 2010): εvol = (1 + h₂ - 3·l₂) / (R·g) · W₂ — volumetric strain is computed from the degree-2 tidal potential. Provides a physically more accurate computation than the existing tidal_stress_index (1/r³ approximation). Ide et al. (2016) showed that large earthquakes occur statistically more frequently during periods of elevated tidal stress.
3.8 FFT Spectral Features (5 features — new in v5)
FeaturePhysical MeaningSource
Dominant periodSpectral dominant period of the seismicity time series; decrease = quiescence signalComputation
Spectral centroidWeighted mean of the frequency distributionComputation
Spectral entropyShannon spectral entropy (0–1 normalised); rising disorder = breakdown of harmonic orderComputation
Spectral slopeLog-log spectral slope; steepening = red shift, quiescenceComputation
Energy ratioLow/high frequency energy ratio; increase = precursor signalComputation
Spectral Analysis: Characteristic changes are observed in the frequency spectrum of seismic activity before large earthquakes: the dominant frequency decreases, spectral energy shifts to lower frequencies, and harmonic order breaks down (Telesca et al., 2001; Kagan & Jackson, 1991). Spectral leakage is minimised using a Hanning window + rfft.
3.9 Four-Component Risk Model (v5)
Risk = w₁ × Mechanical + w₂ × Statistical + w₃ × Precursor + w₄ × Trigger

Mechanical (highest weight) — Gunpowder+Domino

Fault locking, slip deficit, and stress transfer from neighbouring earthquakes. Weighted average of three sub-components.

Statistical — ETAS

ETAS model seismicity density estimate and short/long-term activity ratio (SRA). Operational forecasting standard.

Precursor — Smoke

b-value decrease, seismic quiescence, and foreshock density. Weighted combination of four sub-signals.

Trigger (lowest weight) — Spark

Moon–Sun tidal stress; triggering potential when fault is near threshold. Scientifically small (~1–3%) but statistically significant.

Coulomb Stress Threshold

Fault segments with ΔCFS > +0.1 bar (10 kPa) receive an automatic uplift in the mechanical component. King et al. (1994) triggering threshold.

Critical Window Condition

If Mechanical ≥ 0.6, Statistical ≥ 0.5, Precursor ≥ 0.4, and Trigger ≥ 0.6 are simultaneously met, the system raises the Critical Window flag.

3.10 Spatial Connectivity Layer (new in v6)

The earthquake catalogue is divided into spatial cells via a hierarchical hexagonal grid. Spatial features are computed using a multi-scale neighbourhood structure. Temporal leakage is prevented by leak-free binary search (only past events are used).

K1/K2 Neighbour Rate

spatial_neighbor_rate_k1/k2: 90-day earthquake activity rate in near and extended neighbour cells.

K1/K2 Max. Magnitude

spatial_max_mag_k1/k2: Maximum earthquake magnitude within 90 days in neighbouring cells.

Activity Trend

spatial_activity_trend: Last 30-day vs. 90-day activity ratio — detection of acceleration or deceleration.

Cluster Density

spatial_cluster_density: Ratio of active cells in the centre + K1 neighbours — measure of spatial clustering.

Fault Smoothing

spatial_fault_smoothed_risk: Distance-weighted fault cell seismicity using exp(-dist/decay_km).

Strain Gradient

spatial_strain_gradient: Standard deviation of the activity difference between the centre cell and its neighbours.

SpatialIndex: The earthquake catalogue is sorted chronologically into cells. Each query runs in O(log n) time via binary search. Comprehensive active-cell and historical-event database (1990–present).

4. 4-Model Ensemble Architecture (v8)

In v8.0 the MTL architecture was replaced by a 4-model soft-voting ensemble. Each model contributes weighted probability votes. The ensemble mitigates individual model weaknesses and produces more robust, well-calibrated predictions across all 5 bands.

Ensemble Architecture (Soft Voting):
[102 Features] → LightGBM (weight=3) ──┐
[102 Features] → RandomForest (weight=2) ──┤→ Weighted Average → P(event)
[102 Features] → ExtraTrees (weight=1) ──┤
[102 Features] → GradientBoosting (weight=2) ──┘
Calibration: Nested Isotonic Regression (CalibratedClassifierCV, cv=3)

LightGBM (w=3)

Primary model: n_estimators=300, max_depth=6, num_leaves=31. Gradient boosting with L1+L2 regularisation. Highest individual discriminative power.

RandomForest (w=2)

n_estimators=250, max_depth=8. Bagging-based diversity reduces variance. Robust to outliers and noisy features.

ExtraTrees (w=1)

Extremely Randomised Trees. Random split selection adds decorrelation. Fastest training; lowest weight but adds unique diversity.

GradientBoosting (w=2)

sklearn GBM with n_estimators=200, max_depth=5. Classical sequential boosting provides a different bias-variance tradeoff vs LightGBM.

Nested Isotonic Calibration

CalibratedClassifierCV (method='isotonic') converts ensemble output to a reliable probability. See current ECE in backtest metrics above.

CSEP Forecast Export

Daily CSEP-compatible grid forecasts (ASCII/CSV/XML) are generated for independent validation via cseptesting.org. GR-Poisson + ML risk overlay.

Daily Prospective Forecasts + Weekly Retraining: APScheduler generates prospective forecasts every day at 03:00 UTC. A backtest runs every Monday; if a >5% drop in ROC-AUC is detected, the full training loop is launched. CSEP forecasts are regenerated daily.

4b. Stacking Ensemble Architecture (v4, no longer used)

The multi-layer stacking ensemble used prior to v7.0. Retained here for reference.

Legacy Ensemble Architecture:
Layer 1 (Base Learners): XGBoost + LightGBM + CatBoost + ExtraTrees + Random Forest
Layer 2 (Meta-Learner): Logistic Regression (L2) — combines Layer 1 outputs
Layer 3 (Calibration): Isotonic Regression — reliable probability output (ECE < 0.05)
AlgorithmVersionStrengthsBand Advantage
XGB XGBoost 2.x Histogram-based splitting, L1/L2 regularisation, GPU support, missing value handling M4–5, M5–6 (dense data)
LightGBM LightGBM 4.x DART regularisation, leaf-wise growth, scale_pos_weight, fastest training M4–5 (speed + accuracy)
CB CatBoost 1.x Ordered boosting (target leakage prevention), native categorical features, low overfitting M5–6, M6–7
ET ExtraTrees sklearn 1.x Extra random splits; low variance, fast training M6–7 (sparse data)
RF Random Forest sklearn 1.x Bagging, interpretability, Gini importance ranking, SHAP values All bands (baseline)
Weekly Automatic Retraining: APScheduler triggers a backtest every Monday. If a >5% drop in ROC-AUC is detected for a band, the full training loop for that band is launched. ETAS parameters are updated, the Coulomb stress field is recomputed, and 5 algorithms × 4 bands = 20 base models + 4 stacking ensembles are trained.

5. Data Sources and Historical Catalogue

The platform draws on 8 different data sources. The historical catalogue is merged from multiple sources; after magnitude homogenisation (ML→Mw, Ulusay et al. 2004), aftershocks are separated using Gardner-Knopoff (1974) declustering. Record priority: AFAD > USGS > ISC > ISC-GEM > Ambraseys.

1035 – 1900
Ambraseys (2002): Historical Turkey and Middle East seismicity. Magnitudes converted from intensity for M≥6.5. Based on geodetic and documentary evidence.
1900 – present
ISC Bulletin: International Seismological Centre global catalogue. Comprehensive record from the start of the instrumental era.
1900 – 2009
ISC-GEM Global Instrumental Earthquake Catalogue (2013): Recomputed moment magnitudes (Mw); systematic magnitude homogenisation; M≥5.5.
1900 – present
AFAD Earthquake Database: National authority record for Turkey and surroundings. Events overlapping with USGS are deduplicated in favour of AFAD.
1970 – present
USGS Earthquake Hazards Program: Global coverage; moment tensor solutions; automatic update every 6 hours. Primary source for the current period.
current
NGL Nevada Geodetic Laboratory: 500+ GPS stations, MIDAS solution; mm/yr velocity vectors (Blewitt et al., 2018). Strain rate and cross-fault velocity computed via KD-Tree.
current
GEM Global Active Faults: 11,000+ active fault segments; strike/dip/rake, slip rates, last rupture date. Loaded automatically during region onboarding.
AFAD + USGS Combined Catalogue (v4): AFAD records M≥1.5 events for Turkey (USGS: M≥4.0). Event matching uses ±30 s and ±50 km windows. AFAD is preferred for location (local network is more accurate), USGS for magnitude (Mw standard). Magnitude homogenisation: ML→Mw (Ulusay et al., 2004 Turkey regression).
Gardner-Knopoff (1974) + ETAS Declustering: Two-stage declustering is applied. GK deterministic declustering separates aftershocks using mechanical windows. ETAS stochastic declustering assigns each event a "probability of being background" — more robust especially for M6+ bands.

6. Seismic Cycle Analysis

According to elastic rebound theory, a fault segment accumulates elastic energy proportional to the regional slip rate since the last major earthquake. McCann et al. (1979) classify mature and recently ruptured segments using the seismic gap criterion.

Gutenberg-Richter: log N = a − b·M
b-value (Aki 1965 MLE): b = log₁₀(e) / (M̄ − Mmin) ≈ 1/(M̄ − Mmin) × 0.4343
Stress loading rate: dσ/dt = (v_plate × μ × coupling) / A_fault
Recurrence interval: T_r = D_max / v_slip (D_max: maximum slip amount)

Seismic Gap Theory

McCann et al. (1979): fault segments from which a long time has passed since the last major earthquake are flagged as "probable gaps". Quiet segment = accumulated stress.

b-Value Change

Low b (b < 0.8) signals high crustal stress and an approaching large event (Wiemer & Wyss, 2002; Scholz, 1968). b = 1.0 is the long-term regional average.

North Anatolian Fault

Characteristic earthquake period ~250 years; westward-migrating rupture sequence (1939–1999). The Marmara segment has not had a major rupture for >300 years.

East Anatolian Fault

Mean recurrence interval ~300 years; the 2023 Kahramanmaraş earthquakes (Mw 7.8 + Mw 7.6) were simultaneous ruptures of multiple segments of this fault.

7. Celestial Mechanics

The PyEphem library uses VSOP87 analytical planetary theory to compute the positions of the Moon and Sun with nano-radian precision. Tidal stress is projected onto fault planes using Boussinesq load theory.

Synodic Cycle (29.5 days)

During New Moon and Full Moon periods (±5-day window) the Moon-Sun-Earth alignment maximises tidal stress. These periods are flagged as trigger pressure windows.

Anomalistic Month (27.55 days)

Near perigee the Moon's gravitational effect increases by 14% (1/r³ dependence). Perigee + Full Moon coincidence (Supermoon) produces the strongest tidal pressure.

Nodal Cycle (18.6 years)

The 18.6-year lunar orbital inclination oscillation (Saros cycle) creates a small (~1%) modulation in tidal stress; its effect is observed in long-term statistics.

Composite Index (0–1)

Normalised: (F_Moon/r³_Moon + F_Sun/r³_Sun) × syzygy_factor. Syzygy factor: 1 + 0.2 × cos(2π × lunar_phase). Range: ~27 MPa (minimum) – ~65 MPa (maximum).

8. Coulomb Stress Transfer (v4)

When a large earthquake occurs, stress redistributes in the surrounding crust. The Coulomb Failure Stress (CFS) change quantifies how close neighbouring faults are brought to failure. This mechanism has successfully explained many earthquake sequences including the 1994 Northridge and 1999 İzmit events.

ΔCFS = Δτ + μ'·Δσn

Δτ : Shear stress change on the receiver fault plane
Δσn : Normal stress change (compression negative)
μ' : Effective friction coefficient (0.4, including pore pressure)

Okada (1992) Model

Deformation and stress field produced by slip on a rectangular fault plane in a semi-infinite, homogeneous, isotropic elastic medium are computed analytically. Inputs: source fault geometry, elastic parameters (μ=3.2×10¹⁰ Pa, ν=0.25).

Wells & Coppersmith (1994)

Slip amount from magnitude: log₁₀(AD) = -4.80 + 0.69·M (metres). Rupture length: log₁₀(RLD) = -2.44 + 0.59·M (km). Rupture width: log₁₀(RW) = -1.01 + 0.32·M (km).

Triggering Threshold

ΔCFS > +0.1 bar (10 kPa) triggering potential (Stein, 1999). Cumulative ΔCFS effect of all M≥5.5 earthquakes in the past 50 years is computed and projected onto target fault segments.

Validation Examples

1999 İzmit (Mw 7.6) → Düzce segment ΔCFS: +3.2 bar → Düzce earthquake 87 days later. 1992 Landers → Big Bear triggering. 2023 Maraş multi-segment rupture.

9. ETAS Statistical Model (v4)

Epidemic-Type Aftershock Sequence (ETAS) is a point process that models seismic activity as the sum of the background seismicity rate and the triggering potential of each event. Used by USGS, Italy INGV, and Japan JMA for operational earthquake forecasting.

λ(t,x,y) = μ(x,y) + Σi:tᵢ<t K·eα(Mᵢ-Mc) / (t-tᵢ+c)p · f(x-xᵢ,y-yᵢ)

μ(x,y) : Space-dependent background seismicity rate (tectonic loading)
K, α, c, p : Parameters estimated via Ogata (1988) MLE
Mc : Completeness magnitude (Turkey network: 2.5)

USGS OAF

The official U.S. Operational Aftershock Forecasting system is ETAS-based. Aftershock probability forecasts are published within 1 hour of each major event.

Parameter Estimation

5 free parameters {K, α, c, p, μ} are estimated via Maximum Likelihood Estimation. K: triggering capacity, α: magnitude scaling, c/p: Omori decay rate.

SRA (Seismicity Rate Analysis)

SRA = R_7day / (R_1yr/52). SRA>2: anomalous increase. SRA<0.3: seismic quiescence — Habermann (1988) potential precursor anomaly.

AFAD Data Advantage

With M≥1.5 completeness, AFAD-driven ETAS parameterization uses 10× more events. Background rate estimation (μ) and b-value computation improve significantly.

10. Solid Earth Tides (v5)

Solid Earth tidal stress from the Moon and Sun is computed physically using Love numbers (h₂=0.6078, l₂=0.0847, k₂=0.2980) based on IERS 2010 conventions. Provides more accurate volumetric strain and Coulomb projection than the legacy 1/r³ approximation.

W₂ = (GM/r) · (R/r)² · P₂(cos z) — Degree-2 tidal potential

εvol = (1 + h₂ - 3·l₂) / (R·g) · W₂ — Volumetric strain (nanostrain)

ΔCFStidal = (3/2) · μ · εtidal · sin(2θeff) — Coulomb projection onto fault plane

IERS 2010 Love Numbers

h₂=0.6078, l₂=0.0847, k₂=0.2980. Defines the elastic response of the crust to tidal forces. Degree-2 is the dominant component (Petit & Luzum, 2010).

Volumetric Strain

Typical amplitude: 50–120 nanostrain. Maximum at New/Full Moon. Moon contribution 68%, Sun contribution 32%.

Coulomb Projection

Tidal stress is projected onto Coulomb stress based on fault geometry (strike, dip). Standard crustal shear modulus used (Agnew, 2015).

Triggering Evidence

Ide et al. (2016) Nature Geoscience: large earthquakes occur statistically more frequently during high tidal stress periods. Particularly on shallow thrust faults.

11. EVT Custom Loss — Asymmetric Loss Functions (v5)

Extreme Value Theory (EVT)-motivated asymmetric loss functions heavily penalize missing rare but catastrophic events like M7+ (False Negatives). Used as custom objectives in XGBoost and LightGBM.

Asymmetric Log-Loss: L = -w · [y·log(p) + (1-y)·log(1-p)]
w = FN_penalty (y=1) or FP_penalty (y=0), per band

Focal Loss: FL(p) = -α · (1-pt)γ · log(pt) — optimized γ and α
Reduces contribution of easy examples to focus on rare/hard cases (Lin et al., 2017)

EVT Tail Loss: L = base × (1 + k·(1-p)tail_exp) — exponential penalty on tail FNs

M4-5: Balanced

Baseline weight. Frequent earthquakes; false alarms and misses balanced.

M5-6: FN-weighted

Miss penalty increases. Earthquakes capable of causing damage.

M6-7: Strong FN weight

FN far more costly; FP more acceptable. Serious damage potential.

M7+: Maximum FN penalty

False alarms are acceptable; missing an M7.5 is not (Coles, 2001).

12. Molchan Diagram — Scientific Evaluation (v5)

The Molchan diagram is the internationally accepted method for scientific evaluation of earthquake forecast models. It measures model skill via the tradeoff between alarm rate and miss rate.

τ (alarm rate) = alarmed volume / total volume
ν (miss rate) = missed earthquakes / total earthquakes

Diagonal (τ, 1-τ): random forecast — no skill
Askill = 0.5 - ∫ν dτ — area below diagonal (0=random, 0.5=perfect)

Statistical significance: σ² = 1/(12·N), z = Askill/σ (Zechar & Jordan, 2008)

Askill ≥ 0.35

Excellent. Outstanding forecast skill. Far superior to random model performance.

Askill ≥ 0.25

Very Good. Statistically strong signal.

Askill ≥ 0.15

Good. Sufficient skill for operational use.

Per-Band Analysis

Each magnitude band evaluated with separate Molchan analysis. p < 0.05 means model is significantly different from random (Molchan, 1991).

13. Backtest & Evaluation

Model performance is measured with a walk-forward expanding window backtest from 1990 to present. A 30-day purge gap prevents data leakage. 15 folds (2-year test windows) are used. The system reports per-region, per-band metrics (M3–4/M4–5/M5–6/M6–7/M7+) validated via expanding-window backtest.

ROC-AUC: Area under the Receiver Operating Characteristic curve. 0.5 = random forecast, 1.0 = perfect classifier.

Brier Score: Mean squared probability error. 0 = perfect, 0.25 = random.

ECE (Expected Calibration Error): Calibration accuracy. 0 = perfectly calibrated.

Precision@0.7: Fraction of true earthquakes when model predicts ≥0.7.
Avg. ROC-AUC (active bands)
Avg. Brier Score
Avg. F1 Score
Active Bands
Walk-Forward Expanding Window: Unlike standard k-fold, preserves temporal separation; future data never leaks into past training. A 30-day purge gap cleans the transition zone. Each fold uses all previous time periods as training set. Spatial features are subject to the same temporal constraints (leak-free binary search).
AUC Skill Score: . Significantly outperforms the null model (AUC > 0.5, p < 0.05).

13b. Regional Backtest — Hard Negative Validation

We compare two backtest methodologies: the Global Model (v1, trivial negatives from tectonically quiet zones) and the Regional Model (v2, hard negatives from the same seismic region). The regional approach forces the model to learn temporal prediction rather than geographic classification.

Global Model (v1) Legacy

Expanding window 1996-2026. Negatives from stable zones (Central Asia, Sahara, etc.).

BandAUCF1Assessment
M4-50.7500.519Realistic
M5-60.9480.716Inflated
M6-70.9920.889Inflated
M7+0.4940.333Insufficient data

Regional Model (v2) Current

10 regions, 100km radius, expanding window 2000–2026. Hard negatives from same seismic region. All active regions use per-region grid-search tuned params. Auto-updated from model registry.

RegionM4-5 AUCHit%M4-5 YrsM5-6 AUCM5-6 YrsM6-7
Loading...

Loading regional summary...

Why v1 M5-6/M6-7 scores are inflated — Feature Importance Analysis

The v1 model's near-perfect M5-6 and M6-7 scores are caused by trivial geographic separation between positives (active fault zones) and negatives (stable continental interiors). The model learns "is this location on a fault?" rather than "will this fault rupture soon?".

FeatureImportanceType
cross_fault_velocity_mm_yr 0.186 Geographic
strike_sin 0.155 Geographic
b_value 0.106 Seismological
strike_cos 0.093 Geographic
distance_to_fault_norm 0.086 Geographic

4 of the top 5 features are geographic/fault geometry features — confirming the model classifies location, not timing. The v2 regional model eliminates this bias by using negatives from the same fault system, forcing temporal discrimination.

Istanbul M4-5 Year-by-Year Results (v2, tuned, expanding window)

YearTrainTest (pos/neg)AUCAccHit Rate
2008<20082 / 80.18850.0%0.0%
2010<20103 / 120.41766.7%0.0%
2011<20112 / 80.56380.0%0.0%
2013<20133 / 120.75073.3%0.0%
2016<20163 / 120.63980.0%66.7%
2018<20182 / 60.41775.0%50.0%
2019<20195 / 200.93088.0%80.0%
2020<20203 / 120.36166.7%33.3%
2025<20258 / 320.82077.5%75.0%

Overall: AUC=0.619 (11 testable years, expanding-window, hard same-region negatives). 2019 shows AUC=0.930 (pre-Marmara seismicity buildup); 2025 AUC=0.820 (recent activity spike). Istanbul has sparse M4-5 events (median 3/yr), making per-year estimates high-variance.

Maraş M4-5 — Best-Tested Region (19 years, AUC=0.796)

YearPos/NegAUCHit%YearPos/NegAUCHit%
20003/120.4440% 20153/120.66733%
20022/80.0630% 20173/120.4170%
20032/80.4380% 20184/160.53150%
20074/160.71975% 20192/80.3750%
20112/80.1250% 20202/80.1880%
20129/360.71044% 20212/80.62550%
20136/240.86833% 20225/200.52020%
20142/80.50050% 202360/500.71678%
202417/500.73471%

Maraş 2023–2024 (post-Kahramanmaraş M7.8 sequence): AUC=0.716/0.734 — the model correctly identified elevated risk in the aftershock-rich period. 2013 shows AUC=0.868 (Doğanyol aftershock sequence). Strategy: neg_ratio=2, temporal_ratio=0.6 (grid-search tuned). Overall AUC=0.796 over 19 test years.

Cross-Validation Summary: Loading dynamic summary...

13c. CSEP / RELM Benchmark Comparison

How does Talivio's regional model compare to published prospective earthquake forecast benchmarks tested through the Collaboratory for the Study of Earthquake Predictability (CSEP)? Below we compare against the best RELM (Regional Earthquake Likelihood Models) results from the California testing center and ETH Zurich's Switzerland testing center.

Model / Study Region Best AUC Area Skill Score Negative Strategy Validation
FCN Deep Learning (GJI 2024) California 0.882 Same-region Pseudo-prospective
ETAS Italy / OEF-Italy (Taroni 2023) Italy 0.70 Same-region Prospective 10yr
STEP (Gerstenberger et al. 2005) California ~0.65 Random cells Prospective (RELM)
ETAS (Helmstetter et al. 2007) California ~0.68 Seismicity rate Prospective (CSEP)
EEPAS (Rhoades & Evison 2004) New Zealand ~0.70 Precursory swarms Prospective (CSEP)
DeVries et al. 2018 (Google Brain) Japan 0.849 (inflated) Geographically distinct Retrospective
Talivio (this work) 10 regions Hard same-region
temporal + spatial
Expanding-window
no future leakage

ASS = Area Skill Score (Molchan diagram diagonal area). FCN 0.882 and ETAS Italy 0.70 from published papers. Talivio ASS computed live from prospective forecasts — grows as more forecasts are verified.

Key insight: The DeVries et al. (2018) AUC of 0.849 is widely cited but uses geographically distinct negatives — the same design flaw as Talivio v1. When hard same-region negatives are used (as in Rouet-Leduc 2022 and Talivio v2), realistic AUC drops to 0.65–0.75. Talivio v2's M4-5 AUC of and M5-6 AUC of is consistent with — and in high-activity zones exceeds — the state-of-the-art honest retrospective validation range.
Prospective Trial (ongoing): Talivio generates daily ML forecasts for all active regions × magnitude bands (M4-5, M5-6, M6-7, M7+). Each forecast covers a 30-day window; outcomes are verified automatically against the USGS catalog. Expanding-window pseudo-prospective forecasts (verified against held-out future periods) provide the primary out-of-sample validation record. Daily real-time forecasts accumulate continuously, extending the prospective evaluation with each passing day. Area Skill Score (Molchan diagram) is computed live from all verified forecasts — see current ASS value in the comparison table above.

14. References

1
Maximum likelihood estimate of b in the formula log N = a − bM and its confidence limits. Aki, K. (1965). Bulletin of the Earthquake Research Institute, 43, 237–239.
2
Frequency of earthquakes in California. Gutenberg, B. & Richter, C.F. (1944). Bulletin of the Seismological Society of America, 34(4), 185–188.
3
Seismic gaps and plate tectonics: seismic potential for major boundaries. McCann, W.R. et al. (1979). Pure and Applied Geophysics, 117(6), 1082–1147.
4
Mapping spatial variability of the frequency-magnitude distribution of earthquakes. Wiemer, S. & Wyss, M. (2002). Advances in Geophysics, 45, 259–302.
5
Is the sequence of earthquakes in Southern California, with aftershocks removed, Poisson? Gardner, J.K. & Knopoff, L. (1974). Bulletin of the Seismological Society of America, 64(5), 1363–1367.
6
Recalculated probability of M≥7 earthquakes beneath the Sea of Marmara, Turkey. Parsons, T. (2004). Journal of Geophysical Research, 109(B5).
7
The seismicity of North Africa and the Eastern Mediterranean region. Ambraseys, N. (2002). Annals of Geophysics, 45(3–4).
8
Simulation of ground motion using the stochastic method. Boore, D.M. (2003). Pure and Applied Geophysics, 160, 635–676.
9
New empirical relationships among magnitude, rupture length, rupture width, rupture area, and surface displacement. Wells, D.L. & Coppersmith, K.J. (1994). Bulletin of the Seismological Society of America, 84(4), 974–1002.
10
Advance short-term prediction of the large Hokkaido earthquake of September 25, 2003, M = 8.1. Shebalin, P.N. et al. (2006). Earth, Planets and Space, 56(8), 715–724.
11
ISC-GEM Global Instrumental Earthquake Catalogue (1900–2009). Storchak, D.A. et al. (2013). Data Science Journal, 12, WDS79–WDS96.
12
NGL Nevada Geodetic Laboratory GPS Time Series and Velocities. Blewitt, G. et al. (2018). Seismological Research Letters, 89(4), 1619–1628. geodesy.unr.edu
13
Static stress changes and the triggering of earthquakes. King, G.C.P., Stein, R.S. & Lin, J. (1994). Bulletin of the Seismological Society of America, 84(3), 935–953.
14
The role of stress transfer in earthquake occurrence. Stein, R.S. (1999). Nature, 402(6762), 605–609.
15
Statistical models for earthquake occurrences and residual analysis for point processes. Ogata, Y. (1988). Journal of the American Statistical Association, 83(401), 9–27.
16
Internal deformation due to shear and tensile faults in a half-space. Okada, Y. (1992). Bulletin of the Seismological Society of America, 82(2), 1018–1040.
17
Precursory seismic quiescence. Habermann, R.E. (1988). Pure and Applied Geophysics, 126(2–4), 279–318.
18
Earthquake forecasting using aftershock statistics and short-term clustering. Reasenberg, P.A. & Jones, L.M. (1989). Science, 243(4896), 1173–1176.
19
Attenuation relationships of peak ground acceleration and velocities from Turkish strong-motion data. Ulusay, R. et al. (2004). Engineering Geology, 73(1–2), 137–161.
20
IERS Conventions (2010). Petit, G. & Luzum, B. (2010). IERS Technical Note, 36.
21
Earth tides. Agnew, D.C. (2015). Treatise on Geophysics, 3.06.
22
Earthquake potential revealed by tidal influence on earthquake size-frequency statistics. Ide, S. et al. (2016). Nature Geoscience, 9(10), 834–837.
23
Structure of optimal strategies in earthquake prediction. Molchan, G.M. (1991). Tectonophysics, 193(4), 267–276.
24
Testing alarm-based earthquake predictions. Zechar, J.D. & Jordan, T.H. (2008). Geophysical Journal International, 172(2), 715–724.
25
Focal Loss for Dense Object Detection. Lin, T.-Y. et al. (2017). IEEE Transactions on PAMI, 42(2), 318–327.
26
An Introduction to Statistical Modeling of Extreme Values. Coles, S. (2001). Springer.
27
Investigating the multifractal properties of earthquake sequences. Telesca, L. et al. (2001). Chaos, Solitons & Fractals, 12, 2417–2425.