Historical validation — how did the models perform in the past?
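The Heidke Skill Score (HSS) measures how much better a model performs than random guessing: 1 is a perfect forecast, 0 means no skill beyond chance, and negative values mean the model does worse than chance. It is a standard reference metric in operational earthquake forecasting systems.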
a = TP | b = FP | c = FN | d = TN — computed from the full confusion matrix.
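As a minimal sketch of how the score could be computed from these four counts, the function below uses the usual 2x2 form of the Heidke Skill Score, HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)]. The function name, the choice to return NaN when the score is undefined, and the example counts are assumptions made for illustration, not taken from the system described here.

```python
def heidke_skill_score(a: int, b: int, c: int, d: int) -> float:
    """Heidke Skill Score from confusion-matrix counts.

    a = TP, b = FP, c = FN, d = TN (same lettering as above).
    """
    # Denominator of the standard 2x2 HSS formula.
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    if denom == 0:
        # Undefined when the matrix is degenerate (for example, only
        # true negatives). Returning NaN here is an assumption.
        return float("nan")
    return 2.0 * (a * d - b * c) / denom


# Hypothetical counts, for illustration only.
score = heidke_skill_score(a=12, b=8, c=6, d=174)
print(f"HSS = {score:.3f}")
```

A perfect forecast (b = c = 0) gives 1, and a forecast whose hits match what chance alone would produce (ad = bc) gives 0.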