Results
In this section we compute the information content/entropy using the
statistics of the ROC curve and the time series precision. In Figure 2,
we first address the skill and information content of the method
outlined in Figure 1 for a continuum of future time windows \(T_{W}\in[0.125,8.5]\) years.
Skill. Figure 2a shows the same ROC diagram as in Figure 1e for
a future time window of \(T_{W}=1\) year. As discussed previously, the
red curve is the true positive rate (TPR), which ranges from 0 to 1. The
diagonal line is the true positive rate for an ensemble of 50 random
time series, each of which was obtained from the state variable time
series \(\Theta(t)\) using a bootstrap procedure of random sampling with
replacement. The ensemble of random time series is shown as the cyan
curves grouped near the diagonal line.
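For readers who wish to reproduce this bootstrap comparison, the following minimal sketch in Python illustrates one way to construct the ROC curves for the data and for the randomized ensemble. The inputs theta (the sampled state variable \(\Theta(t)\)) and event_in_window (a boolean array marking whether a large earthquake occurs within \(T_{W}\) of each sample), as well as the convention that an alarm is issued when \(\Theta\) exceeds the threshold, are our assumptions rather than details taken from the original code.

```python
import numpy as np

def roc_curve(theta, event_in_window, n_thresholds=200):
    """True and false positive rates as the threshold T_H sweeps over theta."""
    thresholds = np.linspace(theta.min(), theta.max(), n_thresholds)
    pos = event_in_window.astype(bool)
    tpr, fpr = [], []
    for th in thresholds:
        alarm = theta >= th                      # assumed alarm convention
        tpr.append(np.sum(alarm & pos) / max(pos.sum(), 1))
        fpr.append(np.sum(alarm & ~pos) / max((~pos).sum(), 1))
    return np.array(fpr), np.array(tpr)

def bootstrap_roc_ensemble(theta, event_in_window, n_members=50, seed=0):
    """ROC curves for an ensemble of randomized series, each obtained by
    sampling theta with replacement (the bootstrap described in the text)."""
    rng = np.random.default_rng(seed)
    return [roc_curve(rng.choice(theta, size=theta.size, replace=True),
                      event_in_window)
            for _ in range(n_members)]
```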
The skill, which is the area under the ROC curve, is shown in Figure 2b
as a function of the future time window \(T_{W}\), for fixed EMA \(N\)-value and \(\lambda\)-value. Figure 2c shows the skill index SKI defined in (1), also as a function of \(T_{W}\). Both Figures
2b,c indicate that there is a maximum in skill at \(T_{W}=0.625\)
years, and no skill at \(T_{W}=6.875\) years, where the skill
curve crosses the no-skill (dashed horizontal) line.
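The area under the ROC curve can be evaluated with a simple trapezoidal rule, as in the sketch below. Because equation (1) is not reproduced in this section, the skill index shown here is written in an assumed illustrative form, namely the excess of the skill over the no-skill value of 0.5, rescaled so that a perfect nowcast gives 1.

```python
import numpy as np

def skill(fpr, tpr):
    """Skill: area under the ROC curve, via the trapezoidal rule."""
    order = np.argsort(fpr)                      # integrate along increasing FPR
    return np.trapz(np.asarray(tpr)[order], np.asarray(fpr)[order])

def skill_index(auc):
    """Illustrative skill index standing in for equation (1): 0 at the
    no-skill diagonal (area 0.5), 1 for a perfect nowcast (area 1.0)."""
    return (auc - 0.5) / 0.5
```

Sweeping \(T_{W}\) from 0.125 to 8.5 years, recomputing event_in_window for each window length, and evaluating these two quantities produces curves of the kind shown in Figures 2b,c.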
Shannon Information from ROC. To calculate the Shannon
Information entropy as a function of \(T_{W}\) using (3), we need a
probability mass function (pmf). For this purpose, we use the ROC
curve as a cumulative distribution function and difference it with
respect to the threshold values \(T_{H}\) to obtain the pmf. Because
the ROC curve was constructed using 200 values of \(T_{H}\), there are
199 values of the pmf \(p(\omega)\) to be used in
equation (3).
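A sketch of this construction is given below, assuming that equation (3) is the standard Shannon form \(I_{S}=-\sum_{\omega}p(\omega)\log_{2}p(\omega)\) and that the curve being differenced is the TPR as a function of the 200 threshold values.

```python
import numpy as np

def pmf_from_roc(tpr):
    """Difference the ROC curve (treated as a CDF over the 200 threshold
    values) to obtain the 199-value pmf p(omega)."""
    pmf = np.abs(np.diff(tpr))
    return pmf / pmf.sum()

def shannon_information(pmf, eps=1e-12):
    """I_S = -sum_omega p(omega) log2 p(omega), the assumed form of eq. (3)."""
    p = np.asarray(pmf, dtype=float)
    return -np.sum(p * np.log2(p + eps))

# Uniform no-skill baseline: 199 equally likely outcomes give log2(199) bits.
print(shannon_information(np.full(199, 1.0 / 199)))   # ~7.64
```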
To compare the results with those for the no-skill diagonal line, we
note that the diagonal line can also be regarded as a cumulative
distribution, but for a uniform pmf whose value is the constant \(p(\omega)=1/N\). For this uniform pmf with \(N=199\), it is easy to show that \(I_{S}=\log_{2}(199)\approx 7.64\) bits.
According to the conventional interpretation of Shannon information, one
would need to ask, on average, 7.64 yes/no questions to establish the
value of a random state variable just prior to the occurrence of a major
earthquake during the following \(T_{W}\) years. In other words, this is the
number of yes/no questions needed to determine whether a given random
threshold state is followed by a window \(T_{W}\) that contains a large
earthquake.
By contrast, the actual ROC curve has a lower value of \(I_{S}\), and
therefore more information content, and lower entropy, than the random
ROC (diagonal line). For the value of \(T_{W}=1.0\) year, we find \(I_{S}=4.29\) bits, corresponding on average to 4.29 yes/no
questions.
A selection of these data is also summarized in Table 1 and compared
to data from a simple illustrative simulation discussed below.
Data for skill, skill index, ROC Information, Information from random
ROC, Kullback-Leibler Divergence [3], and Jensen-Shannon
Divergence [4] are shown in the table as well. These latter
quantities are measures of the difference in information entropy between
the data and a random nowcast.
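The two divergence measures can be computed from the pmf obtained above and the uniform pmf of the random (no-skill) case. The sketch below uses the standard base-2 definitions and is not taken from the original analysis code.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D_KL(p || q), in bits."""
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    return np.sum(p * np.log2((p + eps) / (q + eps)))

def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetrized, bounded form of the KL divergence."""
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Example comparison of the data pmf with the uniform no-skill pmf:
# p_data = pmf_from_roc(tpr)
# q_unif = np.full(p_data.size, 1.0 / p_data.size)
# kl_divergence(p_data, q_unif), js_divergence(p_data, q_unif)
```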
Shannon Information from Precision (PPV). More insight
into the information content/entropy of the state variable \(\Theta(t)\) can be gained using the positive predictive value (PPV) probability,
or precision. Figure 3a shows the optimized state variable as a function
of time, an enlarged version of Figure 1d.
Note in particular that the top area of the state variable curve
corresponds to enhanced quiescence prior to the occurrence of a large
earthquake, as explained previously and in Rundle et al. (2022).
Conversely, the bottom area of the curve corresponds to enhanced
activation, for example aftershock occurrence following a large event.
Figure 3b shows the precision, and Figure 3c shows the corresponding
self-information \(I_{\text{self}}\), equation (2), both quantities plotted on
the horizontal axis as a function of the threshold value \(T_{H}\) on the vertical axis. These are the magenta
curves in those figures. Figure 3 allows one to read horizontally and
associate a value of PPV and self-information \(I_{\text{self}}\) with a
given value of \(\Theta(t)\).
Also shown in Figures 3b,c are the PPV and \(I_{\text{self}}\) for an
ensemble of 50 random time series; these are the cyan curves. The mean
of the cyan curves is shown as a solid black line, and the 1\(\sigma\) confidence limits are shown as dashed lines. Each random time series in
the ensemble is again computed by sampling with replacement from the time
series \(\Theta(t)\), then calculating the PPV and \(I_{\text{self}}\) for that curve.
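A minimal sketch of how the precision and self-information curves of Figures 3b,c might be computed is given below, using the same assumed inputs as in the earlier sketches and taking equation (2) to be \(I_{\text{self}}=-\log_{2}(\mathrm{PPV})\), which is consistent with the random-ensemble value of 3.35 bits quoted later in this section.

```python
import numpy as np

def precision_curve(theta, event_in_window, n_thresholds=200):
    """PPV(T_H): the fraction of alarms (theta >= T_H) whose following
    window of length T_W contains a large earthquake."""
    thresholds = np.linspace(theta.min(), theta.max(), n_thresholds)
    pos = event_in_window.astype(bool)
    ppv = []
    for th in thresholds:
        alarm = theta >= th
        n_alarm = alarm.sum()
        ppv.append(np.sum(alarm & pos) / n_alarm if n_alarm else np.nan)
    return thresholds, np.array(ppv)

def self_information(ppv):
    """I_self = -log2(PPV), the assumed content of equation (2)."""
    return -np.log2(np.asarray(ppv))
```

Resampling theta with replacement, as in the earlier sketch, and recomputing both curves for each of the 50 members gives the cyan ensemble, whose mean and 1\(\sigma\) limits are the black solid and dashed lines.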
A main finding from Figure 3 is that the statistics of future time
windows \(T_{W}\) for the ensemble of random time series do not depend
on the value of the threshold \(T_{H}\) . The random (uniform)
probability of a future window \(T_{W}\) containing a large earthquake
is about 10%, for example. By contrast, the probability of a future
time window containing a large earthquake increases dramatically as the
time series \(\Theta(t)\) increases from the bottom of the chart (activation
phase) to the top (quiescence phase).
We also see in Figure 3c that the information entropy is essentially the
same for the ensemble of random curves as for \(\Theta(t)\) in the
activation condition. Conversely, as quiescence becomes more dominant
and the time of a large earthquake approaches, the entropy decreases and the
information content correspondingly increases.
We can also understand why the self-information \(I_{\text{self}}\) for
the random time series is approximately 3.35 bits. In the figure, we
considered a series of \(T_{W}\) = 1 year windows from 1970 to early
2022. There are thus a little more than 51 non-overlapping, independent
time windows.
During this time period, there are 5 major earthquakes having magnitudes \(M\geq 6.75\): the M6.9 Loma Prieta; M7.3 Landers; M7.1 Hector Mine; M7.2
El Mayor-Cucapah; and M7.1 Ridgecrest earthquakes. If the earthquakes
were distributed randomly in time, there would be a probability of \(p(\omega)=5/51\approx 0.098\) of finding a large earthquake in any of
these time windows.
Thus we calculate a self-information entropy for the mean of the random
ensemble curves of \(I_{\text{self}}=-\log_{2}(5/51)\approx 3.35\) bits.
Therefore it would take on average 3.35 yes/no questions to determine if
one of these future time windows \(T_{W}\) contains a large earthquake.
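As a quick numerical check of this value, using the event and window counts quoted above:

```python
import math

# 5 large (M >= 6.75) earthquakes in ~51 non-overlapping 1-year windows:
p_random = 5 / 51                  # ~0.098
print(-math.log2(p_random))        # ~3.35 bits: I_self for the random ensemble
```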
Conversely, it is apparent that the self-information entropy of the PPV
of \(\Theta(t)\) approaches 0 as the seismic quiescence phase
becomes fully developed.
The primary conclusion from these calculations is that the information
content is higher in the quiescence phase of seismicity than in the
activation phase; equivalently, the activation phase has higher
entropy than the quiescence phase.