Data and Method
In recent research, we have developed methods that we call earthquake nowcasting, whose goal is to estimate the current state of seismic hazard, and a number of authors have now begun to use these methods in a variety of applications. Earthquake nowcasting uses state ("proxy") variables to infer the current state of the earthquake cycle (Rundle et al., 2016, 2018, 2019, 2021a,b, 2022; Rundle and Donnellan, 2020; Pasari and Mehta, 2018; Pasari, 2019, 2020; Pasari and Sharma, 2020; Luginbuhl et al., 2019, 2020). An approach such as this is needed because the cycle of stress accumulation and release is not directly observable (Rundle et al., 2021b; Scholz, 2019). These first approaches to nowcasting have been based on the concept of natural time (Varotsos et al., 2001, 2002, 2011, 2013, 2014, 2020a,b; Sarlis et al., 2018).
More specifically, in this work we analyze the result of applying a filter that, when applied to a time series of small earthquakes, reveals the cycle of large earthquake occurrence and recovery. Details of the process of building, optimizing, and applying the filter are indicated in Figure 1 and discussed elsewhere (Rundle et al., 2022). The Python code used to compute the filter is available on the ESSOAR site as well. In this section we sketch the process; details can be found in the cited references.
A critical component of the current approach is that the information is
encoded in the earthquake clusters or bursts, a series of events closely
spaced in time (Rundle et al., 2020; Rundle and Donnellan, 2020). Bursts
are a temporal clustering of highly correlated seismicity, typically in
a small spatial region.
Data. Referring to Figure 1, we begin with the seismicity in a regional box of size 10° latitude by 10° longitude centered on Los Angeles, CA (Figure 1a).
The time series of earthquakes in that region since 1970, having magnitudes M > 3.29, is shown in Figure 1b as a series of vertical lines. Also shown as a blue curve is the exponential moving average (EMA) with number of weights N = 36 [1]. Note that the blue curve shows an "inverted" cycle of large earthquakes that is the primary basis for the nowcast filter. This inverted cycle shows a sudden decrease at the time of a large earthquake, due to the occurrence of aftershocks.
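The EMA smoothing step can be sketched in a few lines; the monthly binning of the event counts and the smoothing convention \(\alpha = 2/(N+1)\) are assumptions for illustration, not necessarily the exact convention of the published filter:

```python
import numpy as np

def ema(counts, n_weights=36):
    """Exponential moving average of a small-earthquake count series.
    The smoothing factor alpha = 2/(N+1) is the standard EMA
    convention; the convention used in the published filter is an
    assumption here."""
    alpha = 2.0 / (n_weights + 1.0)
    out = np.empty(len(counts), dtype=float)
    out[0] = counts[0]
    for i in range(1, len(counts)):
        out[i] = alpha * counts[i] + (1.0 - alpha) * out[i - 1]
    return out
```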
In Figure 1c, we show a time series of the mean number \(\mu(t)\) of small earthquakes as a function of time. The mean is computed beginning in 1960, and is again shown from 1970 onward. It can be seen that the mean does not indicate a steady state. Instead, there is a general increase in the mean number of events up to about 1993, after which it shows a cycle similar to that in Figure 1b.
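The cumulative mean \(\mu(t)\) can be sketched as a running average of the counts from the start of the catalog; the binning interval of the counts is an assumption here:

```python
import numpy as np

def running_mean(counts):
    """Cumulative mean number mu(t) of small earthquakes per interval,
    computed from the start of the catalog. The interval (e.g. monthly)
    binning of the counts is an illustration assumption."""
    counts = np.asarray(counts, dtype=float)
    return np.cumsum(counts) / np.arange(1, len(counts) + 1)
```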
This catalog behavior may be due either to actual tectonic processes, or perhaps to changes in methods of earthquake detection and magnitude assignment in the early 1990s, when the network was fully automated and digital (Hutton et al., 2010). In fact, it is interesting that the temporal trends in Figure 1c appear to mirror the general historical change in the number of seismic stations in California shown in Figure 3 of Hutton et al. (2010).
State Variable \(\Theta(t)\). The data in Figures 1b and 1c are then combined to form the state variable time series \(\Theta(t)\) shown in Figure 1d. The state variable itself is the EMA of the small-earthquake time series, adjusted using the current mean number \(\mu(2022)\) of small earthquakes with a constant of proportionality \(\lambda\). The N-value and \(\lambda\)-value are obtained by optimizing the ROC skill.
The adjustment corresponds to an assumption that a minimum number of small earthquakes occurs each month. An important component of this adjustment is the assumption that there is a transition from unstable seismic slip, observable with seismometers, to stable sliding that is observable only with geodetic instruments such as GNSS or InSAR. Figure 1d thus represents an "inverted", adjusted, and EMA-averaged time series of the small earthquakes.
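Under the assumption of a simple additive adjustment (the published functional form is given in Rundle et al., 2022), the state variable might be sketched as:

```python
import numpy as np

def state_variable(ema_series, mu_current, lam):
    """Hypothetical sketch of the state variable Theta(t). The paper
    adjusts the EMA-averaged small-earthquake series using lambda
    times the current mean number mu(2022); the additive form used
    here is an illustration assumption, not the published formula."""
    return np.asarray(ema_series, dtype=float) + lam * mu_current
```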
Receiver Operating Characteristic (ROC). To determine the EMA N-value and the contribution of the mean number \(\mu(t)\) of small earthquakes, we construct the temporal Receiver Operating Characteristic (ROC) for a forward time window \(T_{W} = 3\) years beyond a given time t (Rundle et al., 2022). The ROC curve [2] is constructed by establishing a series of increasing thresholds \(T_{H}\) in the state variable \(\Theta(t)\), from low values to high values. We then consider all values of time and a series of 4 clauses (statements). A review of these methods can be found in Jolliffe and Stephenson (2003).
For each time t: if \(\Theta(t)\geq T_{H}\) and a large earthquake occurs within the next \(T_{W}\) years, we classify that time as a true positive TP; if \(\Theta(t)\geq T_{H}\) and no large earthquake occurs within the next \(T_{W}\) years, as a false positive FP; if \(\Theta(t)<T_{H}\) and no large earthquake occurs within the next \(T_{W}\) years, as a true negative TN; and if \(\Theta(t)<T_{H}\) and a large earthquake does occur within the next \(T_{W}\) years, as a false negative FN. We then repeat this procedure over all values of the threshold \(T_{H}\).
Having TP, FP, FN, TN, we then define the true positive rate TPR or "hit rate" TP/(TP + FN), and the false positive rate FPR or "false alarm rate" FP/(FP + TN). A plot of TPR against FPR defines the ROC curve, which is the red curve in Figure 1e. For future consideration, we also define the Precision, or positive predictive value, TP/(TP + FP), the fraction of positive predictions that turn out to be accurate. These and other quantities are described in [2].
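The threshold sweep and the four-way classification above can be sketched as follows; precomputing a boolean flag for "large earthquake within \(T_W\)" at each time step is an implementation assumption:

```python
import numpy as np

def roc_curve(theta, future_event, thresholds):
    """Temporal ROC construction. theta: state variable Theta(t) at
    each time step. future_event: True where a large earthquake
    occurs within the forward window T_W of that step (assumed
    precomputed). Returns (FPR, TPR) arrays, one entry per threshold."""
    theta = np.asarray(theta, dtype=float)
    future_event = np.asarray(future_event, dtype=bool)
    tpr, fpr = [], []
    for th in thresholds:
        alarm = theta >= th
        tp = np.sum(alarm & future_event)
        fp = np.sum(alarm & ~future_event)
        fn = np.sum(~alarm & future_event)
        tn = np.sum(~alarm & ~future_event)
        tpr.append(tp / (tp + fn) if tp + fn else 0.0)  # hit rate
        fpr.append(fp / (fp + tn) if fp + tn else 0.0)  # false alarm rate
    return np.array(fpr), np.array(tpr)
```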
Supervised Machine Learning. The area under the ROC curve is the Skill, which specifies the ability of the method to discriminate between true signals and false signals. The diagonal line in Figure 1e is the no skill line, equivalent to a random predictor. Note that the area under the no skill line is 0.5.
For a method to have skill, the ROC curve must lie either above or below the diagonal line, so that the area under the ROC curve ranges up to a maximum of 1.0 or down to a minimum of 0.0. For future reference, we define a skill index SKI in % as:
\(SKI = 100\left|\frac{\text{Skill}}{0.5} - 1.0\right|\) (1)
SKI can then range from 0% (when Skill = 0.5) to 100% (when Skill is either 1.0 or 0.0). In Figure 1e, the no skill area is indicated by the darker shaded area. The skill of the nowcasting method that we discuss here is indicated by the total shaded area.
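Equation (1) translates directly into code:

```python
def skill_index(skill):
    """Skill index SKI (%) from equation (1): the area under the ROC
    curve (Skill) mapped so that a random predictor (Skill = 0.5)
    gives 0% and perfect discrimination (1.0 or 0.0) gives 100%."""
    return 100.0 * abs(skill / 0.5 - 1.0)
```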
As discussed in Rundle et al. (2022), we find the optimal value of N for the EMA, and the optimal value of the mean-earthquake \(\mu(t)\) adjustment constant \(\lambda\), by maximizing the skill. This is indicated in Figures 1d and 1e as a feedback between the state variable curve and the ROC skill calculation. This procedure results in a filter that has been optimized by well-understood, reliable methods. We note that the code is available on the AGU preprint archive ESSOAR (Rundle et al., 2022, supplemental tab).
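The optimization loop can be sketched as a grid search; the additive form of the \(\lambda\) adjustment, the parameter grids, and the Mann-Whitney form of the AUC are illustration choices here, not the published recipe:

```python
import numpy as np

def _ema(x, n):
    # exponential moving average; alpha = 2/(N+1) is an assumed convention
    a = 2.0 / (n + 1.0)
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = a * x[i] + (1.0 - a) * out[i - 1]
    return out

def _skill(theta, future_event):
    # area under the ROC curve in its Mann-Whitney form:
    # P(theta at pre-event times > theta at quiet times)
    pos, neg = theta[future_event], theta[~future_event]
    gt = np.sum(pos[:, None] > neg[None, :])
    eq = np.sum(pos[:, None] == neg[None, :])
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

def optimize_filter(counts, future_event, n_grid, lam_grid, mu_now):
    """Grid-search the EMA N-value and the adjustment constant lambda,
    keeping the pair that maximizes the ROC skill. The additive
    adjustment lam * mu_now and the grids are assumptions."""
    counts = np.asarray(counts, dtype=float)
    future_event = np.asarray(future_event, dtype=bool)
    best_n, best_lam, best_skill = None, None, -np.inf
    for n in n_grid:
        for lam in lam_grid:
            theta = _ema(counts, n) + lam * mu_now
            s = _skill(theta, future_event)
            if s > best_skill:
                best_n, best_lam, best_skill = n, lam, s
    return best_n, best_lam, best_skill
```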
Shannon Information. Claude Shannon’s famous paper on
statistical communication theory (Shannon, 1948) describes a measure of
the information content of a message between communicating parties. It
is based on the idea of viewing a message consisting of a bit string as
a series of intermixed 1’s and 0’s, with an associated entropy of
mixing. Examples of the use of these methods can be found, e.g., in
Cover and Thomas (1991), and Stone (2015).
The usual interpretation of Shannon information entropy is then the
number of binary yes/no questions that must be asked in order to
determine the information in the message being sent. If more questions
are required, the entropy is higher and the information communicated is
more surprising. Conversely, if fewer questions are required, the entropy
is lower and the information communicated is not as surprising.
For a message in which the symbols of an alphabet (here, the 1's and 0's) have probability mass function \(p(\omega)\), where \(\omega \in \{0,1\}\), the self-information \(I_{\text{self}}\) associated with a given symbol is:
\(I_{\text{self}}(\omega) = -\log_{2}p(\omega)\) (2)
The Shannon information of a string of symbols is then given by the expectation of the self-information:
\(I_{S} = -\sum_{\omega}p(\omega)\log_{2}p(\omega)\) (3)
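Equation (3) can be applied to a bit string using the empirical symbol probabilities:

```python
import math

def shannon_information(bits):
    """Shannon information I_S of a bit string, equation (3): the
    expected self-information -sum p(w) log2 p(w), with p(w) taken
    as the empirical frequency of each symbol in the string."""
    n = len(bits)
    total = 0.0
    for symbol in set(bits):
        p = bits.count(symbol) / n
        total -= p * math.log2(p)
    return total
```

For a perfectly mixed string such as "0101" this gives 1 bit per symbol, while a constant string such as "0000" carries no surprise and gives 0 bits.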