VOLCANS: an objective, structured and reproducible method for identifying sets of analogue volcanoes

Tierz, Pablo; Loughlin, Susan C.; Calder, Eliza S.

doi:10.1007/s00445-019-1336-3

VOLCANS: an objective, structured and reproducible method for identifying sets of analogue volcanoes

Research Article
Open access
Published: 06 December 2019

Volume 81, article number 76, (2019)
Cite this article

Download PDF

You have full access to this open access article

Bulletin of Volcanology Aims and scope Submit manuscript

VOLCANS: an objective, structured and reproducible method for identifying sets of analogue volcanoes

Download PDF

5593 Accesses
24 Citations
11 Altmetric
Explore all metrics

Abstract

The definition of a suite of analogue volcanoes, or volcanoes that are considered to share enough characteristics as to be considered exchangeable to a certain extent, is becoming a key component of volcanic hazard assessment. This is particularly the case for volcanoes where data are lacking or scarce. Moreover, volcano comparisons have often been based on similarities and differences inferred through expert judgement and not necessarily informed by volcano characteristics from global datasets. These similarities can be based on a range of features, from very simplified (e.g. statrovolcanoes) to very specific (e.g. detailed eruption chronologies), and may be strongly influenced by the personal experience of individuals or teams conducting the analogue analysis. In this work, we present VOLCANS (VOLCano ANalogues Search)—an objective, structured and reproducible method to identify sets of analogue volcanoes from global volcanological databases. Five overarching criteria (tectonic setting, rock geochemistry, volcano morphology, eruption size and eruption style), and a structured combination of them, are used to quantify overall multi-criteria volcano analogy. This innovative method is complementary to expert-derived sets of analogue volcanoes and provides the user with full flexibility to weigh the criteria and identify analogue volcanoes applicable to varied purposes. Some results are illustrated for three volcanoes with diverse features and significant recent and/or ongoing eruptions: Kı̄lauea (USA), Fuego (Guatemala) and Sinabung (Indonesia). The identified analogue volcanoes correspond well with a priori analogue volcanoes derived from expert knowledge. In some cases, single-criterion searches may not be able to isolate a reduced set of analogue volcanoes but any multi-criteria search can provide high degrees of granularity in the sets of analogue volcanoes obtained. Data quality and quantity can be important factors, especially for single-criterion searches and volcanoes with very scarce data (e.g. Sinabung). Nevertheless, the method gives stable results overall across multi-criteria searches of analogue volcanoes. Potential uses of VOLCANS range from quantitative volcanic hazard assessment to promoting fundamental understanding of volcanic processes.

Determining change points in data completeness for the Holocene eruption record

Article 08 October 2014

Stuart Mead & Christina Magill

FIERCE: FInding volcanic ERuptive CEnters by a grid-searching algorithm in R

Article 02 February 2017

Roberto Carniel, Silvina Guzmán & Marco Neri

Report on potential sampling biases in the LaMEVE database of global volcanism

Article Open access 25 May 2017

Natalia Irma Deligne, R. Stephen J. Sparks & Sarah K. Brown

Introduction

Volcanoes are powerful manifestations of our geodynamic planet. Volcanic physico-chemical processes are extraordinarily complex and operate on spatial and temporal scales ranging over many orders of magnitude (Sigurdsson et al. 2015). Therefore, volcanic behaviour is truly challenging to understand and even more challenging to forecast. Volcano scientists worldwide are asked, on a daily basis, to collect, process and interpret varied types and sources of volcanological data (e.g. deposits of volcanic material, chemical compositions of volcanic minerals and rocks, seismic signals from volcano-tectonic earthquakes, ground deformation, gas emissions) for its use in forecasting the future short- and long-term evolution of a particular volcanic system (e.g. Newhall and Hoblitt 2002; Loughlin et al. 2015; Newhall et al. 2017). Some frequently active volcanoes may appear somehow predictable (e.g. Stromboli, Italy), due to the fact that they tend to behave as they have behaved in the past (e.g. Blackburn et al. 1976; Rosi et al. 2000; Taddeucci et al. 2015). Yet, this by no means excludes relatively rapid changes in behaviour towards less expected phenomena: e.g. paroxysmal activity at Stromboli (e.g. Bertagnini et al. 2003; Calvari et al. 2006; Aiuppa et al. 2010), of which the 3 July and 28 August 2019 explosions were a powerful reminder. Other frequently active volcanoes may show more complex behaviours, such as switching between open-conduit (i.e. magma is filling the conduit below the crater and feeds the eruption) and closed-conduit regimes (i.e. the conduit is plugged by a semi- or totally-crystallysed rock). Some volcanoes may erupt frequently enough that sufficient data can be collected to postulate qualitative or quantitative forecasting models of volcanic activity (e.g. Volcán de Colima, Mexico; Luhr and Carmichael 1990; De la Cruz-Reyna 1993; Luhr et al. 2002). Volcanoes that erupt infrequently and/or have very few data available concerning past behaviour are more numerous: Loughlin et al. (2015) estimated that about 70% of the world’s Holocene (last ~ 12 kyr) volcanoes are very poorly studied. These volcanoes may show no eruptive signs for centuries and, therefore, go unnoticed by local populations but they may, in a matter of weeks to months, grow restless and produce energetic eruptions. For instance, Mount Pinatubo (Philippines), erupted in 1991 after approximately 500 years of repose, in what was the second largest eruption of the 20th century (Newhall and Punongbayan 1996). These poorly-studied, potentially dangerous volcanoes are one major reason why the volcanological community has sought to identify analogue volcanoes, which are volcanoes believed to share some commonalities in terms of their specific “type, magma composition, repose period, or any other characteristics of interest” (Newhall et al. 2017).

Volcano observatories and crisis-response teams worldwide, such as the Volcano Disaster Assistance Program of the United States Geological Survey (https://volcanoes.usgs.gov/projects/VDAP/about.html), have been using analogue volcanoes to inform their volcanic hazard assessments over the last decades, which has helped protect communities living around active volcanoes in many countries around the world (Sandri et al. 2012; Ogburn et al. 2015; Newhall and Pallister 2015; Newhall et al. 2017). More broadly, analogue volcanoes have been identified for the purpose of classification (Hone et al. 2007) or, in general, to improve hazard assessments at different temporal and spatial scales (Marzocchi et al. 2004; Mastin et al. 2009; Sandri et al. 2012; Whelley et al. 2015). Table 1 contains a summary of previous studies on analogue volcanoes, the purpose of the analysis and the criteria that have been most commonly used to define sets of analogue volcanoes.

Table 1 Previous studies that used sets of analogue volcanoes for different purposes (see first column). The main criteria which served to identify these sets are indicated. The last row displays the number of studies that used each criteria, out of the total number of studies shown

Full size table

Volcano types are usually grouped according to classical stratovolcanoes (Sheldrake 2014) or calderas (Sobradelo et al. 2010; Acocella et al. 2015) but more detailed analyses of geomorphological features have also been proposed (Whelley et al. 2015). Geographic setting has been used to ensure homogeneity between sets of volcanoes (Bebbington 2014) and it sometimes corresponds well with broad classes of tectonic setting (Hone et al. 2007; Whelley et al. 2015). Similarity in magma composition is typically described by simple geochemical divisions (e.g. mafic/felsic or basalt/andesite/dacite/rhyolite, Hone et al. 2007; Mastin et al. 2009; Ogburn et al. 2015), but more elaborate schema may include rock affinity (e.g. tholeiitic/calc-alkaline/alkaline, Sobradelo et al. 2010). In terms of (explosive) eruption size and style, Volcanic Explosivity Index (VEI, Newhall and Self 1982) is a customary choice (Bebbington 2014; Ogburn et al. 2015) but a great variety of more-or-less specific metrics have been used. These include the following: repose period, open/closed conduit, mass eruption rates, grain size distribution, lava-dome-growth durations and extrusion rates and generation of specific types of pyroclastic density currents (PDCs) (Marzocchi et al. 2004; Mastin et al. 2009; Sandri et al. 2012, 2014; Sheldrake 2014; Ogburn et al. 2015).

Classically, analogue volcanoes are identified by classifying volcanoes into isolated compartments or categories (e.g. andesitic stratovolcanoes that generate dome-collapse PDCs). At high levels of detail, the number of categories can be very large and volcanoes become seemingly unique (Cashman and Biggs 2014). Consequently, the identified sets of analogue volcanoes may be suited for a given specific problem but wider applicability of the methods and results is prevented. In addition, the common practice in choosing analogue volcanoes is almost purely based on the personal experience or knowledge of the hazard analyst or team. The selection of analogue volcanoes under such circumstances may be significantly biased by specific eruption behaviour patterns known from a handful of volcanoes rather than from global databases.

In this study, we develop an objective, structured and reproducible method for identifying sets of analogue volcanoes based on information from three global volcanological databases. We refer to this tool as VOLCANS (VOLCano ANalogues Search). Different criteria (tectonic setting, rock geochemistry, volcano morphology, etc.) are used to quantify either single-criterion or multi-criteria volcano analogy, which is interpreted as the inverse of a distance metric defined between any two volcanic systems in the databases. This distance metric is a proxy for dissimilarity and is calculated using normalised numerical variables, which provides a quantification of the volcano analogy between zero and one. This generates a continuum of possible analogue volcanoes and gives the method considerable flexibility in scope and application. First, an automated quantitative procedure to identify sets of analogue volcanoes offers an accountable and reproducible means to use these sets to support improved hazard assessments worldwide. Second, objective provision of analogue sets can also contribute new insight into the study of fundamental magmatic or physical volcanic processes where that understanding is built on the observation of commonalities across individual volcanic centres. Third, data-derived analogue volcanoes can become a powerful complement to sets of analogue volcanoes obtained through expert knowledge (e.g. Marzocchi et al. 2004; Sandri et al. 2012, 2014; Newhall and Pallister 2015). Nevertheless, it is crucial to note that a careful use of those sets of automatically-derived analogue volcanoes is necessary. The user needs to have a clear understanding of how the sets are generated and the significance and relevance of the analogue volcanoes obtained has to be assessed in the specific context of each analysis.

In the following, the source datasets and methods used to define volcano analogy are described. Then, VOLCANS is demonstrated through results for three volcanoes that (1) have characteristics that span across the different criteria (e.g. intraplate/subduction zone, basalt/andesite/dacite, shield volcano/stratovolcano); (2) have been studied to different levels of detail (i.e. some are data-rich, others are not); and (3) have recently experienced major phases of volcanic activity. The three volcanoes are as follows: Kı̄lauea, USA (intense phase of rifting and lava effusion from April to August 2018, approximately; Global Volcanism Program 2013, Neal et al. 2019); Fuego, Guatemala (significant explosion and devastating PDCs on 3 June 2018, Naismith et al. 2019); and Sinabung, Indonesia (long-term eruption from 2013 to 2018, and new phase starting in May 2019, with numerous PDCs and some large explosions; Global Volcanism Program 2013; Gunawan et al. 2019). Finally, the main applications and limitations of VOLCANS, including potential issues with the source datasets, are discussed.

Source datasets

The three global volcanological databases used are as follows: (1) the Holocene Global Volcanism Program database (Global Volcanism Program 2013, version 4.6.7), hereinafter GVP; (2) the volcano morphology database of Pike and Clow (1981), hereinafter Pike81; and (3) the volcano morphology database presented by Grosse et al. (2014), hereinafter Grosse14. For the Pike81 and Grosse14 databases, only the data corresponding with volcanoes with a unique identifier matching onto the GVP database are used. Moreover, the two morphological databases are merged into a unique dataset, by assessing the correspondence and exchangeability between the variables found in each of the databases (see Online Resource 1 for more details). CSV files containing all datasets used in the analysis can be found in Online Resources 2, 3, 4 and 5.

Global Volcanism Program database

The GVP database is the global reference for the list of Holocene and Pleistocene volcanic systems in the world, but in this work, only the Holocene list is used. The GVP stores information for any volcanic system with known or suspected eruptive activity during the last 12 kyr, approximately. The data are publicly available and can be downloaded in XML format via four different searches: volcano, eruption, deformation and emission. Data from the volcano and eruption searches are used in this study. This includes information about volcano name, type, tectonic setting, major and minor rock types, VEI sizes of eruptions and eruptive phenomena. (see Online Resources 2, 3, 4 and 5).

Pike and Clow (1981) database

This database contains the topographic dimensions of terrestrial and some planetary volcanic systems, in particular the height, flank width, and diameter, depth and circularity of the summit depression, which sometimes corresponds to a crater and others to a caldera. The database has information for volcanoes of any type (Grosse14 does not have calderas), but the measurements were performed manually from diverse maps or aerial photographs. Therefore, if a volcano in the Pike81 database has an equivalence available in the Grosse14 database, the latter source, which is more accurate and less subjective, is used.

Grosse et al. (2014) database

The Grosse14 database contains different morphological variables relating to the edifice (height, width, average slope, etc.) and, in some cases, to the summit depression (depth, width, ellipticity, etc), derived using a common topographic source (Digital Elevation Models derived from the Shuttle Radar Topography Mission, with spatial resolution of 90 m) and with developed algorithms that automatically define parameters such as the lateral extent of the volcanic edifice (Grosse et al. 2012; Euillades et al. 2013). Only positive-relief volcanoes are included in the database (predominantly shields and stratovolcanoes). Hence, caldera systems are retrieved from the Pike81 database.

Methods

The goal of VOLCANS is a quantification of single-criterion and multi-criteria analogy between volcanic systems around the world by means of five criteria, for which data for the majority of volcanoes are available in the aforementioned databases (Figs. 1 and 2): (1) tectonic setting, (2) rock geochemistry, (3) volcano morphology, (4) eruption size and (5) eruption style. Analogy is understood to be a measure of similarity and, thus, the more similar two volcanic systems, the more analogous they are. To measure this similarity, a distance metric, which ranges from zero to one, is defined for each analogy criterion (Table 2). The inverse of the distance metric represents the measure of single-criterion analogy between any two volcanoes, and it also ranges from zero to one. If the analogy is equal to zero, there is no analogy between the volcanoes. If the analogy is one, the volcanoes are interpreted to be perfect analogues, for that particular criterion. If a volcano X has a higher value of analogy with volcano Y than with volcano Z, the interpretation is that volcano Y is a better analogue of volcano X than volcano Z.

Table 2 Framework to define and determine single-criterion analogy metrics for five different volcanological criteria. “D” stands for the absolute distance between any two volcanoes (X, Y) within a given criterion and it is measured in slightly different ways for the different criteria (see also Figs. 1 and 2). “A” stands for single-criterion analogy and it is the inverse of the distance “D”. Multi-criteria volcano analogy is defined as a weighted sum of single-criterion analogies (see Eq. (1) and text for more details). TAS, total alkali silica; QAPF, Quartz-Alkali feldspar-Plagioclase-Feldspathoid; VEI, Volcano Explosivity Index; ECDF, empirical cumulative distribution function (letter “F” in the distance-metric column is equivalent to ECDF); Ts, tectonic setting; G, rock geochemistry; M, volcano morphology; Sz, eruption size; St, eruption style; h_i, frequency of eruptions with a given hazardous phenomena i; H, total number of groups of hazardous phenomena (see Table 4)

Full size table

In order to define each distance metric, categorical variables (e.g. tectonic setting) are converted into numerical variables by applying sub-criteria which are deliberately independent of other sub-criteria and criteria. In other words, the definition of these numerical variables, and therefore the definition of analogy for each criterion, is fully independent of any other criteria (Table 2). For instance, magma geochemistry is not considered when defining the analogy in morphology (Pike 1978; Pike and Clow 1981). Only edifice dimensions are used to define such single-criterion analogy (Fig. 2). For numerical variables associated with a distribution instead of a single value (e.g. VEI class), the distance metric is the area between the empirical cumulative distribution functions (ECDFs) of any two volcanoes (Fig. 1, Table 2). Finally, for eruption style, the distance metric is calculated as the normalised sum of differences in the unit proportion of eruptions that generated diverse hazardous processes (e.g. lava flows and tephra fallout) at each volcano (Fig. 1, Table 2).

To quantify the overall analogy among any two volcanoes, X and Y, single-criterion analogies are combined into a multi-criteria analogy metric (A_XY) using the following formula:

$$ {\mathbf{A}}_{\boldsymbol{XY}}={w}_{Ts}\cdotp {ATs}_{XY}+{w}_G\cdotp {AG}_{XY}+{w}_M\cdotp {AM}_{XY}+{w}_{Sz}\cdotp {ASz}_{XY}+{w}_{St}\cdotp {ASt}_{XY} $$

(1)

where w_i are the weights given to each single-criterion analogies (Ts: tectonic setting, G: rock geochemistry, M: volcano morphology, Sz: eruption size, St: eruption style; see Table 2). Note that some weights can be set to zero, in which case the corresponding single-criterion analogy is not considered to calculate the multi-criteria analogy. Provided that the sum of weights equals one, the multi-criteria analogy metric is a weighted average of the single-criterion analogies between volcanoes X and Y. If a given volcano has no data available for a given criterion, the single-criterion analogy between this volcano and any other volcano in the database is set to zero. This is a way to avoid calculating spurious volcano analogies: even if the analogy could exist, it is not possible to corroborate this without any data available.

Analogy in tectonic setting

Given the small number of pseudo-quantitative categories used to describe tectonic setting in the GVP database, two simplified sub-criteria are used to create a variable for tectonic setting, Ts (Fig. 1a and Table 2). These criteria are linked to (a) the primary melting mechanism of the mantle (i.e. chemically versus decompression-driven); and (b) the easiness of the path that the magma must traverse before it reaches the Earth’s surface (Pearce 1996; Perfit and Davidson 2000). That is, it is assumed that the shallower the melting depth and the thinner the lithosphere in/above the melting zone, the easier the path towards eruption (Fig. 1a). Consequently, low values of the variable Ts correspond with decompression-driven (rifting) magmatism over more or less thin crust while high values of the Ts variable correspond with subduction-zone magmatism over more or less thick crust.

Analogy in rock geochemistry

Simplified versions of the two main geochemical/compositional diagrams to classify volcanic rocks are used to define a variable for rock geochemistry, G (Total Alkali Silica, TAS, and Quartz, Alkali feldspar, Plagioclase, Feldspathoid, QAPF diagrams; Le Maitre et al. 2005; Fig. 1b). Hereinafter, the rock-type abbreviations used are the same as in the GVP database (Siebert et al. 2010; GVP 2013). According to the position in the diagrams of a given rock type compared with another rock type, a value of distance is assigned to the pair of rock types as follows: one if the rock types are adjacent; two if they are separated by one rock type, or three if they are separated by two different rock types. The sum of the values from each diagram and for each pair of rock types ij is expressed as r_ij. It is a proxy for the dissimilarity between each pair of rock types: the higher the number, the more dissimilar the rocks. The numerical variable for rock geochemistry (G) is defined to have as extreme points: (1) the most dissimilar rock in comparison with all the other rocks (i.e. the one with the highest sum of r_ij, over all j rock types): rhyolite, R; and (2) the rock type that is the most dissimilar compared with rhyolite: i.e. foidite, F. Among the other eight rock types, the ones with the lowest value of r_Rj (dacite, D, and trachyte, T) and r_Fj (tephrite/basanite/trachybasalt, X, phono-tephrite/tephri-phonolite, Z, and phonolite, P) are constrained to be the closest rock types to rhyolite and foidite, respectively. Then, all possible arrangements of rock types (around 4500 possible combinations) are explored to define the variable for rock geochemistry. The combination that minimises the total sum of r_ij values among all the neighbouring rock types in the variable is the one selected. The ordering of the rock types is: F-P-T-Y-Z-X-B-A-D-R (Fig. 1b), which reflects differences in SiO₂, total alkali (NaO₂ + K₂O) and mineral contents between the rock types. Indeed, the scale divides the subalkaline/tholeiitic (high values of the variable G) and alkaline rock series (low values of the variable G). That is, B-A-D-R are closer to each other in comparison to the other, more alkali-rich rock types, as reported by Le Bas et al. (1986). However, it must be noted that the geochemical data for many volcanoes in the GVP database may come from general rock definitions (e.g. basalt or andesite) that do not differentiate between subalkaline and alkaline rock suites (Siebert et al. 2010).

In order to account for intra-volcano geochemical variability, all major and minor rock types in the GVP database are considered, for any given volcano. Each major rock type is counted as two data points and each minor rock type as one data point to create normalised histograms and ECDFs of rock types, one for each volcano. The measure of single-criterion analogy for rock geochemistry between volcanoes X, Y, is the inverse of the area between their ECDFs (Fig. 1b). In other words, it is a measure of the overlap between the histograms of rock types for volcanoes X, Y (Fig. 1b, Table 2).

Analogy in volcano morphology

Volcano morphology is derived based on (Fig. 2) crater/caldera diameter (d), height of rim crest above pre-volcano topography (H), ratio between the height and half-width of the volcanic edifice (H/W*) and truncation parameter of the volcanic edifice (T~d/2W*). The values of all of these variables have been checked to be consistent and compatible across the two volcano morphology databases (see Online Resource 1).

Based on its ECDF, each variable is divided into 10-percentiles-wide classes or ranks. Each rank is assigned a value from 1 (i.e. 0–10th percentiles) to 10 (i.e. 90–100th percentiles). In Fig. 2b, the values of each rank for a given variable are plotted against the mean value of the rank for another variable. This shows the general underlying trends between the different variables, without these trends being obscured by the dense continuum of morphologies that exist in the world’s volcanoes (Grosse et al. 2014). Small values of d and T tend to occur together with high values of H and H/W* (Fig. 2b). This corresponds well with the view of high volcanic edifices (e.g. stratovolcanoes) having high height-to-width ratios, small craters and relatively low truncation values (Grosse et al. 2009, 2014). On the contrary, low-altitude edifices (e.g. shields and calderas) are commonly conceived to have low height-to-width ratios, large craters/calderas and sometimes high truncation (Pike 1978; Pike and Clow 1981). The merged dataset used here confirms these trends. Therefore, the morphology of a volcano is described by variable M:

$$ M=d+T\hbox{--} \left(H+H/W\ast \right) $$

(2)

where d, T, H, H/W* correspond to the aforementioned rank values (from 1 to 10), for any given volcano. The M variable separates volcanoes that have low-height, gentle-slope, highly-truncated edifices with large craters (high values of M) from volcanoes that have high, steep-slope, low-truncation edifices with small craters (low values of M). All entries in the merged morphology database have an entry for the variables T, H and H/W*. However, some of the stratovolcanoes taken from the Grosse14 database lack a measure of the variable d, most commonly linked with very small craters (Grosse et al. 2014). In such cases, a rank value of zero is used.

Figure 2c shows the distributions of M values obtained when grouping volcanic systems according to their Primary Volcano Type defined in the GVP database. The distributions of M for simple and complex/compound stratovolcanoes, shield volcanoes and caldera systems are significantly different (Fig. 2c). Table 3 reports the mean values of M for all these main groups. Given the consistent separation of volcano types in different populations of the M variable (Fig. 2c), mean values of M are assigned to all the GVP entries that are not described in the merged morphological database but that are classified as stratovolcanoes, shields or calderas by the GVP database according to their Primary Volcano Type. In the case of volcanic fields, a (normalised) value of M = 1 is assigned, this being purely based on their general morphology, which features large lateral extension and very limited vertical extent.

Table 3 Summary of the minimum, maximum and mean values of the morphology variable (M) for simple, complex and all stratovolcanoes, shield volcanoes and caldera systems, all according to the assigment of “Primary Volcano Type” stored in the GVP database

Full size table

Analogy in eruption size

For volcanoes with an eruption record in the GVP database, the volcano analogy in terms of eruption size and eruption style is assessed. The approach of merging the whole recorded dataset of eruption sizes and hazardous phenomena for any given volcano assumes stationarity in activity, at least when averaged over the Holocene. Nonetheless, it is acknowledged that analyses of the eruptive histories at individual volcanoes over shorter periods of time commonly evidence non-stationary behaviours (e.g. Bebbington 2008; Marzocchi and Bebbington 2012; Connor et al. 2015).

Given that VEI 2 is the default assignment in the GVP database and the fact that effusive (VEI 0) and very small explosive eruptions (VEI 1–2) are particularly under-reported (De la Cruz-Reyna 1991; Mead and Magill 2014), all eruptions with VEI ≤ 2 are grouped together (Mead and Magill 2014). Under-recording issues are expected to affect any analysis of frequency-size of eruptions, independently of whether VEI or eruption magnitude data are used (Rougier et al. 2016). A simplified approach to compensate for under-recording is implemented: (1) a suitable function to model the probability of recording an eruption of a given VEI x, at time t, is identified; and (2) this probability is used to extrapolate the number of eruptions of that VEI that might have been missed at the specific volcano. Thus, the total number of eruptions of a given VEI (x) that have occurred at a given volcanic system is estimated as:

$$ {E}_R(x)={E}_{R1}(x)+{E}_{R2}(x) $$

(3)

where E_R₁(x) denotes the total number of eruptions of VEI x that occurred before the date of completeness (k, considered as the date after which the recording probability is one; Mead and Magill 2014) and E_R₂(x) denotes the total number of eruptions of VEI x that occurred after the date of completeness k. For each volcano and eruption size, if E_R(x) is not an integer, its value is rounded to the closest integer. E_R2(x) is calculated directly from summing up the number of recorded eruptions in the GVP database, for a given volcano after time k, given that the recording probability is assumed to be one. E_R1(x) is calculated as follows:

$$ {E}_{R1}(x)=\frac{\sum \limits_{t={t}_1}^k{E}_{r1}\left(t,x\right)}{p\left(t,x\right)} $$

(4)

where p (t, x) is the recording probability of an eruption of VEI x that occurs at time t ≤ k and E_r1 (t, x) denotes the indicator function for any eruption of VEI x that occurred at time t ≤ k. The sum of E_r1 (t, x) is the total number of eruptions of size x recorded before time k.

A single-change-point presence function is used to model the recording probability (Furlan 2010; Mead and Magill 2014):

$$ p\left(t,x\right)=\left\{\begin{array}{c}\frac{1}{1+{e}^{-a-\beta x}}\\ {}1\end{array}\kern0.5em \begin{array}{c}\kern0.5em t\le k\\ {}t>k\end{array}\right\} $$

(5)

where α, β are the parameters controlling the scale and shape of the function. The date of completeness k is taken from Table 1 in Mead and Magill (2014), in particular the median value of the change point posterior distribution in their model, for any volcano in a given country. In the absence of data at the country scale, the k value available for the corresponding region is used.

The α and β parameters are selected following two naive assumptions. Before the date of completeness: (a) VEI = 0 eruptions are “exceptionally unlikely” (Mastrandrea et al. 2010) to be recorded, i.e. p(t, 0) = 0.01; and (b) VEI = 8 eruptions are “virtually certain” (Mastrandrea et al. 2010) to be recorded, i.e. p(t, 8) = 0.99. It is noted that the latter may actually be an overestimation given that around 70% of the world’s Holocene volcanoes are very poorly studied (Loughlin et al. 2015). These two assumptions provide a parameterisation for the presence function in Eq. (5): α = − 4.595, β = 1.150. Using this parameterisation, a VEI 4 eruption, for instance, has a 50% probability of being recorded, before the date of completeness k.

The choice of a single-change-point function can be questionable at global scales (Deligne et al. 2010; Rougier et al. 2016) but may be adequate for many countries and regions, especially for small-size eruptions (Jenkins et al. 2012) and/or for those areas where geological data are scarce compared to historical data (Mead and Magill 2014). In our study, this choice is a convenient one, given that all the eruptions that occurred at time t ≤ k can then be cumulated to estimate the total number of eruptions that might have happened over that time span. This partially relaxes the issues of under-recording in the GVP database.

Normalised VEI sizes are used as the variable for eruption size and the value of E_R(x) at each volcano is used to build a histogram, and an ECDF, which estimates the frequency-magnitude distribution at each specific volcano (given eruption). The area between the ECDFs of any two volcanoes, X and Y, is used as the distance metric for the eruption-size criterion. The associated single-criterion analogy is the inverse of this distance metric (Fig. 1c and Table 2).

Analogy in eruption style

In order to estimate single-criterion analogy in eruption style, data on hazardous phenomena are used. A total number of 22 Event types from the GVP database are grouped into eight groups of hazardous processes (Fig. 1d, Table 4, Online Resource 1). At each particular volcano, the total number of occurrences of each group is divided by the total number of eruptions with eruption-style data, to calculate the proportion of eruptions that generated each group of hazardous phenomena. To avoid duplications, two or more events of the same group occurring during the same eruption are counted as only one occurrence of the group. The normalised sum of differences between the proportions of the different hazardous processes is used as the distance metric for the eruption-style analogy between volcanoes X, Y (DSt_XY, Fig. 1b and Table 2):

Table 4 List of the event types used from the GVP database and their assigned correspondence in terms of 8 groups of physical hazardous processes

Full size table

$$ {DSt}_{XY}=\frac{\sum \limits_{i=1}^N\left|{h}_i^X-{h}_i^Y\right|}{H} $$

(6)

where H is the total number of groups of hazardous phenomena, and h_i^X and h_i^Y are the frequencies of occurrence (or proportions) for the ith group and volcanoes X and Y, respectively. Thus, the analogy in eruption style can be interpreted as an average difference between the proportions of the different hazardous phenomena. It is also noted that Eq. (6) penalises both differences in the frequency of occurrence and data scarcity in the groups of hazardous phenomena, for if there is no data for a given group, h_i will be equal to zero.

Results

The content and functionality of VOLCANS is illustrated by analysing different sets of analogues for Kı̄lauea (USA), Fuego (Guatemala) and Sinabung (Indonesia). Figure 3 contains a summary of the ID profile of each volcano, that is, the data available for each analogy criterion. Single-criterion and multi-criteria analogies are described and top analogue volcanoes (e.g. top 10 or top 20) are identified. In the case of multi-criteria searches, three different weighting schemes are explored: (A) equal weighting, i.e. all criteria count the same to calculate the volcano analogy, (B) eruption size and style are the only criteria used (equal weight between them) and (C) rock geochemistry and volcano morphology are the only criteria used (equal weight between them). These weighting schemes, which explore the most-used criteria to search for analogue volcanoes (Table 1), are purely illustrative. The particular choice of weighting scheme, which obviously alters the set of analogue volcanoes obtained, will depend on the specific goals and needs of each user. The significant advantage of VOLCANS is that it provides the user with full flexibility about this choice. Results revealed that some criteria were non-differentiating while others were differentiating. Non-differentiating criteria are defined as those that, for a specific volcano, result in tens or hundreds of analogue volcanoes with the same value of analogy (including one). Such criteria cannot be used to identify reduced sets of analogue volcanoes. Differentiating criteria, on the other hand, result in much fewer analogue volcanoes with the same value of analogy.

Kı̄lauea, USA

Kı̄lauea is a basaltic shield volcano located on an oceanic intraplate tectonic setting in relation to the presence of the Hawaiian hot spot (Decker et al. 1987; Carey et al. 2015). It formed on the eastern flank of the large Mauna Loa shield volcano at least 350 ka (Quane et al. 2000). Over the past 200 yr, its eruptive products have mainly consisted of lava flows but tephra layers have been also identified in the Pleistocene and Holocene (Easton 1987; Fiske et al. 2009). In historical times, phreatomagmatic and phreatic explosive eruptions have occurred (McPhie et al. 1990; Dvorak 1992; Mastin et al. 2004). Over a longer timescale of the past 2500 yr, Swanson et al. (2014) identified several shifts between periods dominated by either effusive or explosive volcanic activity, each of those periods lasting for several centuries. Although erupted magma volumes were significantly higher during effusive periods, the total duration of explosive periods was calculated as about 500 yr longer (Swanson et al. 2014).

Single-criterion analogies for tectonic setting, rock geochemistry and volcano morphology give rise to many perfect analogues to Kı̄lauea (i.e. analogy equal to one), given its relatively simple ID profile for those criteria (Fig. 3). In terms of eruption size, a large number of volcanoes share a high value of single-criterion analogy with Kı̄lauea (ASz = 0.9989). Only one volcano (Piton de la Fournaise) has a slightly higher value of eruption-size analogy (ASz = 0.9992). This is related to the fact that, similarly to Kı̄lauea, over 99% of the eruptions at Piton de la Fournaise have been VEI ≤ 2, but there is also one large explosive event, a VEI 5 eruption (the Bellecombe Ash Member eruption, Global Volcanism Program 2013; although the eruption is contested by Ort et al. 2016, to have been actually three eruptions, adding up to volumes of around 0.3 km³—VEI 4—and column heights of 8 km, at maximum). At Kı̄lauea, the largest explosive eruption in its eruptive record is a VEI 4 eruption (Keanakāko‘i ash, 1790 AD; which is interpreted to have been a long-lasting series of events instead of a single eruption, McPhie et al. 1990; Mastin et al. 2004; Swanson et al. 2014; Swanson and Houghton 2018). In terms of eruption style, it is possible to identify the top 10 analogue volcanoes to Kı̄lauea (Table 5). These are predominantly basaltic to dacitic volcanoes that, on average, are characterised by producing lava flows and/or fountaining in almost every eruption (> 92% of them), ballistics and tephra in 1 out of 3 eruptions, and phreatic/phreatomagmatic activity only occasionally (about 2% of the eruptions).

Table 5 Top 10 analogue volcanoes to Kı̄lauea, Fuego and Sinabung volcanoes, according to different single-criterion analogy metrics and three different weighting schemes to assess multi-criteria analogy. ISO Alpha-2 country codes for each volcano are indicated between round brackets. Number of eruptions with VEI assigned that are available in the GVP database are reported between square brackets for each analogue volcano in scheme B

Full size table

Multi-criteria searches of analogue volcanoes to Kı̄lauea with scheme A (equal weight) bring a mixture of analogue volcanoes (Table 5): some of them can be linked to top 10 analogues in eruption size and style but others, like the top 1 and 3 analogue volcanoes (Mauna Loa and Karthala, respectively), are not identified when using single-criterion searches. The top 10 list for scheme B (eruption size and style only) is dominated by the top 10 eruption-style analogues (Table 5), but the use of both eruption size and style provides new candidates that do not arise in the single-criterion search: e.g. Mauna Loa and Hualālai. Finally, the top 10 list of analogue volcanoes for scheme C (rock geochemistry and volcano morphology only) has a few examples from lists A and B but, generally, contains volcanoes that are not listed in the other searches (Table 5). This suggests that these predominantly basaltic volcanoes, with morphologies similar to Kı̄lauea, may not necessarily show eruption sizes and styles closely matching those of Kı̄lauea volcano.

The ratios between different single-criterion analogies for the top 20 analogue volcanoes identified from different weighting schemes are shown in Fig. 4. It is noted that the fact that a weighting scheme uses two analogy criteria only (e.g. eruption size and style for scheme B) does not imply that the identified top 20 analogues do not have data for the other analogy criteria (e.g. rock geochemistry or volcano morphology). Analogy in morphology and geochemistry are systematically higher than analogy in either eruption size or style, for the analogue volcanoes obtained with scheme C. Likewise, the analogue volcanoes derived from scheme B have systematically higher values of analogy in eruption size and style compared with analogy in morphology and geochemistry (Fig. 4b, c). In other words, analogy in eruption size and style and analogy in geochemistry and morphology for Kı̄lauea volcano are decoupled: volcanoes with rock geochemistries and/or morphologies the most similar to Kı̄lauea (top 20, scheme C) do not necessarily show the same highest degree of similarity in terms of eruption size and/or style (top 20, scheme B). Otherwise, the values of the ratios between analogy criteria would be very similar. Therefore, the degree of decoupling between different single criteria for each volcano can be assessed as the dispersion or distance between data points for different weighting schemes.

The spatial distribution of the top 20 analogue volcanoes to Kı̄lauea, according to the three weighting schemes is displayed in Fig. 5a–c. The vast majority of analogues from schemes A (20/20) and B (13/20) are located either on intraplate settings, such as oceanic islands (e.g. Galapagos, Comoros) or on rift zones (e.g. Iceland, East African Rift). This spatial clustering of volcanoes away from subduction zones is a non-random pattern. Approximately 70% of the volcanoes in the GVP database are located on subduction zones. Therefore, if 20 volcanoes were sampled at random, 14 volcanoes would be located, on average, on subduction zones. This number is significantly above the number of analogue volcanoes obtained for Kı̄lauea in schemes A and B. In scheme C, the analogue volcanoes are more dispersed spatially and they can also occur on subduction zones (Fig. 5c). These volcanoes are purely basaltic stratovolcanoes on the western rim of the Pacific plate and whose values of M are slightly above the mean value for complex stratovolcanoes and therefore closer to the mean value of shield volcanoes (Table 3). Kı̄lauea is a below-average shield volcano in terms of morphology (M = 0.447; Table 3). Therefore, a greater proportion of stratovolcanoes will have M values similar to Kı̄lauea, compared with other shield volcanoes with higher M values (see Fig. 2c).

Fuego, Guatemala

Fuego is a basaltic-andesitic stratovolcano related to the subduction of the Cocos plate under the Caribbean plate and forms part of a lineament of volcanic centres that includes the mainly Pleistocene Meseta volcano and Fuego’s twin volcano: Acatenango (Rose et al. 1978; Chesner and Rose 1984; Chesner and Halsor 1997). Volcanism has migrated spatially, from north to south, and chemically, from andesitic to basaltic compositions, with time (Chesner and Rose 1984). Historical volcanic activity has been mostly sourced from Fuego and has predominantly consisted of open-conduit, persistent, and frequent explosive activity (from Strombolian to Vulcanian) punctuated by larger eruptions up to sub-Plinian (Rose et al. 1978; Martin and Rose 1981; Lyons et al. 2010; Naismith et al. 2019).

Tectonic setting is a non-differentiating criterion for a subduction-zone volcano like Fuego. All the other single criteria can be used to identify a set of top 10 analogue volcanoes to Fuego (Table 5). Those analogues according to volcano morphology are perfect analogues because Fuego is one of the 11 volcanoes in the morphological database with a value of M = 0. These volcanoes have very high values of H and H/W* and, at the same time, very small values of T and d. The top 10 eruption-style analogue volcanoes (Table 5) produce ballistics and tephra, and lava flows and/or fountaining on 98% and over 30% of their eruptions, on average. Water-sediment flows and PDCs are generated, respectively, in 11% and 13% of their eruptions, on average. Finally, the average percentage of eruptions with collapse/destruction of the edifice or with caldera formation is 0.6% and 0.3%, respectively.

The top 10 analogue volcanoes according to scheme A tend to be related to volcanoes with high values in analogy morphology or eruption style (Table 5). In scheme B, the top 10 analogue volcanoes tend to appear in the top 10 list of eruption style analogues as well. However, their analogy in eruption size is also high, because their ASz/ASt ratio is very close to 1 (Fig. 4d). The values of multi-criteria analogy in scheme C seem to be closely linked with high values of analogy in morphology while none of the rock-geochemistry top 10 analogue volcanoes appears in the lists from scheme A, B or C (Table 5). Nevertheless, the analogy in geochemistry seems to be better coupled with the analogy in eruption size and style as it is shown by the analogue volcanoes for scheme B in Fig. 4e, f. The large scatter in the values of the ratios that involve the analogy in morphology suggests that volcanoes that have eruption sizes and styles similar to Fuego do not have particularly similar morphologies. Somewhat equivalently, the analogue volcanoes for scheme C show a greater scatter along the AM/ASt ratio (y-axis, Fig. 4e) than along the AG/ASt ratio (x-axis, Fig. 4f): that is, volcanoes with morphologies similar to Fuego have more varied eruption styles, compared with Fuego than volcanoes with rock geochemistries similar to Fuego. Three volcanoes are listed in two different schemes: Tacaná and Merbabu (schemes A, C; the latter is not top 10 in any single-criterion search) and Pavlof (schemes A, B); and two volcanoes are listed in all three weighting schemes: Semeru and Klyuchevskoy.

Spatially, the analogue volcanoes to Fuego are located on subduction zones, independently of whether tectonic setting is used (scheme A) or not (schemes B and C) as a search criterion (Fig. 5d–f). These spatial distributions are also non-random because the number of volcanoes on subduction zones (from 18 to 20) is significantly higher that what could be expected, on average, if the locations of 20 volcanoes were randomly sampled from the GVP database. The analogue volcanoes in schemes A and B are relatively evenly distributed along the western coast of the American continents, the western edge of the Pacific plate (especially in scheme B) and Indonesia. In scheme C, half of the top 10 analogue volcanoes are located in Kamchatka, Russia.

Sinabung, Indonesia

Sinabung is an andesitic-to-dacitic stratovolcano related to the oblique subduction of the Indo-Australian plate under the Eurasian plate along the island of Sumatra (Diament et al. 1992), and it is located less than 40 km away from the north-western edge of Lake Toba caldera. Volcanic activity at Sinabung was relatively unknown, and had been absent for the last 400 years, before its first historical eruption occurred in 2010 (Gunawan et al. 2019). After 3 years of repose, the volcano again erupted in 2013, this time evolving into a long-term (~ 5 years) eruption, which included initial phreatic and phreatomagmatic phases, several periods of lava-dome growth and collapse, andesitic lava flows and (cyclic) Vulcanian explosions (Gunawan et al. 2019; Nakada et al. 2019; Pallister et al. 2019).

In general, single-criterion searches are extremely uninformative for Sinabung because it has an ID profile (Fig. 3) that is very similar to many other volcanoes in the databases, when looking at single-criterion analogy only. For instance, there are hundreds of volcanoes on a subduction zone under continental crust and many tens of volcanoes with rock geochemistry evenly distributed between andesite and dacite. The only single criterion that provides a list of top 10 analogue volcanoes to Sinabung is eruption style (Table 5).

Multi-criteria searches bring much more information because the number of volcanoes that share the very same characteristics to the target volcano, for several single criteria, is much lower. The analogue volcanoes according to scheme A arise from a combination of all the criteria and they are all different from the list of analogue volcanoes obtained from the analogy in eruption style. They display a relatively clustered pattern (with some outliers) around the (1,1) point according to different ratios between single criteria (Fig. 4g–i), which suggests that the values of all single-criterion analogies are comparable. These top 10 scheme-A analogue volcanoes (Table 5) show the following characteristics: (i) they all are on subduction zones under continental crust; (ii) their expected geochemistry is predominantly andesitic to dacitic; (iii) they have morphologies significantly below the average morphology of stratovolcanoes (mean value on M = 0.172); (iv) their averaged distribution of eruption sizes is dominated by VEI ≤ 2 eruptions but VEI 3+ eruptions have non-negligible probabilities of occurrence (8%, in particular); and (v) their averaged hazardous phenomenology indicates that relatively few eruptions would produce lava flows and/or fountaining (4%), water-sediment flows (2%) or PDCs (9%). The latter is in stark contrast with the eruption-style profile of Sinabung, especially for PDCs, which have been reported in 67% of the eruptions. Nevertheless, the eruption data for Sinabung is strongly deficient, with only two eruptions with a VEI size assigned and three eruptions with hazardous phenomena recorded in the GVP database (and no reference to phreatic/phreatomagmatic activity or lava flows for the 2013–2018 eruption; GVP 2013). This potential issue is tackled in the next section.

The set of top 10 analogue volcanoes from scheme B includes some of the volcanoes identified with the single-criterion search of analogy in eruption style. Still, the majority of multi-criteria analogue volcanoes are not listed in the single-criterion search (Table 5). The analogue volcanoes from scheme C are also different from the single-criterion analogues, apart from the case of Atacazo (Table 5). This volcano is a perfect analogue according to tectonic setting, rock geochemistry and morphology and a top 10 analogue in terms of eruption style.

In terms of the spatial distribution, the analogue volcanoes to Sinabung are always located on subduction zones, independently of whether tectonic setting is used as analogy criterion or not. As in the case of Fuego volcano, this represents a non-random spatial distribution. These analogues are distributed similarly in schemes A and C (Fig. 5g, i), with the western coast of South America and Indonesia dominating over other areas (10 out of 17 different analogue volcanoes are located there). Among them, three volcanoes appear as top 10 analogue volcanoes according to both scheme A and C: San José, San Pedro-San Pablo and Tandikat-Singgalang. In scheme B (Fig. 5h), the analogue volcanoes to Sinabung appear to be more scattered and on different areas compared to schemes A and C. Some of these analogue volcanoes may not be fully representative because of the issues with the eruption data for Sinabung explained before.

Interestingly, the distribution of VEI sizes for Kı̄lauea (for which tens of eruptions are recorded in the GVP database) is almost identical to the (incomplete) distribution for Sinabung (Fig. 3c). Therefore, the differences observed in the sets of analogue volcanoes to Kı̄lauea and Sinabung using scheme B (Fig. 5b, h) can only be explained in terms of eruption style. Even though the eruption-style data for Sinabung are also incomplete, the hazardous phenomenology recorded at the two volcanoes is remarkably disparate (Fig. 3d). Thus, Kı̄lauea shows the typical profile of a lava-flow-producing volcano while Sinabung can be ascribed to the profile of a tephra-and-PDC-producing volcano. Accordingly, the analogue volcanoes to Kı̄lauea and Sinabung derived from scheme B are still fundamentally different in terms of tectonic setting (Fig. 5b, h), rock geochemistry and even morphology (e.g. mean(M) = 0.541 for the Kı̄lauea analogues and mean(M) = 0.298 for the Sinabung analogues).

Discussion

Volcanic hazard assessment

VOLCANS represents a new objective method for identifying sets of analogue volcanoes, for a given volcano of interest, using global volcanological databases. This type of automated and flexible procedure to extract sets of analogue volcanoes has the potential to become a fundamental tool used to inform volcanic hazard assessments, during quiescent, unrest and crisis phases. Existing methods and tools such as (Bayesian) event trees (Newhall and Hoblitt 2002; Marzocchi et al. 2008, 2010; Newhall and Pallister 2015), Bayesian Belief Networks (Aspinall et al. 2002; Hincks et al. 2014; Tierz et al. 2017) or hierarchical Bayesian modelling (Ogburn et al. 2016) would extremely benefit from deriving its prior distributions from objective sets of analogue volcanoes (Sheldrake 2014; Biass et al. 2016). Similarly, alternative parameterisations of the hazard models could be built from different sets of analogue volcanoes and this could be used to quantify the epistemic uncertainty and/or for testing of hazard models (e.g. Marzocchi and Jordan 2014; Spiller et al. 2014; Tierz et al. 2016a, 2016b). The presented method can also be used to enlarge the datasets used in the hazard assessment. Up to now, the common practice has been to use data from a limited set of best analogue volcanoes to assess volcanic hazard (e.g. Marzocchi et al. 2004; Sandri et al. 2012; Tierz et al. 2016a; Wright et al. 2019). With VOLCANS, the hazard analyst could expand the dataset to any available volcano and weigh the data according to the multi-criteria analogy calculated between these volcanoes and the target volcano. In this way, data from good analogues would count more than data from other volcanoes. This would dramatically increase the amount of data available for the hazard assessment, hence, improving the robustness of the statistics computed (e.g. Newhall and Hoblitt 2002; Marzocchi et al. 2008, 2010; Ogburn et al. 2016). Still, if selecting the best analogue volcanoes is the preferred strategy for the hazard analysis, VOLCANS can also help define such a set: e.g. top 10, top 20, volcanoes above a given threshold of volcano analogy. Volcano analogy can be computed using any possible multi-criteria weighting scheme, depending on the particular needs and requirements of the user of the method. This flexibility helps with adaptation and application to different situations and volcanological problems. Nonetheless, there will always be the need to assess the significance of the analogue sets found independently (Newhall et al. 2017) as well as to supplement the results with relevant expert scientific knowledge not available in the databases (Aspinall et al. 2002, 2003; Selva et al. 2012; Hincks et al. 2014; Newhall and Pallister 2015).

A tool to complement expert-derived analogue volcanoes

The hazard analyst/team may also be interested in assessing the general appropriateness of sets of analogue volcanoes that were selected, a priori, during previous crises and/or hazard evaluations (e.g. Marzocchi et al. 2004; Sandri et al. 2012; Newhall and Pallister 2015). Here, this is exemplified by looking at the values of multi-criteria analogy calculated for volcanoes that have been taken as analogues to the three example volcanoes by previous published and unpublished studies. In particular: Piton de la Fournaise (France) and Etna (Italy) for Kı̄lauea (Peltier et al. 2015; Poland et al. 2017); Villarrica, Llaima (Chile), Pacaya (Guatemala), Reventador and Tungurahua (Ecuador) for Fuego (Eliza Calder, unpublished data); and Unzen (Japan), Soufrière Hills (Montserrat), Nevado del Huila (Colombia), Guagua Pichincha (Ecuador) and Redoubt (USA) for Sinabung (Heather Wright, pers. comm., October 12, 2018).

For all three volcanoes, the multi-criteria analogy values and the percentage of volcanoes in the GVP database that are better analogues than the selected a priori analogues are analysed. This percentage can be calculated as follows: (1) for each target volcano and weighting scheme, a value of multi-criteria analogy can be calculated between the target volcano and any volcano in the GVP database; (2) the value of multi-criteria analogy for a given a priori analogue volcano corresponds with a specific percentile of the distribution of all multi-criteria analogy values; (3) one minus this percentile provides the percentage of better analogues (i.e. those with higher multi-criteria analogy than the a priori analogue). Additionally, and in the case of Fuego, an analysis on how sensitive the results of VOLCANS are to changes in the eruptive record of the target volcano has been carried out. This analysis shows that the method is quite stable even when adding or removing up to 15 eruptions from this eruptive record (see Online Resources 7 and 8).