Introduction
The phenotype of organisms varies continuously during development and
through evolutionary time. Continuous morphological variation is
captured for numerous purposes in the life sciences via the practice of
morphometry: the measurement of the size and shape of anatomical forms.
Morphometry has yielded novel findings in evolution (Esquerré et al.,
2020) and has been used to assess fluctuating asymmetry (Palmer, 1993;
Klingenberg, 2015), ontogeny (Csősz & Majoros, 2009; Shingleton et al.,
2007), and ecomorphism (Mahendiran et al., 2018; Tomiya & Meachen, 2018;
Anderson et al., 2019), and has also been applied in human clinical practice (Bartlett &
Frost, 2008). Among other applications, morphometric data are also key
for alpha taxonomy, the discipline of formally differentiating and
describing species and higher taxa. This is exemplified by the
development of phenetics in the twentieth century (Michener & Sokal,
1957; Sokal & Sneath, 1963) and by numerous modern studies in other
frameworks, such as for plants (Savriama, 2018; Chuanromanee, Cohen &
Ryan, 2019), animals (Villemant, Simbolotti & Kenis, 2007; Inäbnit,
2019), and other organisms (Fodor et al., 2015; McMullin et al., 2018).
Continuous data are also valuable for modeling evolutionary histories
(e.g., Parins-Fukuchi, 2017, 2020). Thus, the morphometric approach
constitutes a fundamental and crucial practice for the study of
phenotypes in biodiversity research.
Morphology is traditionally considered to comprise both continuous and
discrete traits (Aristotle, 350; Thompson, 1917; Rensch, 1947; Remane,
1952). Discrete states were established as the basic comparative units
in animal alpha taxonomy from its formalization (Linnaeus, 1758), and
have become a key means of scoring data for phylogenetic analysis,
particularly after Hennig (1950, 1966). The reproducibility of scoring
discrete states is an issue, however, as qualitative perception of
phenotype not only requires specific training and considerable
experience but can also be plagued by arbitrariness (Bond & Beamer,
2006), meaning that variation may simply come from individual
(mis-)interpretation. The qualitative approach commonly uses verbal
species descriptions that are often subjective or difficult to
articulate. Information transfer, where it is reliable at all, therefore
depends on one-to-one knowledge sharing and requires logically structured
linguistic hierarchies such as the Hymenoptera Anatomy Ontology (Yoder,
Mikó, Seltmann, Bertone & Deans, 2010).
In contrast to this relatively idiosyncratic approach, morphometry is
considered transferable. It converts variation of shape, size of
anatomical traits, and number and arrangement of anatomical elements
into numerical values, allowing for the dissemination of reproducible,
phenotype-based knowledge. Today, an increasing number of
morphology-based insect alpha-taxonomists use morphometric data and
provide numeric keys to species (Steiner, Schlick-Steiner & Moder, 2006;
Csősz, Heinze & Mikó, 2015; Seifert, 2018). If observers arrive at the
same conclusion by measuring traits according to the same protocol,
findings are believed to be reliable and transferable: if one observer can
measure a trait, any other observer should be able to reproduce that measurement.
However, measurements come with error. Agreement among different
observers and within a single observer’s measurements is affected by a
number of factors, such as the skill of the observer (if human input is
required), the precision and accuracy of the equipment, the clarity of
the character recording protocol and how well it is understood, and
other parameters. All of these sources of uncertainty are common in
practice, and the impossibility of controlling every source of
measurement variation challenges morphometry-based research (Wolak,
Fairbairn & Paulsen, 2012).
An understanding of the degree to which measurement errors may affect the
transferability of findings is urgently needed. During the last few
decades, reproducibility issues have been studied in vertebrate
systematics (e.g., Oxnard, 1983; Corruccini, 1988; Yezerinac, Lougheed
& Handford, 1992; Helm & Albrecht, 2000; Takacs, Vital, Ferincz &
Staszny, 2016; Fox, Veneracion & Blois, 2020), clinical research (e.g.,
Bland & Altman, 1986; Ridgway et al., 2008; Phexell et al., 2019),
social science (e.g., Salganik et al., 2020), molecular phylogeny and
genetic clustering (e.g., Huelsenbeck, 1998; Jones et al., 1998;
DeBiasse & Ryan, 2019), and morphometric data generally (Andrew et al.,
2015). However, to date, reproducibility assessments of morphometric
data in entomology are extremely limited (Mutanen & Pretorius, 2007;
Johnson et al., 2013).
To address the question “to what extent is insect morphometry
reproducible?”, we compiled a broad database of morphometric data and
performed robust statistical analyses. We used ants, a group in which
the application of morphometric data has a long tradition (e.g., Brown,
1943; Brian & Brian, 1949), as a model organism. Morphometry has been
employed widely in recent myrmecological studies (e.g., Ward, 1999;
Baroni Urbani, 1998; Seifert, 1992, 2003, 2019; Csősz, Heinze & Mikó,
2015; Wagner et al., 2017) as the primary method of interpreting
anatomical forms and their variation. Eleven participants of diverse
levels of skill and expertise, working with different taxonomic routines
over three continents and six countries, were asked to perform repeated
measurements on the same set of ant specimens, according to the same
measurement protocol, with their own equipment. The wide range of
morphometric skills and of microscope quality provided us with an
overview of the reproducibility of morphometric interpretation as it
works in daily practice. Our findings are a first
step in exploring the reproducibility of morphometric data across
entomology.
Terminology [Textbox 1.]
A number of terms (e.g. “accuracy”, “precision”, “reliability”,
“repeatability”, and “reproducibility”) commonly used in association
with repeatability studies are defined differently in the literature. To
increase the fluency of scientific discourse, we propose that biological
systematics adopt the standard terminology of the National Institute of
Standards and Technology of the USA (NIST; Taylor & Kuyatt, 2001) and the
terms proposed by Bartlett & Frost (2008):
● Accuracy describes the average closeness of the measurement(s) to the
value of the measurand (= subject or quantity to be measured) (Fig. 1).
Accuracy is affected by systematic and random error. We follow the
terminology proposed by the NIST in using the phrase “the value of the
measurand” instead of the often-applied “true value of the measurand”
(or “a true value”) (Taylor & Kuyatt, 2001).
● Precision refers to the closeness of agreement between repeated
measurements made on the same measurand by applying the same protocol.
Precise measurements are tightly clustered, but are not necessarily
accurate, i.e. close to the value of the measurand (Fig. 1). Precision
is affected by random error.
● Reliability relates the amount of measurement error in the observed
measurements to the inherent variability among measurands: the smaller
the error relative to that variability, the higher the reliability
(Bartlett & Frost, 2008).
● Repeatability refers to the degree of agreement between repeat
measurements made on the same measurand under the same conditions, i.e.
made by the same observer, using the same microscope, following the same
measurement protocol (Taylor & Kuyatt, 2001). Repeatability can be
assessed via the intra-class correlation coefficient (ICC; see Lessells & Boag, 1987).
● Reproducibility refers to the degree of agreement between measurements
made on the same measurand under changing conditions, such as changing
principle, method of measurement, observer, instrument, etc. (Taylor &
Kuyatt, 2001).
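As a rough illustration of how the repeatability defined above can be quantified, the following sketch computes the intra-class correlation from a one-way ANOVA, in the manner described by Lessells & Boag (1987): r = s²A / (s²W + s²A), where s²W is the within-measurand mean square and s²A is the among-measurand variance component. The function name, the data layout, and the example values are hypothetical and are not taken from this study.

```python
from statistics import mean


def repeatability(groups):
    """Intra-class correlation from a one-way ANOVA.

    groups: list of lists; each inner list holds the repeated
    measurements made on one measurand (e.g. one specimen).
    """
    a = len(groups)                      # number of measurands
    sizes = [len(g) for g in groups]     # repeats per measurand
    total = sum(sizes)
    if a < 2 or total <= a:
        raise ValueError("need >= 2 measurands and at least one repeat")

    grand_mean = sum(sum(g) for g in groups) / total
    group_means = [mean(g) for g in groups]

    # Sums of squares among and within measurands.
    ss_among = sum(n * (m - grand_mean) ** 2
                   for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (total - a)

    # n0 corrects for unequal numbers of repeats per measurand;
    # it equals the common group size when the design is balanced.
    n0 = (total - sum(n * n for n in sizes) / total) / (a - 1)

    var_among = (ms_among - ms_within) / n0
    denominator = var_among + ms_within
    if denominator == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    return var_among / denominator


# Hypothetical example: three specimens, each measured three times (mm).
example = [[0.50, 0.51, 0.49],
           [0.60, 0.61, 0.59],
           [0.70, 0.71, 0.69]]
icc_example = repeatability(example)
```

Values of r near 1 indicate that measurement error is small relative to the variation among measurands. Grouping measurements of the same specimen across observers and instruments, rather than within a single observer, would give an analogous index of agreement under changing conditions; whether that corresponds to the analysis used in a given study is an assumption of this sketch.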