Introduction
Mid-infrared (MIR) microscopic imaging is a modern analytical method
that is widely used to characterize the components of biological
specimens [1-3] and has already been applied in tissue histology
[4-6]. MIR microscopic imaging was, for example, used to aid in the
differentiation between benign and malignant disease [3, 5, 7-11]
and was tested for imaging lymph node histopathology [6, 12]. Our
group successfully implemented this method for the differentiation
between reactive lymph nodes and small- and large-cell lymphomas, using
follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) as
examples [13].
IR microscopic imaging experiments yield data that are both rich in
information and large in volume. Chemometric tools are therefore a
prerequisite for exploiting the entire measurement [14]. Statistical
classification methods that discriminate between pixels of healthy
tissue and pixels of diseased tissue have been used in
histopathological studies [15, 16].
Multivariate statistical methods such as principal component analysis
(PCA) [17, 18], partial least squares (PLS) [19, 20] combined with
discriminant analysis (DA) [21], hierarchical cluster analysis (HCA)
[22], support vector machines (SVMs) [23], and random forests (RF)
[24] have been widely applied to IR data to identify changes in lipids,
proteins, nucleic acids, and carbohydrates. As a
pattern-recognition-based approach, the (artificial) neural network
((A)NN) has proved effective in analyzing data obtained from biological
specimens, including data acquired with IR imaging techniques [25-27].
NNs perform best when trained with large amounts of labelled data
[28, 29]. In pathology, however, labelling data, i.e. (whole-slide)
images, is complex and time-consuming [28]. These limitations can be
mitigated by employing so-called self-supervised and
unsupervised techniques [28, 29]. One unsupervised technique is the
Convolutional Autoencoder (CAE), which is trained with unlabelled data
[29, 30]. Such NNs were developed to handle high-dimensional data
and have already been applied in medical fields such as
radiology, cardiology, neurology, and even pathology [30-34]. In
brief, the encoder half of the CAE compresses the input data into a
compact code, which the decoder then uses to reconstruct them at the output layer
[30]. The CAE-based network consists of several convolutional layers
with a decreasing number of filters that create a bottleneck in the
centre of the network (Figure 1a). After this bottleneck, the
number of neurons increases again, allowing the network to reproduce the
input data. This bottleneck forces the NN to learn the essential
features of its input, an effect comparable to the dimensionality
reduction performed with PCA [30].
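To make the encoder/decoder idea concrete, the forward pass of such a bottlenecked network can be sketched in a few lines of NumPy. The layer sizes, kernel widths, and filter counts below are illustrative assumptions and not taken from the study; no training loop is shown.

```python
import numpy as np

def conv1d(x, w, stride=1, relu=True):
    """1-D convolution with 'same' zero padding.
    x: (length, in_channels), w: (kernel, in_channels, out_channels)."""
    k = w.shape[0]
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)))
    out_len = (x.shape[0] + stride - 1) // stride
    y = np.array([np.tensordot(xp[i * stride:i * stride + k], w,
                               axes=([0, 1], [0, 1]))
                  for i in range(out_len)])
    return np.maximum(y, 0.0) if relu else y

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 1))   # one spectrum, 64 wavenumber points (hypothetical)

# Encoder: strided convolutions with fewer filters shrink the data to the code.
w1 = rng.standard_normal((3, 1, 8)) * 0.1
w2 = rng.standard_normal((3, 8, 2)) * 0.1
h1 = conv1d(x, w1, stride=2)       # (32, 8)
code = conv1d(h1, w2, stride=2)    # (16, 2) -- the bottleneck code

# Decoder: upsampling plus convolution restores the original dimensions.
w3 = rng.standard_normal((3, 2, 8)) * 0.1
w4 = rng.standard_normal((3, 8, 1)) * 0.1
d1 = conv1d(np.repeat(code, 2, axis=0), w3)               # (32, 8)
recon = conv1d(np.repeat(d1, 2, axis=0), w4, relu=False)  # (64, 1)
```

The code holds fewer values (16 × 2) than the input (64 × 1), which is what forces the network, once trained to minimize reconstruction error, to learn a compressed representation analogous to the leading PCA components.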
This study combines MIR imaging of unstained tissue slides with a deep
learning approach, using a CAE as an example of an unsupervised
technique, to differentiate between benign and malignant lymphoid
tissues and to classify lymphoma subtypes.