Introduction
Mid-infrared (MIR) microscopic imaging is a modern analytical method that is widely used to characterize the components of biological specimens [1-3] and has already been applied in tissue histology [4-6]. It has been used, for example, to aid the differentiation between benign and malignant disease [3, 5, 7-11] and has been tested for imaging lymph node histopathology [6, 12]. Our group successfully implemented this method to differentiate reactive lymph nodes from small- and large-cell lymphoma, using follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) as examples [13].
IR microscopic imaging experiments yield large volumes of high-quality information. Chemometric tools are therefore a prerequisite for exploiting the full information content of each measurement [14]. Statistical classification methods that discriminate between pixels of healthy and diseased tissue have been used in histopathological studies [15, 16]. Multivariate statistical methods such as principal component analysis (PCA) [17, 18] and partial least squares (PLS) [19, 20] combined with discriminant analysis (DA) [21], as well as hierarchical cluster analysis (HCA) [22], support vector machines (SVMs) [23], and random forests (RF) [24], have been widely applied to IR data to identify changes in lipids, proteins, nucleic acids, and carbohydrates. As a pattern-recognition-based approach, the (artificial) neural network ((A)NN) has proved effective in analyzing data obtained from biological specimens, including data from IR imaging techniques [25-27]. NNs perform best when trained with a large amount of labelled data [28, 29]. In pathology, however, labelling data, i.e., (whole-slide) images, is complex and time-consuming [28]. These limitations can be mitigated by employing so-called self-supervised and unsupervised techniques [28, 29]. One unsupervised technique is the convolutional autoencoder (CAE), which is trained with unlabelled data [29, 30]. Such NNs were developed to work with high-dimensional data and have already been applied in medical fields such as radiology, cardiology, neurology, and even pathology [30-34]. In brief, the encoder part of the CAE compresses the input data into a code, from which the decoder reconstructs the data in the output layer [30]. The CAE-based network consists of several convolutional layers with a decreasing number of filters that create a bottleneck in the centre of the network (Figure 1a).
After this bottleneck, the number of neurons increases again, allowing the network to reconstruct the input data. The bottleneck in the centre of the CAE forces the NN to learn the essential features of the input data, which is comparable to the dimensionality reduction performed by PCA [30].
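The bottleneck-as-dimensionality-reduction analogy can be made concrete with a linear autoencoder, whose optimal encoder/decoder weights are known to span the top principal components of the data. The sketch below (illustrative only; the array shapes and random "spectra" are hypothetical stand-ins, not data from this study) obtains those weights directly via SVD rather than by training a network, and shows the compress-then-reconstruct cycle described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an IR image cube: 500 pixel spectra,
# each with 200 absorbance channels (synthetic, not real data).
X = rng.normal(size=(500, 200))
X -= X.mean(axis=0)          # mean-centre, as in PCA

# A linear autoencoder with a k-dimensional bottleneck is optimally
# solved by the top-k principal components; SVD gives them directly.
k = 10
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k]                   # "encoder" weights: spectrum -> k-dim code

code = X @ W.T               # compression step (the bottleneck)
X_hat = code @ W             # reconstruction step (the "decoder")

# Relative reconstruction error shrinks as the bottleneck widens.
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(code.shape, err)
```

A trained CAE differs in that its encoder and decoder are nonlinear convolutional layers, but the principle is the same: the narrow code must retain the features needed to reproduce the input.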
This study combines MIR imaging of unstained tissue slides with a deep-learning approach, using a CAE as an example of an unsupervised technique, to differentiate between benign and malignant lymphoid tissues and to classify lymphoma subtypes.