FIGURE 2. Flowchart of patch-wise annotation for RI images. (a) Two consecutive sections of a thrombus are retrieved, one of which is H&E stained. (b) The stained and unstained sections are respectively imaged with BF microscopy and ODT, and registered. A single focal section of the RI tomogram with the highest contrast is utilized. (c) The registered images are divided into patches. Each patch is labeled based on the annotation of a trained pathologist. Scale bar = 500 µm.
2.3 Data Annotation
Since the two axially consecutive sections had nearly identical compositions of RBC and fibrin, we could label each RI patch (64 × 64 pixels) using the H&E bright-field image, whose content is highly correlated with that of the RI image. As the standard reference technique, experienced pathologists performed color-based semi-automatic segmentation of the bright-field images using Adobe Photoshop (Version 2019, Adobe Systems, San Jose, CA, USA). Two board-certified pathologists annotated the H&E-stained bright-field images to obtain ground-truth information. Each pixel of the registered bright-field images was annotated as one of three types: RBC-rich area, fibrin/platelet-rich area, or background. The ground-truth histological composition of each thrombus sample was determined by counting the RBC and fibrin pixels in the pathologists' annotations.
After annotating the H&E-stained bright-field image, we divided the unstained QPI and bright-field images into small patches of 64 × 64 pixels (FOV 10.88 μm × 10.88 μm). For each QPI image patch, there was a corresponding bright-field image patch at the same position that had already been annotated. To predict the histological composition with DL, we trained a DL model to classify small RI tomogram section patches into three categories: RBC, fibrin, and background. For the training dataset, we prepared 64 × 64-pixel QPI image patches categorized into the three components (RBC, fibrin, and background) using the information from the registered and annotated BF image patches (Figure 2). We designed a label-generation and patch-selection rule to assign the class of each image patch (0: RBC, 1: fibrin, and 2: background) and to exclude ambiguous patches containing composite components (Figure 2c). For every patch, the initial class was determined by counting the classified pixels in each class (RBC, fibrin, and background) and assigning the class with the most pixels.
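The majority-vote label assignment described above can be sketched as follows; this is an illustrative reconstruction, not the authors' code, and the function name `annotation_to_patches` is a placeholder.

```python
import numpy as np

# Classes: 0 = RBC, 1 = fibrin, 2 = background.
PATCH = 64

def annotation_to_patches(annotation):
    """Split a pixel-wise annotation map (H, W) into 64 x 64 patches
    and assign each patch the class indicated by the most pixels."""
    h, w = annotation.shape
    labels = {}
    for i in range(0, h - PATCH + 1, PATCH):
        for j in range(0, w - PATCH + 1, PATCH):
            patch = annotation[i:i + PATCH, j:j + PATCH]
            counts = np.bincount(patch.ravel(), minlength=3)
            labels[(i, j)] = int(np.argmax(counts))
    return labels
```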
To provide a suitable database for DL, we formed an accurate training set by selecting only patches in which over 80% of the pixels belonged to a single class; patches containing many pixels outside the assigned class are undesirable for DL training. For example, a patch with 40% RBC pixels, 30% fibrin pixels, and 30% background pixels would be excluded from the training set. Note that the selected patch size, 10.88 × 10.88 μm², was small enough that patches lacking a dominant component were not excluded in excess: 76% of the total patches were retained for the training set.
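The ≥80% dominant-class selection rule could be implemented as a simple purity check; `is_pure` is an illustrative name, not from the original implementation.

```python
import numpy as np

def is_pure(patch_annotation, threshold=0.8):
    """Return True if the patch's majority class covers at least
    `threshold` of its pixels (classes 0, 1, 2)."""
    counts = np.bincount(np.asarray(patch_annotation).ravel(), minlength=3)
    return counts.max() / counts.sum() >= threshold
```

Patches failing this check, such as the 40/30/30 example above, are dropped from the training set.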
2.4 Deep Learning and Optimization
The proposed deep neural network consists of convolutional layers, pooling layers, ReLU activation layers, and fully connected layers. It takes a 64 × 64 RI image patch and classifies it into one of three subtypes (RBC, fibrin, or background). The network extracts spatial features of each patch using contracting convolutional operations (7 × 7, 3 × 3) with nonlinear activations, and the final label is determined by a fully connected layer. Batch normalization is applied before every ReLU activation to accelerate training and improve regularization.
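A minimal PyTorch sketch consistent with this description (convolutions with batch normalization and ReLU, pooling, then a fully connected classifier) is shown below; the layer count and channel widths are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Illustrative patch classifier: conv -> BN -> ReLU -> pool
    stages followed by a fully connected layer over 3 classes."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, padding=3),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                         # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                         # 32 -> 16
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                            # x: (N, 1, 64, 64)
        return self.classifier(self.features(x).flatten(1))
```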
The labeled image patches were divided into training, validation, and test sets at a ratio of 7:1.5:1.5. Data augmentation techniques, including random rotations and flips, were applied only during the training stage. We used the cross-entropy loss based on the SoftMax function. For parameter learning with backpropagation, we used the Adam optimizer with β1 = 0.9, β2 = 0.999, a learning rate of 1 × 10−5, and a batch size of 256.[35] Training was performed on a graphics processing unit (GPU; Tesla P40, NVIDIA) with CUDA Toolkit 10.0 (NVIDIA). After 200 epochs of training, we selected the optimal model by early stopping, evaluating the validation loss at every epoch. We used the Python 3.7.2 environment with PyTorch version 1.8.1.
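The optimization setup above (cross-entropy loss, Adam with β1 = 0.9, β2 = 0.999, learning rate 1 × 10−5) can be sketched as follows; `model` and the data loader here are stand-ins, and `train_one_epoch` is an illustrative helper, not the authors' code.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, criterion):
    """One epoch of backpropagation over (patch, label) batches."""
    model.train()
    for patches, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(patches), labels)
        loss.backward()
        optimizer.step()

# Stand-in classifier; any 3-class patch model fits here.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 3))
criterion = nn.CrossEntropyLoss()          # SoftMax + cross-entropy in one
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5,
                             betas=(0.9, 0.999))
```

In practice this would run inside a 200-epoch loop, with the validation loss evaluated after each epoch for early stopping.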
2.5 Inference of Histological Composition
To test the trained network, we imaged the RI of an unseen thrombus and quantified its composition. The trained DL model classified each 64 × 64 patch of the RI image as RBC-rich, fibrin-rich, or background. For rapid inference, patches were fed in parallel with a batch size of 1,024. Based on the classification results, the thrombus composition was calculated as the proportions of RBC and fibrin. The spatial distribution of the RBC and fibrin components could also be visualized by pseudocoloring each patch according to its classification result and placing it at its original spatial position.
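The batched inference and composition calculation could look like the sketch below; `thrombus_composition` is a hypothetical helper that reports the RBC and fibrin proportions over the non-background patches.

```python
import torch

@torch.no_grad()
def thrombus_composition(model, patches, batch_size=1024):
    """Classify all patches in batches of 1,024 and return the
    (rbc, fibrin) proportions among non-background patches.
    Classes: 0 = RBC, 1 = fibrin, 2 = background."""
    model.eval()
    preds = []
    for i in range(0, len(patches), batch_size):
        logits = model(patches[i:i + batch_size])
        preds.append(logits.argmax(dim=1))
    preds = torch.cat(preds)
    rbc = (preds == 0).sum().item()
    fibrin = (preds == 1).sum().item()
    total = max(rbc + fibrin, 1)       # guard against all-background input
    return rbc / total, fibrin / total
```

The per-patch predictions can also be mapped back to each patch's original position to render the pseudocolored composition map.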
Patch-by-patch inference makes this method robust to local mismatches introduced during image registration: the DL model classifies each patch from its overall features and therefore tolerates a small fraction of mislabeled pixels.