FIGURE 2. Flowchart of patch-wise annotation for RI images. (a)
Two consecutive sections of a thrombus are retrieved, one of which is
H&E stained. (b) The stained and unstained sections are respectively
imaged with BF microscopy and ODT, and registered. A single focal
section of the RI tomogram with the highest contrast is utilized. (c)
The registered images are divided into patches. Each patch is labeled
based on the annotation of a trained pathologist. Scale bar = 500 µm.
2.3 Data Annotation
Since the two axially consecutive images had nearly identical compositions of RBC and fibrin, we could label each RI patch (64 × 64 pixels) based on the H&E bright-field image, whose information was highly correlated with that of the RI image. Experienced pathologists performed color-based semi-automatic segmentation of the bright-field images using Adobe Photoshop (Version 2019, Adobe Systems, San Jose, CA, USA), which served as the standard reference technique. H&E-stained
bright-field images were annotated by two board-certified pathologists
to obtain ground truth information. Each pixel of the registered
bright-field images was annotated as one of three types: an RBC-rich area, a fibrin/platelet-rich area, or background. The ground-truth
histological composition of each thrombus sample was determined by
counting the number of RBC and fibrin pixels from the
pathologist-annotation result.
After annotating the H&E-stained bright-field image, we divided the
unstained QPI and bright-field images into small patches of 64 × 64
pixels (FOV 10.88 μm × 10.88 μm). For each QPI image patch, there was a
corresponding bright-field image patch at the same position that had
already been annotated. To predict the histological composition with DL, we trained a DL model to classify small RI tomogram section patches into
three categories: RBC, fibrin, and background. For the training dataset,
we prepared 64 × 64-pixel QPI image patches, categorized into three
components (RBC, fibrin, and background) using the information from the
registered and annotated BF image patches (Figure 2). We designed a
label generation and patch selection rule to assign the class of the
image patch (0: RBC, 1: fibrin, and 2: background) and exclude ambiguous
patches that contain composite components (Figure 2c). For every patch,
the initial class was determined by counting the number of classified
pixels in each class (RBC, fibrin, and background) and assigning the
class indicated by the most pixels.
To provide a suitable database for DL, we selected cases where over 80%
of the pixels in the patch indicated a certain class to form an accurate
training set. Patches that contain many pixels that do not belong to the
assigned class are undesirable for DL training. For example, a patch containing 40% RBC pixels, 30% fibrin pixels, and 30% background pixels would be excluded from the training set. Note that the selected patch size, 10.88 × 10.88 μm², was sufficiently small that few patches lacked a dominant component; under this criterion, 76% of the total patches were selected for the training set.
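The label-generation and patch-selection rule described above (majority vote over pixel classes, with ambiguous patches excluded below an 80% purity threshold) can be sketched as follows. The function name and array conventions are illustrative assumptions; the input is a per-pixel annotation patch with class ids 0 (RBC), 1 (fibrin), and 2 (background).

```python
import numpy as np

def label_patch(annotation_patch, purity_threshold=0.8):
    """Assign a class (0: RBC, 1: fibrin, 2: background) to an annotation
    patch by majority vote over its pixels; return None when no class
    covers at least `purity_threshold` of the pixels (ambiguous patch)."""
    counts = np.bincount(annotation_patch.ravel(), minlength=3)
    majority = int(np.argmax(counts))
    if counts[majority] / annotation_patch.size < purity_threshold:
        return None  # composite patch, excluded from the training set
    return majority
```

For example, a patch with roughly 40% RBC, 30% fibrin, and 30% background pixels returns `None` and is excluded, whereas a patch that is at least 80% one class receives that class label.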
2.4 Deep Learning and Optimization
The proposed deep neural network consists of convolutional layers,
pooling layers, ReLU activation layers, and fully connected layers. It
takes a 64 × 64 RI image patch and classifies it into one of three
subtypes (RBC, fibrin, and background). The network extracts various
spatial features of each patch using contracting convolutional
operations (7 × 7, 3 × 3) with nonlinear operations. The final label is
determined using a fully connected layer. Batch normalization is applied
at every stage before the ReLU activation to accelerate training and improve regularization.
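A minimal PyTorch sketch of such a classifier is given below. Only the elements named in the text are taken from the source (7 × 7 and 3 × 3 convolutions, pooling, batch normalization before each ReLU, and a fully connected output over three classes); the channel widths, number of stages, and pooling choices are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Illustrative conv/BN/ReLU classifier for 64 x 64 RI patches,
    mapping to three classes (RBC, fibrin, background)."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, padding=3),   # 7 x 7 convolution
            nn.BatchNorm2d(32),                           # BN before ReLU
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 3 x 3 convolution
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 32 -> 16
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                      # global pooling
        )
        self.classifier = nn.Linear(128, n_classes)       # fully connected head

    def forward(self, x):
        x = self.features(x)                  # (N, 128, 1, 1)
        return self.classifier(x.flatten(1))  # (N, 3) class logits
```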
The labeled image patches were divided into training, validation, and
test sets at a ratio of 7:1.5:1.5. Data augmentation techniques,
including random rotation and flips, were used only during the training
stage. We set the cross-entropy loss based on the SoftMax function. For
parameter learning with backpropagation, we used the Adam optimizer with
β₁ = 0.9, β₂ = 0.999, a learning rate of 1 × 10⁻⁵, and a batch size of 256.[35] We used a graphics processing unit (GPU; TESLA P40, NVIDIA) with CUDA Toolkit 10.0 (NVIDIA) for training. After
200 epochs of training, we selected the optimal model by early stopping,
evaluating the validation loss for every epoch. We used the Python 3.7.2
environment with PyTorch version 1.8.1.
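The stated training configuration (cross-entropy loss over SoftMax outputs, Adam with β₁ = 0.9, β₂ = 0.999, learning rate 1 × 10⁻⁵, batch size 256, and rotation/flip augmentation) can be sketched as a single PyTorch training step. The linear model and the random batch below are placeholders standing in for the convolutional network and the real patch dataset.

```python
import torch
import torch.nn as nn

# Placeholder model and batch; any module producing (N, 3) logits fits here.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 3))
patches = torch.randn(256, 1, 64, 64)   # one batch of RI image patches
labels = torch.randint(0, 3, (256,))    # 0: RBC, 1: fibrin, 2: background

# Training-time augmentation: random 90-degree rotation and horizontal flip
# (class labels are invariant under both transforms).
k = int(torch.randint(0, 4, (1,)))
augmented = torch.rot90(patches, k, dims=(2, 3))
if torch.rand(1) < 0.5:
    augmented = torch.flip(augmented, dims=(3,))

# Cross-entropy loss (SoftMax applied internally) with the stated Adam settings.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, betas=(0.9, 0.999))

optimizer.zero_grad()
loss = criterion(model(augmented), labels)
loss.backward()   # backpropagation
optimizer.step()  # one parameter update
```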
2.5 Inference of Histological Composition
To test our trained network, we imaged the RI of an unseen thrombus and
quantified its composition. The trained DL model classified each patch
(64 × 64 pixels) of the RI image as RBC-rich, fibrin-rich, or background. For
rapid inference, parallel input with a batch size of 1,024 was used.
Based on the classification results, the thrombus composition was
calculated by specifying the proportions of RBC and fibrin. The spatial distribution of the RBC and fibrin components could also be visualized by assigning each RI patch a pseudocolor according to its classification result and arranging the patches at their original spatial positions.
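The composition estimate and pseudocolor map described above can be sketched as below. The function name and the red/blue/black palette are illustrative assumptions; the input is a grid of per-patch predictions (0: RBC, 1: fibrin, 2: background), and background patches are excluded from the composition.

```python
import numpy as np

def thrombus_composition(patch_labels):
    """From a (rows, cols) grid of per-patch class predictions, return the
    RBC and fibrin fractions among tissue patches and a pseudocolor map."""
    labels = np.asarray(patch_labels)
    n_rbc = int(np.sum(labels == 0))
    n_fibrin = int(np.sum(labels == 1))
    n_tissue = n_rbc + n_fibrin
    rbc_frac = n_rbc / n_tissue if n_tissue else 0.0
    fibrin_frac = n_fibrin / n_tissue if n_tissue else 0.0
    # One color per patch position: red for RBC, blue for fibrin, black background.
    palette = np.array([[255, 0, 0], [0, 0, 255], [0, 0, 0]], dtype=np.uint8)
    color_map = palette[labels]  # (rows, cols, 3) RGB image
    return rbc_frac, fibrin_frac, color_map
```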
Patch-by-patch inference makes this method robust to local mismatches introduced during image registration: the DL model classifies each patch from its overall features and therefore tolerates a small fraction of mislabeled pixels.