FIGURE 5. Whole-slide analysis using the proposed method. (a)
H&E-stained slide images, (b) corresponding RI sections, (c) annotation
by pathologists from (a), and (d) CNN-based prediction from (b). The
first three rows illustrate three different slides, and the last row is
a magnified view of the third slide. Scale bars = 500 µm (first three
columns) and 50 µm (last column).
To quantitatively evaluate the similarity between the predicted and
ground truth images, the categorical image similarity metric (CatSIM)
was computed.[38] CatSIM measures the similarity
in RBC and fibrin distribution and uses the vector of proportions of
each class and categorical variance, which is related to the diversity
index. The terms for categorical luminance, contrast, and structural
similarity were calculated to determine the CatSIM value (see Supporting
Information). The CatSIM values for all three samples were close to 0.9,
indicating that the DL prediction successfully inferred the thrombus
structure and accurately identified the RBC-rich and fibrin-rich areas.
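
The two ingredients named above, the vector of class proportions and a diversity-type categorical variance, can be illustrated with a minimal Python sketch. It is not the full CatSIM computation (the luminance, contrast, and structure terms are described in the Supporting Information and are not reproduced here). The choice of the Gini-Simpson index as the diversity index, the function names, and the toy 4 × 4 label maps are assumptions made for illustration only.

```python
import numpy as np


def class_proportions(labels, n_classes):
    """Fraction of pixels assigned to each class (the vector of proportions)."""
    counts = np.bincount(labels.ravel(), minlength=n_classes)
    return counts / counts.sum()


def gini_simpson(proportions):
    """Gini-Simpson diversity index, 1 - sum(p_k^2).

    Assumed here as one common diversity index; it is 0 when a single
    class fills the whole map.
    """
    return 1.0 - float(np.sum(proportions ** 2))


# Hypothetical label maps: 0 = RBC-rich, 1 = fibrin-rich.
truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 0, 1],   # one fibrin pixel predicted as RBC
                 [0, 0, 1, 1]])

p_truth = class_proportions(truth, n_classes=2)
p_pred = class_proportions(pred, n_classes=2)
v_truth = gini_simpson(p_truth)
v_pred = gini_simpson(p_pred)
```

In this toy case the single misclassified pixel shifts the predicted proportions slightly away from an even split, which lowers the diversity index of the prediction relative to the ground truth.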
3.4 Thrombus Composition Ratio and Cross-validation
The DL model predicted the quantitative composition of the patient’s
thrombus sample slides at the whole-slide level (Figure 6a).
The average error between the predicted composition and the ground truth
was 1.1%, with a maximum error of 2.6%. We employed a slide-level
cross-validation scheme to evaluate the DL model without bias in a
limited number of samples. Each whole-slide sample provided at least
~3 × 10⁴ patches, which is sufficient
for DL training. There was slight variability in the prediction results
based on different training data (Figure 6b). CNN2, which was trained
with slide 2, tended to predict a higher ratio of fibrin than the ground
truth (1.0% and 2.5%). Predictions were least accurate for slide 1.
Although the error remained low, the cross-slide error is thought to
result largely from differences between adjacent slides or from
variability in staining.
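
The composition error and the slide-level evaluation described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes that each CNN is trained on patches from a single slide and evaluated on the remaining slides (consistent with CNN2 being trained with slide 2), that composition is measured as the percentage of fibrin-rich pixels, and that errors are absolute differences in percentage points. All function names and the toy label maps are hypothetical.

```python
import numpy as np


def fibrin_percentage(labels, fibrin_label=1):
    """Percentage of pixels labelled fibrin-rich in a slide-level label map."""
    return 100.0 * float(np.mean(labels == fibrin_label))


def composition_error(pred, truth):
    """Absolute difference in fibrin percentage, in percentage points."""
    return abs(fibrin_percentage(pred) - fibrin_percentage(truth))


def slide_level_splits(slide_ids):
    """Yield (train_slide, test_slides): one model per slide, tested on the rest.

    Splitting by slide rather than by patch keeps patches from the same
    slide out of both the training and the test set.
    """
    for train in slide_ids:
        yield train, [s for s in slide_ids if s != train]


# Hypothetical 10 x 10 label maps: 0 = RBC-rich, 1 = fibrin-rich.
truth = np.zeros((10, 10), dtype=int)
truth[:4, :] = 1            # 40 of 100 pixels are fibrin-rich
pred = truth.copy()
pred[4, :2] = 1             # prediction adds two fibrin pixels (42 of 100)

error = composition_error(pred, truth)
splits = list(slide_level_splits([1, 2, 3]))
```

With three slides, the generator yields three train/test assignments, one per trained model. In this toy example the prediction overestimates fibrin by about two percentage points.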