FIGURE 5. Whole-slide analysis using the proposed method. (a) H&E-stained slide images, (b) corresponding RI sections, (c) annotation by pathologists from (a), and (d) CNN-based prediction from (b). The first three rows show three different slides, and the last row is a magnified view of the third slide. Scale bars = 500 µm (first three rows) and 50 µm (last row).
To quantitatively evaluate the similarity between the predicted and ground-truth images, the categorical image similarity metric (CatSIM) was computed.[38] CatSIM measures the similarity of the RBC and fibrin distributions using the vector of class proportions and the categorical variance, which is related to the diversity index. The categorical luminance, contrast, and structural similarity terms were calculated to obtain the CatSIM value (see Supporting Information). The CatSIM values for all three samples were close to 0.9, indicating that the DL prediction recovered the thrombus structure and accurately identified the RBC-rich and fibrin-rich areas.
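As an illustration of the two ingredients named above, the following minimal sketch computes the class-proportion vector of a label map and a diversity-based categorical variance. It does not compute CatSIM itself. The Gini-Simpson index is used here as one common diversity measure; whether it matches the variance term of ref. 38 exactly is an assumption, as are the function names and the label encoding (0 = RBC, 1 = fibrin).

```python
import numpy as np


def class_proportions(label_map, n_classes):
    """Return the fraction of pixels assigned to each class.

    label_map holds non-negative integer labels (assumed: 0 = RBC, 1 = fibrin).
    """
    labels = np.asarray(label_map).ravel()
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()


def gini_simpson(proportions):
    """Gini-Simpson diversity index, 1 - sum(p_k^2).

    Used here as a stand-in for the categorical variance (assumption).
    """
    p = np.asarray(proportions, dtype=float)
    return 1.0 - np.sum(p ** 2)


# Toy 2 x 2 label map: half RBC, half fibrin.
toy = [[0, 0], [1, 1]]
p = class_proportions(toy, n_classes=2)
print(p, gini_simpson(p))
```

A map containing a single class gives a variance of 0, and the value rises as the classes become more evenly mixed.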
3.4 Thrombus Composition Ratio and Cross-validation
The DL model predicted the quantitative composition of the patient’s thrombus sample slides at the whole-slide level (Figure 6a). The average error between the predicted composition and the ground truth was 1.1%, with a maximum error of 2.6%. Because the number of samples was limited, we employed a slide-level cross-validation scheme to evaluate the DL model without bias. Each whole-slide sample provided at least ~3 × 10⁴ patches, which is sufficient for DL training. The prediction results varied slightly depending on the training data (Figure 6b). CNN2, which was trained with slide 2, tended to predict a higher fibrin ratio than the ground truth (1.0% and 2.5%). Slide 1 was predicted least accurately. Although the error remained low, the cross-slide error is thought to result largely from differences between adjacent slides or from staining variability.
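The composition comparison and the slide-level split can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the label encoding (0 = RBC, 1 = fibrin, negative = background), the definition of error as the absolute difference in class fraction, and the reading that each CNN is trained on one slide and tested on the remaining slides are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def composition(label_map, n_classes=2):
    """Fraction of tissue pixels in each class.

    Negative labels are treated as background and ignored (assumption).
    """
    labels = np.asarray(label_map).ravel()
    labels = labels[labels >= 0]
    if labels.size == 0:
        raise ValueError("label map contains no tissue pixels")
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()


def composition_error(pred, truth, n_classes=2):
    """Per-class absolute difference in composition, in percentage points."""
    diff = composition(pred, n_classes) - composition(truth, n_classes)
    return 100.0 * np.abs(diff)


def slide_level_folds(slide_ids):
    """Yield (train_slide, test_slides) pairs.

    Each model is trained on one slide and evaluated on all others, so no
    patch from a test slide is ever seen during training.
    """
    for train in slide_ids:
        yield train, [s for s in slide_ids if s != train]


# Toy example with 2 x 2 label maps (0 = RBC, 1 = fibrin).
pred = [[0, 0], [1, 1]]
truth = [[0, 1], [1, 1]]
print(composition_error(pred, truth))

for train, test in slide_level_folds([1, 2, 3]):
    print(f"CNN{train}: train on slide {train}, test on slides {test}")
```

Splitting at the slide level rather than the patch level keeps neighboring patches of the same slide from appearing in both the training and test sets, which would otherwise inflate the apparent accuracy.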