Challenges and pitfalls for AI application in medicine

AI is not without pitfalls, and serious challenges must be overcome before it can deliver its full potential. The most critical challenges are described below, together with potential directions to surmount them. For a more in-depth discussion of AI's most pressing issues, the reader is referred to several excellent reviews.

Data

AI systems and models are only as good as the data they learn from. This relates to the data's (1) quality and quantity, (2) suitability, and (3) availability. The first challenge concerns data quality and quantity. Low-quality input data yield unreliable or biased outcomes, a phenomenon often summarized by the GIGO ('garbage in, garbage out') principle. Data quantity also remains challenging, since AI models, especially deep learning methods, are extremely 'data hungry'. The availability and quality of data labels are equally critical, as label inaccuracies directly impair model reliability, and manual labeling, in particular of images, is time-consuming. Combining and harmonizing multiple datasets is increasingly used to overcome these data limitations. The use of synthetic data may also help: additional data are generated by simulating from a known data distribution, which has been shown to improve model performance. Similarly, in image analysis, data augmentation is often used to artificially increase the sample size by applying transformations to existing (non-synthetic) data points. Another strategy to improve model reliability on relatively small datasets is transfer learning, which is especially popular in NLP and image analysis. This technique enables researchers to train a complex model on a relatively small dataset by fine-tuning the parameters of a model pretrained on a larger one.
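As an illustration of the data augmentation idea described above (this sketch is not from the original article), label-preserving transformations such as flips and rotations can multiply the effective sample size of an image dataset without collecting new data. The array shapes and function name below are illustrative assumptions:

```python
import numpy as np

def augment(images):
    """Return the original images plus simple label-preserving
    transformations (horizontal flip, vertical flip, 180-degree
    rotation), quadrupling the effective sample size."""
    augmented = [images]
    augmented.append(images[:, :, ::-1])     # horizontal flip
    augmented.append(images[:, ::-1, :])     # vertical flip
    augmented.append(images[:, ::-1, ::-1])  # 180-degree rotation
    return np.concatenate(augmented, axis=0)

# A toy batch of 8 grayscale "images" of 32x32 pixels.
batch = np.random.rand(8, 32, 32)
augmented = augment(batch)
print(augmented.shape)  # (32, 32, 32): four times the original sample size
```

In practice only transformations that preserve the label are valid; for example, a horizontal flip is harmless for many radiology images but would corrupt data where left-right orientation carries clinical meaning.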
Data suitability poses a second challenge. Akin to traditional analytical methods, AI approaches need adequate study designs, from data collection to the choice of analytical strategy, to yield reliable outcomes. Training algorithms on unsuitable data may lead to biased outcomes. For example, it is increasingly clear that AI and ML algorithms can ingrain racial bias when models are trained on racially imbalanced datasets.
Data availability poses a third challenge, as data are often siloed within individual institutions, and curated, publicly accessible clinical datasets remain rare. Reasons for this include patient privacy, a lack of data-sharing infrastructure, and competition among institutions. In immunology, efforts are being made to break open silos and democratize datasets. Examples include the National Institutes of Health (NIH)-curated resources on open-access COVID-19 data and the European Health Data Space for the safe exchange and reuse of health data. These developments are aided by novel data-sharing and integration approaches such as federated learning, in which a model is trained centrally while the data are kept locally. Recently, Swarm Learning was introduced, a decentralized machine-learning approach that requires no central coordination; its developers demonstrated that models trained this way outperform those trained at individual sites in disease classification while keeping the data fully confidential.
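The federated learning principle mentioned above can be sketched in a few lines (a minimal toy example, not the implementation used by any system cited in this article): each "hospital" updates the model on its own private data, and only the model weights, never the raw data, are shared and averaged. The linear model, learning rates, and site setup are illustrative assumptions:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=20):
    """One site's local training: a few gradient-descent steps on a
    linear model, using only that site's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, sites):
    """One round of federated averaging: each site trains locally;
    only the updated weights leave the site and are averaged."""
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    return np.mean(local_ws, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
# Three "hospitals", each with private data that never leaves the site.
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

w = np.zeros(2)
for _ in range(10):
    w = federated_round(w, sites)
print(w)  # close to the true weights [1.0, -2.0]
```

Swarm Learning extends this idea by removing even the central averaging server, coordinating the exchange of weights among sites in a fully decentralized way.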

Explainability

The lack of explainability of AI algorithms hampers clinical implementation. Unlike statistical methods such as regression, whose parameters are inherently interpretable, AI models learn more complex patterns, and their estimated parameters are not directly interpretable (Figure 3).
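The contrast with regression can be made concrete (a toy illustration with simulated data, not from the original article; the predictors and coefficients are invented for the example): a fitted regression coefficient has a direct clinical reading, whereas the millions of weights in a deep network do not.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy clinical dataset: two standardized predictors
# (e.g. a biomarker level and age) and a continuous outcome.
n = 200
X = rng.normal(size=(n, 2))
risk = 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.05, size=n)

# Least-squares regression: each coefficient reads directly as
# "a one-unit increase in this predictor changes the outcome by beta".
beta, *_ = np.linalg.lstsq(X, risk, rcond=None)
print(beta)  # roughly [0.8, 0.1]: each entry is directly interpretable
```

No comparable one-line reading exists for an individual weight inside a deep neural network, which is why post-hoc explanation methods are an active research area.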