Purpose & relevance

| Recommendation | Rationale | Related guidelines |
| --- | --- | --- |
| P1. Disclose which clinical problem the model addresses and how it fits into the clinical workflow | | MI-CLAIM, CONSORT-AI, FUTURE-AI |
| P2. Collect modeling data in a consistent, clinically relevant, and generalizable manner that aligns with the intended use | | DECIDE-AI, MI-CLAIM, FDA |
| P3. Benchmark performance against existing clinical standards of care, previous AI studies, or proofs of concept | Choosing a proper benchmark is essential to demonstrate clinical relevance and show the potential for patient care (a benchmarking sketch follows this table). | FUTURE-AI, TRIPOD-AI, MI-CLAIM |
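To make P3 concrete, the minimal sketch below compares a new model's discrimination against an existing clinical risk score on the same held-out patients, with a bootstrap interval on the difference. The file and column names (`test_set.csv`, `outcome`, `model_prob`, `clinical_score`) are hypothetical placeholders, not names taken from any of the cited guidelines.

```python
# Benchmarking sketch (P3): compare a new model against an existing
# clinical risk score on the same held-out test patients.
# File and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

test = pd.read_csv("test_set.csv")  # one row per patient (hypothetical)

auc_model = roc_auc_score(test["outcome"], test["model_prob"])
auc_clinical = roc_auc_score(test["outcome"], test["clinical_score"])

# Bootstrap the AUC difference to judge whether the improvement is robust.
rng = np.random.default_rng(0)
diffs = []
for _ in range(2000):
    idx = rng.integers(0, len(test), len(test))
    sample = test.iloc[idx]
    if sample["outcome"].nunique() < 2:
        continue  # resample contained only one class; AUC undefined
    diffs.append(
        roc_auc_score(sample["outcome"], sample["model_prob"])
        - roc_auc_score(sample["outcome"], sample["clinical_score"])
    )
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"AUC model={auc_model:.3f}, clinical={auc_clinical:.3f}, "
      f"95% CI of difference=({lo:.3f}, {hi:.3f})")
```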
Model development

| Recommendation | Rationale | Related guidelines |
| --- | --- | --- |
| M1. Design a conceptual model with a definition of the predicted outcome and its presumed relationship to the input variables | A conceptual model encourages the inclusion of domain expertise, focuses and prioritizes data collection, and aligns the work with existing hypotheses and knowledge. | |
| M2. Safeguard appropriate separation between training, validation, and test datasets | Ensure that model optimization is performed on the training set and tuning of the model configuration on the validation set, without affecting the test set (a data-splitting sketch follows this table). | TRIPOD-AI, MI-CLAIM, MINIMAR, FDA |
| M3. Ensure proper documentation and execution of model optimization steps | The number of configuration steps and decisions is generally extensive and requires thorough tracking and documentation. In addition, it is vital that these steps are not performed on the test set. | MINIMAR, FUTURE-AI |
| M4. Determine the evaluation procedure, metrics, and rationale up front, before starting the modeling procedure | Defining metrics post-analysis is a common pitfall that can lead to overestimation of performance. The evaluation procedure should be concordant with what is clinically relevant. | MI-CLAIM, FUTURE-AI |
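As one way to satisfy M2 (and the test-set hygiene stressed in M3), the minimal sketch below keeps training, validation, and test data strictly separated: hyperparameters are tuned on the validation set only, and the test set is scored exactly once. The synthetic dataset and logistic-regression model are stand-ins, not a prescribed method.

```python
# Data-separation sketch (M2): optimize on the training set, tune on the
# validation set, and touch the test set exactly once.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; in practice this is the curated clinical dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# 60/20/20 split, stratified to keep outcome prevalence comparable.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

# Tune a hyperparameter on the validation set only. Keeping the scaler
# inside the pipeline ensures it is fit on training data alone (no leakage).
best_auc, best_C = -1.0, None
for C in (0.01, 0.1, 1.0, 10.0):
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C))
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    if auc > best_auc:
        best_auc, best_C = auc, C

# Refit the chosen configuration on train+validation, then evaluate on the
# test set exactly once, without further adjustment.
final = make_pipeline(StandardScaler(), LogisticRegression(C=best_C))
final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print("test AUC:", roc_auc_score(y_test, final.predict_proba(X_test)[:, 1]))
```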
Replicability

| Recommendation | Rationale | Related guidelines |
| --- | --- | --- |
| R1. Evaluate model performance in a prospective study, a randomized trial, or at least an independent replication cohort | Ensure that the testing conditions are clinically relevant and representative of the intended usage context. | RISE, MINIMAR, FDA |
| R2. Perform sensitivity and robustness checks to assess whether the system is impartial to changing environments or populations | Robustness can be further strengthened by training the model on a heterogeneous population (a subgroup-robustness sketch follows this table). | TRIPOD-AI, MI-CLAIM, FUTURE-AI |
| R3. Disclose data preprocessing and the way in which data quality is assessed and ensured | | DECIDE-AI, TRIPOD-AI, MI-CLAIM, MINIMAR, CONSORT-AI, FUTURE-AI |
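For R2, a simple first robustness check is to report performance per subgroup or site rather than a single pooled number, as in the sketch below; the file and column names (`test_set.csv`, `outcome`, `model_prob`, `site`) are hypothetical.

```python
# Robustness sketch (R2): report performance per subgroup
# (e.g., per site, sex, or age band) rather than one pooled number.
import pandas as pd
from sklearn.metrics import roc_auc_score

test = pd.read_csv("test_set.csv")  # hypothetical: outcome, model_prob, site

for site, group in test.groupby("site"):
    if group["outcome"].nunique() < 2:
        print(f"{site}: only one outcome class, AUC undefined")
        continue
    auc = roc_auc_score(group["outcome"], group["model_prob"])
    print(f"{site}: n={len(group)}, AUC={auc:.3f}")
```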
Explainability

| Recommendation | Rationale | Related guidelines |
| --- | --- | --- |
| E1. Determine and provide appropriate levels of interpretability, depending on the use case and users | This often guides which algorithms and interpretability tools need to be employed. | RISE, TRIPOD-AI, MI-CLAIM, FUTURE-AI |
| E2. Leverage interpretability toolkits and libraries for black-box models | The complexity and black-box nature of AI models warrant more focus on interpretability (an interpretability sketch follows this table). | RISE, FUTURE-AI |
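For E2, the sketch below uses SHAP, one widely used open-source interpretability library, to attribute a black-box model's predictions to its input features. The gradient-boosting model and synthetic data are stand-ins; other toolkits may suit other model classes better.

```python
# Interpretability sketch (E2): explain a tree-based black-box model
# with the SHAP library. Model and data are stand-ins.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)   # selects a suitable explainer
shap_values = explainer(X)             # per-patient feature attributions
shap.plots.beeswarm(shap_values)       # global summary of feature effects
```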
System design & usage

| Recommendation | Rationale | Related guidelines |
| --- | --- | --- |
| S1. Focus on multidisciplinary collaboration during the full AI solution lifecycle | Involve a broad range of functional expertise, including AI leads, users, clinicians, and study-design experts, in all phases of development and implementation. | RISE, FDA, FUTURE-AI |
| S2. Invest in the instruction of users on how to interact with the system and its predictions | Establishing trust in the solution and explaining how it integrates with the clinical workflow is key for adoption and impact on clinical outcomes. | DECIDE-AI, MINIMAR, CONSORT-AI, FUTURE-AI, FDA |
| S3. Set up monitoring processes to track technical and analytical performance (a monitoring sketch follows this table) | | FDA, FUTURE-AI |
| S4. Set up a feedback flow to facilitate iterative system improvement | Performance results, user interactions, and feedback can be used to improve the system in a targeted way, either periodically or in an automated, more self-learning setup, although complexity and existing legislation limit the use of the latter. | DECIDE-AI |
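For S3, monitoring can start as simply as recomputing discrimination and calibration over rolling time windows and flagging degradation, as sketched below; the log file, column names, baseline value, and alert threshold are all hypothetical.

```python
# Monitoring sketch (S3): track discrimination and calibration per month
# and flag degradation. File, columns, baseline, and threshold are
# hypothetical placeholders.
import pandas as pd
from sklearn.metrics import brier_score_loss, roc_auc_score

log = pd.read_csv("prediction_log.csv", parse_dates=["timestamp"])
baseline_auc = 0.85  # hypothetical performance at deployment

for month, group in log.groupby(log["timestamp"].dt.to_period("M")):
    if group["outcome"].nunique() < 2:
        continue  # AUC undefined for a single-class month
    auc = roc_auc_score(group["outcome"], group["model_prob"])
    brier = brier_score_loss(group["outcome"], group["model_prob"])
    flag = "  <-- investigate" if auc < baseline_auc - 0.05 else ""
    print(f"{month}: n={len(group)}, AUC={auc:.3f}, Brier={brier:.3f}{flag}")
```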
Risks & ethics

| Recommendation | Rationale | Related guidelines |
| --- | --- | --- |
| R1. Define and evaluate the ethical considerations of the system, e.g., algorithmic fairness | Various frameworks are available for the responsible application of AI and ML, which provide guidance on the relevant components and methods to assess them (a fairness-check sketch follows this table). | DECIDE-AI, CONSORT-AI, FUTURE-AI |
| R2. Assess the potential risks involved in the system and outline an approach to manage and mitigate them | | DECIDE-AI |
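For the fairness aspect of R1, one common check compares error rates across a sensitive attribute (an equalized-odds style comparison), as sketched below; the column names and the 0.5 decision threshold are hypothetical, and dedicated fairness toolkits offer more complete metric suites.

```python
# Fairness sketch (ethics item R1): compare true- and false-positive rates
# across a sensitive attribute. Columns and threshold are hypothetical.
import pandas as pd

test = pd.read_csv("test_set.csv")  # outcome, model_prob, sex (hypothetical)
test["pred"] = (test["model_prob"] >= 0.5).astype(int)

for sex, g in test.groupby("sex"):
    pos, neg = g[g["outcome"] == 1], g[g["outcome"] == 0]
    tpr = (pos["pred"] == 1).mean() if len(pos) else float("nan")
    fpr = (neg["pred"] == 1).mean() if len(neg) else float("nan")
    print(f"{sex}: n={len(g)}, TPR={tpr:.3f}, FPR={fpr:.3f}")
```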