In July 2021, the UK standards organisation BSI announced that it was developing guidelines for the application of ISO 14971 to artificial intelligence (AI) and machine learning (ML). This was followed by the recent publication in April 2022 of the consensus report AAMI CR34971:2022, Guidance on the Application of ISO 14971 to Artificial Intelligence and Machine Learning.
Designed to be used in conjunction with ISO 14971:2019, Medical devices — Application of risk management to medical devices, CR34971:2022 shares its structure with ISO/TR 24971:2020, Medical devices — Guidance on the application of ISO 14971. Standards are written to support manufacturers in designing products, and some become harmonised standards with a defined role within the regulatory framework; Technical Reports (TRs) and consensus reports (CRs), by contrast, describe best practice, and a non-conformance would never be raised against them. "Consensus" usually indicates that there are divergent opinions or approaches, and that this is the core the majority can agree on; in a new and evolving field, however, it is a valuable benchmark of good practice.
Risk management is the cornerstone of the medical device product development lifecycle, and this consensus report (CR) aims to provide a framework for identifying and addressing the unique AI/ML-related hazards, hazardous situations, and potential harms that can arise across all stages of the product lifecycle.
Now, with the formalities aside, what does this new consensus report offer to those using AI and ML in the development of medical devices? Perhaps most useful is Annex B and its subsections, which contain risk management examples (from hazards to risk control measures) on the identification of characteristics related to safety, covering the following areas in more detail:
1. Data management
2. Bias
3. Data storage/security/privacy
4. Overtrust
5. Adaptive systems
While some of these areas are specific to AI/ML, others are familiar bugbears for medical device developers; however, each brings novel complications when working with AI/ML. Let’s look at these in more detail…
Data management is a broad field, but CR34971 calls out the need to consider specific issues such as data completeness, consistency, and correctness. What are the implications of data quality and model complexity for performance hazards, applicability, and generalisation? This will depend on the specifics of your device, but here you are provided with examples of each and prompted to consider these issues as part of your risk management activities. For example, using incorrect, incomplete, subjective, inconsistent, and/or atypical data can lead to deterioration of AI/ML model performance; the hazards associated with these data quality issues, and with assumptions about data properties, must therefore be included in the risk management process, along with the control measures used to mitigate their impact on performance and safety. This section also raises the “bias/variance trade-off”, a fundamental issue when developing AI/ML models, and the need to consider controlling complexity in model development.
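To illustrate the kind of data quality checks this implies, the minimal sketch below flags missing values (completeness), inconsistent coding (consistency), and physiologically implausible entries (correctness) in a small, invented data set; the column names, sentinel value, and plausibility range are all hypothetical and not taken from the CR.

```python
import pandas as pd

# Hypothetical training data for an AI/ML medical device model
df = pd.DataFrame({
    "patient_age": [54, 61, None, 47, 38],
    "systolic_bp": [128, 135, 141, -999, 122],   # -999 used as a "not recorded" sentinel
    "diagnosis":   ["I10", "I10", "i10", "I15", "I10"],
})

# Completeness: fraction of non-missing values per feature
completeness = 1 - df.isna().mean()
print("Completeness per column:\n", completeness)

# Consistency: the same diagnosis code should always be recorded the same way
inconsistent_coding = df["diagnosis"].str.upper().nunique() != df["diagnosis"].nunique()
print("Inconsistent diagnosis coding detected:", inconsistent_coding)

# Correctness: flag physiologically implausible values (assumed plausible range)
implausible_bp = ~df["systolic_bp"].between(60, 260)
print("Rows with implausible systolic BP:\n", df[implausible_bp])
```

Checks like these would feed directly into the risk file as verification of the data-related assumptions the model depends on.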
Consideration of bias, one of the most fundamental pillars of statistical rigour necessary for medical and scientific research and the medical device product development lifecycle, receives an appropriate spotlight in CR34971. While noting that bias can have both positive and negative performance effects, a handful of types of bias that can specifically impact AI/ML models are discussed in detail: selection bias, implicit bias, group attribution bias, and experimenter’s bias. This section highlights how missing data, sample bias (data not collected randomly), and coverage bias (data that does not match the target population) can result in selection bias and potential risks to product safety and efficacy. As mitigation, it is recommended that verification be performed at the end of data collection to ensure the data set is appropriately distributed. Of course, device manufacturers must consider, and evaluate, how bias can introduce hazards and hazardous situations beyond the development phases. For example, the design of the human user interface of a decision-making device that determines risk levels should be evaluated to ensure that the means of reporting calculated risk does not introduce bias and unduly influence the user.
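A minimal sketch of the kind of end-of-collection verification suggested here: comparing the subgroup make-up of a collected data set against assumed target-population proportions with a chi-square goodness-of-fit test. The age bands, proportions, and counts below are invented purely for illustration.

```python
from scipy.stats import chisquare

# Assumed reference proportions for the intended-use population (hypothetical)
target_proportions = {"18-40": 0.30, "41-65": 0.45, "65+": 0.25}

# Subgroup counts observed in the collected training data (hypothetical)
observed_counts = {"18-40": 210, "41-65": 540, "65+": 150}

total = sum(observed_counts.values())
expected = [target_proportions[band] * total for band in target_proportions]
observed = [observed_counts[band] for band in target_proportions]

# Goodness-of-fit test: does the collected sample match the target population?
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Collected data deviates from the target population: possible coverage or sample bias")
```

A failed check would not itself be a risk control, but it would trigger one, such as collecting additional data or constraining the device's intended use.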
Data storage/security/privacy is already at the forefront of attention for all organisations, due to the business and regulatory risks associated with ignoring or neglecting these concerns. So what is special about AI/ML medical devices? When it comes to cyber security, the CR calls out the example of an adversarial attack against a medical image classifier, where subtle image changes can result in completely different classifications with high confidence. Whilst the misclassification of a cat as guacamole is highly amusing, the potential harms from a misclassified medical image are far greater; yet the tools to achieve either are the same, and freely available online.
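To make the adversarial-attack example concrete, the sketch below shows the widely published Fast Gradient Sign Method (FGSM), one type of freely available technique of the kind alluded to; the model, image, and label are placeholders for a trained classifier and its input, and this code is illustrative rather than drawn from the CR itself.

```python
import torch

def fgsm_perturb(model, image, label, epsilon=0.02):
    """Fast Gradient Sign Method: a simple, well-known adversarial perturbation.

    `model` is assumed to return class logits, `image` is a batched input tensor,
    and `label` holds the true class indices; all are hypothetical placeholders.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # A small step in the direction that most increases the loss can be enough
    # to flip the predicted class while remaining visually imperceptible.
    return (image + epsilon * image.grad.sign()).detach()
```

The simplicity of such an attack is the point: robustness against adversarial inputs, and the hazards they could create, belongs in the risk management file alongside more conventional security threats.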