A significant barrier to the adoption of AI is the lack of assurance that AI systems are safe and reliable across diverse populations and heterogeneous environments. AI models are sensitive to changes in their input data; such changes, known as data distribution shift, arise from differences in patient populations or from changes in how data is collected over time and across regions, and can lead to errors such as misdiagnosis.
Regulators and healthcare providers face a dilemma. On one hand, there is a strong need for technological solutions to improve patient care, especially when health services are under pressure. On the other, they lack the resources and tools to conduct independent, comprehensive performance testing of AI models.
This pilot study aims to investigate the use of generative AI technology for synthesising highly realistic medical imaging data for performance stress testing and bias detection. We build upon advanced causal image synthesis to generate counterfactual images with specific characteristics (simulating changes in image acquisition and patient population). The pilot focuses on chest radiograph disease detection and mammographic density prediction. The goal is to inform MHRA guidance on the safety assessment of medical imaging AI.
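To make the stress-testing idea concrete, the sketch below shows one way counterfactual images could be used to probe a model for performance drops across subgroups. It is purely illustrative and not the pilot's actual pipeline: the record format, subgroup names, and the `stress_test` helper are all hypothetical, and real evaluations would operate on model predictions over synthesised images rather than toy tuples.

```python
# Illustrative sketch only: comparing model accuracy on original vs
# counterfactual images per subgroup. All names and data are made up.
from collections import defaultdict

def accuracy(preds, labels):
    """Fraction of correct predictions."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def stress_test(records):
    """Group records by subgroup and compare accuracy on original vs
    counterfactual images; a large drop flags a potential robustness
    or bias issue for that subgroup.

    Each record: (subgroup, true_label, pred_original, pred_counterfactual).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    report = {}
    for subgroup, recs in groups.items():
        labels = [r[1] for r in recs]
        acc_orig = accuracy([r[2] for r in recs], labels)
        acc_cf = accuracy([r[3] for r in recs], labels)
        report[subgroup] = {
            "orig": acc_orig,
            "counterfactual": acc_cf,
            "drop": acc_orig - acc_cf,
        }
    return report

# Toy illustration: site_B's accuracy drops under the counterfactual
# acquisition conditions, while site_A's is unchanged.
records = [
    ("site_A", 1, 1, 1), ("site_A", 0, 0, 0),
    ("site_B", 1, 1, 0), ("site_B", 0, 0, 0),
]
report = stress_test(records)
```

In practice, the per-subgroup "drop" would be compared against a pre-specified tolerance to decide whether the model's performance under simulated changes in acquisition or population is acceptable.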


