HYPOTHESIS AND AIMS: In patients with cancer being staged with whole-body diffusion-weighted MR (WB-DW-MR), machine learning (ML) methods that automatically detect metastatic lesions may improve diagnostic accuracy (DA) and reduce radiology reading time (RT). We aim to test ML methods on WB-DW-MR scans from patients with known cancer and established disease stage.
DESIGN, POPULATION, ASSESSMENTS: Phase 1: Development of ML pipeline ‘A’ for automatic anatomic labeling in WB-DW-MR scans of 50 healthy volunteers using segmentation techniques. Phase 2 training: 150 scans with established disease stage from the NIHR STREAMLINE (colorectal/lung cancer), CRUK MELT (lymphoma) and MASTER (lymphoma/prostate cancer) main studies will be used to train ML detection of metastases. Double radiology reads with ML ‘A’ support (+ML ‘A’) will use the main study reference standards to identify true lesions and ML errors, in order to refine the ML algorithm through ‘A+’, ‘B’ and finally ‘C’. Interim sensitivity will be tested in 40-50 scans. Phase 3 validation: 217 scans from the primary studies will be read by radiologists with ML ‘C’ support (+ML ‘C’) using sequential viewing of sequences, with an internal pilot in the first 50-70 scans. DA will be measured against the main study reference standards, and RT with and without ML (+/- ML) will be recorded.
INTERVENTIONS: WB scans will be read with +ML ‘C’ support by two expert readers and one new WB-MR reader, with additional sub-studies of inter-observer variation and RT. ML will automatically identify suspicious areas for review by the radiologist.
PRIMARY OUTCOME MEASURE: per-patient specificity of WB-DW-MR +ML ‘C’ compared to standard radiology read (WB-DW-MR from main study) against the reference standard established in the main study.
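As an illustration of the primary outcome, per-patient specificity can be computed as the proportion of reference-negative patients that a read also calls negative. The sketch below is a minimal example under that assumption; the function name, argument names and boolean encoding are hypothetical and not taken from the study protocol.

```python
from typing import Sequence


def per_patient_specificity(
    reference_positive: Sequence[bool],
    read_positive: Sequence[bool],
) -> float:
    """Specificity = TN / (TN + FP), counted over patients.

    reference_positive[i]: patient i is positive by the reference standard.
    read_positive[i]: patient i was called positive by the read under test.
    Both encodings are assumptions made for this sketch.
    """
    if len(reference_positive) != len(read_positive):
        raise ValueError("inputs must have one entry per patient")

    true_neg = 0
    false_pos = 0
    for ref, read in zip(reference_positive, read_positive):
        if ref:
            continue  # reference-positive patients do not affect specificity
        if read:
            false_pos += 1
        else:
            true_neg += 1

    if true_neg + false_pos == 0:
        raise ValueError("no reference-negative patients")
    return true_neg / (true_neg + false_pos)
```

The same function would be applied once to the standard read and once to the +ML ‘C’ read, each against the same reference standard.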
SECONDARY OUTCOME MEASURES: comparison of +ML ‘C’ results with the -ML results from the main studies: 1. sensitivity of WB-MR; 2. read time (RT); 3. inter-reader variability; 4. DA of inexperienced readers; 5. DA of different combinations of WB-MR sequences; 6. simple cost-effectiveness analysis, weighing the cost of applying ML support against the time and resources (number of possible additional staging tests) saved.
SAMPLE SIZE: Phase 2: 150 patients, plus 40-50 additional patients for interim sensitivity analysis. Phase 3: 217 patients from the STREAMLINE, MELT and MASTER studies.
STATISTICAL ANALYSIS: PRIMARY: The per-patient specificities of the two methods against the reference standard will be compared using McNemar’s test for paired proportions. SECONDARY: to include sensitivity per patient and per lesion; specificity per lesion; RT; inter-observer variance; diagnostic accuracy of different MR sequences.
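For a paired comparison of specificities, McNemar’s test depends only on the discordant pairs among reference-negative patients: those classified correctly by one read but not by the other. The following is a minimal sketch of the exact (binomial) form of the test. Whether the study uses the exact or the chi-squared form is not stated here, so the exact form is an assumption, and the function names and example data are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(
    standard_correct: Sequence[bool],
    ml_correct: Sequence[bool],
) -> Tuple[int, int]:
    """Count discordant pairs among reference-negative patients.

    b: standard read correct, ML-supported read wrong.
    c: ML-supported read correct, standard read wrong.
    """
    pairs = list(zip(standard_correct, ml_correct))
    b = sum(1 for s, m in pairs if s and not m)
    c = sum(1 for s, m in pairs if m and not s)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to fall either way,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Invented example: six reference-negative patients, True = correctly negative.
standard = [True, False, True, False, False, True]
with_ml = [True, True, True, True, False, True]
b, c = discordant_counts(standard, with_ml)
p_value = mcnemar_exact(b, c)
```

Concordant pairs (both reads right or both wrong) do not enter the statistic, which is why the test suits a design in which the same scans are assessed under both reading conditions.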
MAIN STAGES OF PROJECT & EXPECTED DURATION: Phase 1 (months 0-9): Healthy volunteer data for development of ML pipeline ‘A’. Phase 2 training (months 10-29): 150 scans read by two experts to identify true lesions and ML errors; optimisation of the ML algorithm to ‘A+’, ‘B’ and final ML ‘C’. Interim analysis of sensitivity in 40-50 scans. Phase 3 validation (months 28-42): Reading of 217 scans with +ML ‘C’ by two expert readers and one non-expert WB reader. Comparison with DA and RT from the main studies (-ML). Completion of the final report.