Researchers at Harvard Medical School and the Mass General Brigham AI Governance Committee have developed comprehensive guidelines for integrating AI into healthcare effectively and responsibly. A cross-functional team of 18 experts was formed, spanning informatics, research, legal, data analytics, equity, privacy, safety, patient experience, and quality, and critical themes were identified through an extensive search of the peer-reviewed and gray literature.

A dedicated AI Monitoring Committee was also established, with full authority to halt the deployment of an AI model if issues arise. This ensures accountability throughout the integration process, safeguarding the interests of patients and healthcare institutions.
The researchers focused on nine key principles: fairness, robustness, equity, safety, privacy, explainability, transparency, benefit, and accountability. Three focus groups were established to refine these guidelines: one focusing on robustness and safety, another on fairness and privacy, and the third on transparency, accountability, and benefit. Each group consisted of 4-7 expert members.
A structured framework was developed and executed to guide the application of the AI guidelines within a healthcare setting. Generative AI, applied to ambient documentation systems, was selected as a representative case study because such technologies pose distinctive monitoring challenges, including protecting patient privacy and mitigating AI hallucinations.
A pilot study was conducted with selected individuals from different departments. Privacy and security were given top priority: only strictly de-identified data was shared with the vendor to enable continuous updates and improvements, under agreed data retention policies that restricted its use solely to enhancing model performance.
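To make the data-sharing constraint concrete, here is a minimal Python sketch of the kind of de-identification step such a pipeline might apply before any record reaches a vendor. The field names, phone-number pattern, and salted-hash pseudonymization are illustrative assumptions, not the study's actual protocol.

```python
import hashlib
import re

# Hypothetical direct identifiers; a real pipeline would follow HIPAA Safe Harbor.
DIRECT_IDENTIFIERS = {"patient_name", "mrn", "date_of_birth", "address", "phone"}
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def deidentify(record: dict, salt: str) -> dict:
    """Drop direct identifiers and replace the record key with a salted hash."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # A pseudonymous ID lets the vendor link revisions without seeing the MRN.
    clean["pseudo_id"] = hashlib.sha256((salt + record["mrn"]).encode()).hexdigest()[:16]
    # Scrub phone-number-like strings from free text before sharing.
    clean["note_text"] = PHONE_PATTERN.sub("[REDACTED]", record.get("note_text", ""))
    return clean

shared = deidentify(
    {"mrn": "12345", "patient_name": "Jane Doe", "note_text": "Call 617-555-0100."},
    salt="rotate-me-quarterly",
)
print(shared)  # no name, no MRN, phone number redacted
```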
Subsequently, a shadow deployment phase was implemented in which the AI system operated in parallel with existing workflows without disrupting patient care. After shadow deployment, key performance metrics such as fairness across demographics, usability, and workflow integration were rigorously evaluated.
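The shadow pattern itself is simple: the AI receives the same inputs as the live workflow, but its output is only logged for later evaluation and is never substituted for the clinician's work. A minimal sketch, with hypothetical function and field names (the paper does not publish its implementation):

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

@dataclass
class Encounter:
    encounter_id: str
    transcript: str

def draft_note_with_ai(encounter: Encounter) -> str:
    # Stand-in for a call to the ambient-documentation model (assumption).
    return f"AI draft note for {encounter.encounter_id}"

def handle_encounter(encounter: Encounter, clinician_note: str) -> str:
    """Return the clinician's note unchanged; log the AI draft for offline review."""
    try:
        log.info("shadow output for %s: %s", encounter.encounter_id,
                 draft_note_with_ai(encounter))
    except Exception:
        # A failure in the shadow path must never disrupt patient care.
        log.exception("shadow model failed for %s", encounter.encounter_id)
    return clinician_note

note = handle_encounter(Encounter("enc-001", "patient reports a cough"),
                        clinician_note="Clinician-authored note.")
```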
Collaboration with the vendor played a vital role. Detailed negotiations covered data retention policies, the cadence of continuous model updates, and the safeguarding of patient privacy through strict de-identification protocols, and this joint effort was crucial to integrating AI successfully into the healthcare setting.
The researchers identified several components critical to the responsible implementation of AI in healthcare. Mandating diverse, demographically representative training datasets helps reduce bias. Outcomes should be evaluated through an equity lens, with regular equity evaluations that include model re-engineering whenever benefits are not distributed fairly across patient populations.
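As one illustration of evaluating outcomes through an equity lens, the sketch below compares a simple accuracy metric across demographic groups and reports the largest gap. The records, metric, and the idea of a gap threshold are assumptions for demonstration, not the committee's actual evaluation code.

```python
from collections import defaultdict

# Hypothetical evaluation records: one row per case, with a demographic group label.
results = [
    {"group": "A", "correct": True},
    {"group": "A", "correct": False},
    {"group": "B", "correct": True},
    {"group": "B", "correct": True},
]

def accuracy_by_group(records):
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += r["correct"]  # bool counts as 0/1
    return {g: hits[g] / totals[g] for g in totals}

by_group = accuracy_by_group(results)
gap = max(by_group.values()) - min(by_group.values())
print(by_group, f"max gap = {gap:.2f}")
# A gap above a pre-agreed threshold would trigger model re-engineering.
```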
Transparent communication of the AI system's Food and Drug Administration (FDA) status is equally important. Specifying whether FDA approval is required and detailing the current status of the AI system helps ensure compliance and build trust. A risk-based approach should be adopted to monitor AI systems, with more robust monitoring for applications that pose higher risks to care outcomes.
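A risk-based monitoring policy can be written down as a small configuration keyed to risk tier, with stricter auditing for higher-risk applications. The tiers, audit-sampling rates, and review intervals below are illustrative assumptions, not values from the guidelines.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"        # e.g., scheduling assistance
    MEDIUM = "medium"  # e.g., draft documentation reviewed by a clinician
    HIGH = "high"      # e.g., outputs that could directly influence care decisions

# Monitoring intensity scales with risk (all numbers are assumptions).
MONITORING_POLICY = {
    Risk.LOW: {"audit_sample": 0.01, "review_interval_days": 90},
    Risk.MEDIUM: {"audit_sample": 0.05, "review_interval_days": 30},
    Risk.HIGH: {"audit_sample": 0.25, "review_interval_days": 7},
}

def policy_for(app_risk: Risk) -> dict:
    return MONITORING_POLICY[app_risk]

print(policy_for(Risk.HIGH))  # {'audit_sample': 0.25, 'review_interval_days': 7}
```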
The pilot phase allowed comprehensive functionality assessment and feedback collection, which was crucial for identifying issues early in the implementation process. During shadow deployment, most users came from the emergency medicine and internal medicine departments.
Feedback revealed both the system's strengths and its areas for improvement. Most criticisms concerned the documentation of physical examinations, while the system drew praise for its accuracy when working with interpreters or with patients who had strong accents.
In conclusion, this study presented a methodology for incorporating AI into healthcare. The multidisciplinary approach provides a blueprint for non-profit organizations, healthcare institutions, and government bodies aiming to implement and monitor AI responsibly. Challenges such as balancing ethical considerations with clinical utility were highlighted, underscoring the importance of ongoing collaboration with vendors to refine AI systems.
Future work will focus on expanding testing to include broader demographic and clinical case diversity while automating performance monitoring. These efforts aim to ensure that AI systems remain adaptable and equitable across various healthcare environments. The study demonstrates the importance of continuous evaluation, monitoring, and adaptation of AI systems to ensure their efficacy and relevance in challenging clinical settings.
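One plausible shape for such automated monitoring is a rolling drift check against a baseline established at launch; the metric, baseline value, and alert threshold below are purely illustrative assumptions.

```python
import statistics

BASELINE_SCORE = 0.90   # assumed quality score measured at launch
ALERT_FRACTION = 0.95   # alert if the rolling mean falls below 95% of baseline

def quality_is_drifting(recent_scores: list[float]) -> bool:
    """Flag drift when the rolling mean quality score sags below the threshold."""
    return statistics.fmean(recent_scores) < BASELINE_SCORE * ALERT_FRACTION

if quality_is_drifting([0.88, 0.84, 0.83, 0.85]):
    print("Alert: documentation quality drifting; escalate to human review.")
```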
Journal reference: Saenz, A. D., Centi, A., Ting, D., You, J. G., Landman, A., & Mishuris, R. G. (2024). Establishing responsible use of AI guidelines: A comprehensive case study for healthcare institutions. npj Digital Medicine, 7(1), 1-6. DOI: 10.1038/s41746-024-01300-8, https://www.nature.com/articles/s41746-024-01300-8