Researchers at UT Southwestern Medical Center have demonstrated that ChatGPT, an artificial intelligence (AI) chatbot designed to assist with language-based tasks, can efficiently extract data from physicians' clinical notes for research purposes. Their findings, published in npj Digital Medicine, indicate that this technology could significantly accelerate clinical research and foster advancements in computerized clinical decision-making aids.
“By transforming oceans of free-text health care data into structured knowledge, this work paves the way for leveraging artificial intelligence to derive insights, improve clinical decision-making, and ultimately enhance patient outcomes,” stated Dr. Yang Xie, the study's leader. Dr. Xie is a Professor in the Peter O’Donnell Jr. School of Public Health and the Lyda Hill Department of Bioinformatics at UT Southwestern. Additionally, she serves as Associate Dean of Data Sciences at UT Southwestern Medical School, Director of the Quantitative Biomedical Research Center, and is a member of the Harold C. Simmons Comprehensive Cancer Center.
Dr. Xie's research lab focuses on developing and utilizing data science and AI tools to enhance biomedical research and health care. The team investigated whether ChatGPT could expedite the process of analyzing clinical notes—memos physicians write to document patient visits, diagnoses, and statuses—to identify relevant data for clinical research and other applications. According to Dr. Xie, clinical notes contain vast amounts of valuable information. However, because they are composed in free text, extracting structured data traditionally necessitates trained medical professionals to read and annotate them, a time-consuming and resource-intensive process prone to human bias. Existing natural language processing (NLP) programs require extensive human annotation and model training, leading to the underuse of clinical notes for research purposes.
To test ChatGPT's capability in converting clinical notes to structured data, Dr. Xie and her team had the AI analyze more than 700 pathology notes from lung cancer patients, tasking it with identifying the major features of primary tumors, lymph node involvement, and cancer stage and subtype. ChatGPT achieved an average accuracy of 89%, as verified by human reviewers, and the analysis, which took only a few days to refine, outperformed traditional NLP methods in both accuracy and efficiency.
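To make the workflow concrete, here is a minimal sketch of what one extraction call might look like in Python. It is not the study's code: the OpenAI client usage, the model name, and the field list are illustrative assumptions.

```python
# Minimal sketch: extracting structured fields from one pathology note
# with a large language model. Assumptions (not from the study): the
# OpenAI Python client, the model name, and the example field names.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FIELDS = ["primary_tumor_features", "lymph_node_involvement", "stage", "subtype"]

def extract_fields(note_text: str) -> dict:
    """Ask the model for the requested fields and parse them as JSON."""
    prompt = (
        "From the pathology note below, extract the following fields and "
        f"reply with a JSON object using exactly these keys: {', '.join(FIELDS)}. "
        'Use "not reported" for any field the note does not mention.\n\n'
        f"Pathology note:\n{note_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output aids review
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    return json.loads(response.choices[0].message.content)
```

In practice, each extracted record would still be spot-checked by human reviewers, which is how the study's 89% accuracy figure was established.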
To evaluate whether the approach generalizes to other diseases, the team also used ChatGPT to extract cancer grade and margin status from 191 clinical notes for patients with osteosarcoma, the most common type of bone cancer in children and adolescents. The AI returned the grade with nearly 99% accuracy and the margin status with 100% accuracy.
Dr. Xie emphasized that ChatGPT's effectiveness depended heavily on the prompts it was given, a technique known as prompt engineering. Supplying multiple response options, giving examples of appropriate answers, and directing ChatGPT to ground its conclusions in evidence from the notes all enhanced its performance.
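As an illustration of those three tactics, the sketch below assembles a hypothetical extraction prompt for a single field. The wording, the field choice (margin status), and the example answer are assumptions for demonstration; the study's actual prompts are not reproduced here.

```python
# Sketch of the three prompting tactics described above, applied to one
# field (surgical margin status). All wording is illustrative.

# 1. Supply explicit response options to constrain the answer space.
OPTIONS = 'Answer with exactly one of: "negative", "positive", "not reported".'

# 2. Give an example of an appropriate answer (a few-shot demonstration).
EXAMPLE = (
    'Example note: "...resection margins are free of tumor..."\n'
    "Example answer: negative"
)

# 3. Direct the model to base its conclusion on evidence from the note.
EVIDENCE = (
    "Before answering, quote the sentence from the note that supports "
    "your answer, then give the answer on its own line."
)

def build_margin_prompt(note_text: str) -> str:
    """Assemble the full extraction prompt for one clinical note."""
    return "\n\n".join([
        "Question: What is the surgical margin status in this pathology note?",
        OPTIONS,
        EXAMPLE,
        EVIDENCE,
        f"Note:\n{note_text}",
    ])
```

Constraining the answer space this way has a practical side benefit: the model's output can be validated automatically against the allowed options before it enters a research dataset.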
She noted that leveraging ChatGPT or other large language models to extract structured data from clinical notes could expedite clinical research and facilitate clinical trial enrollment by matching patient information to trial protocols. However, Dr. Xie cautioned that ChatGPT is not a replacement for human physicians.
“Even though this technology is an extremely promising way to save time and effort, we should always use it with caution. Rigorous and continuous evaluation is very important,” Dr. Xie advised.