by Find-A-Code™
May 2nd, 2024
Over the last 12 months or so, there has been a lot of talk about generative AI and its potential to replace human medical coders. Here at Find-A-Code, we have done our best to ease any fears among our readers. Now we can point to scientific research showing that generative AI just isn't up to the task.
Will it ever be? Time will tell. But for now, generative AI struggles with numerical tasks, and research just published by Mount Sinai's Icahn School of Medicine demonstrates as much. More importantly, the research shows that generative AI struggles with medical coding even when the largest language models are used.
How the Study Was Conducted
Researchers drew on more than 27,000 unique procedure and diagnosis codes taken from 12 months of routine medical services in the Mount Sinai Health System. Only the codes themselves were examined. Patient data was excluded.
To measure how accurately AI could code medical events, the researchers used large language models from Google, Meta, and OpenAI (the company behind ChatGPT). Specifically, they tested four large language models:
- GPT-3.5 (OpenAI)
- GPT-4 (OpenAI)
- Gemini Pro (Google)
- Llama2-70b (Meta)
When all was said and done, all four models struggled to interpret clinical language accurately and translate it into the most appropriate code. The researchers noted that generative AI's difficulty with numerical tasks was already known, but it had never been studied across different large language models. Now that it has been, researchers can say definitively that all of the models struggled.
How bad were the results? All four models scored below 50% accuracy. GPT-4 was the most accurate, at 45.9% for ICD-9 codes, 33.9% for ICD-10 codes, and 49.8% for CPT codes. The researchers concluded that "unacceptably large" volumes of errors were a persistent problem.
Not an Exact Science
Given the potential generative AI has shown in everything from writing informative text to creating stunning images, it may seem strange that an AI system cannot accurately parse clinical text and convert it to an alphanumeric code. Perhaps some of the difficulty is rooted in the fact that medical coding isn't an exact science.
ICD-10 rheumatology codes illustrate the point well. There are currently more than 400 such codes, which is impressive in and of itself. ICD-9, by contrast, had just 14. A desire for greater specificity drove that expansion, which multiplied the number of rheumatology codes more than 28-fold. But the additional specificity also created quite a bit of overlap.
Whereas a human coder can account for that overlap when converting clinical text to a medical code, generative AI apparently cannot. Even the largest language models appear to struggle with the level of specificity found in modern code sets.
Scientists Will Keep Trying
It is clear that generative AI is not yet up to the task of accurately coding medical events. Will it ever be? No one knows. We can say for sure that scientists will keep trying to get it right for the foreseeable future. In the meantime, medical coders don't have to live in fear of losing their jobs.
As an accomplished medical coder, you know how to properly use the most common codes. You know the ins and outs of diagnosis code lookup, and you are familiar with going online to find less commonly used ICD-10 and CPT codes.
Just keep doing your job to the best of your ability. Continue to learn, grow, and adapt to changes as they come along. You will be a better coder for it.