
From Natural Language to Interpretable Code: Automated Code Generation for Healthcare with Large Language Models - A Comparative Analysis

  • Yuexi Chen
  • Gauri Vaidya
  • Alison N. O’Connor
  • Meghana Kshirsagar
  • University of Limerick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This article presents a comparative evaluation of three large language models (LLMs), namely GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet, examining their ability to automate key healthcare workflows while adhering to algorithmic constraints and supporting interpretability and fairness. The models were evaluated using Python, JavaScript, and Go under varying levels of prompt completeness across four healthcare tasks of increasing complexity: bed allocation, dynamic patient bed reallocation, ambulance dispatch, and patient triage. We introduce a multidimensional evaluation framework that captures model performance across task complexity, prompt completeness, and programming language, with an emphasis on generating functionally correct, transparent, and reliable code. This framework enables a systematic analysis of how effectively LLMs translate natural language specifications into executable logic under realistic, constraint-rich healthcare scenarios. Experimental results show that all three models generate constraint-compliant solutions for simpler tasks such as bed management. However, as task complexity increases and multiple constraints must be balanced, clear performance differences emerge. Claude 3.5 Sonnet consistently outperforms GPT-4o and Gemini 2.0 Flash by producing more robust, interpretable, and reliable code. These findings highlight Claude 3.5 Sonnet’s stronger potential for transparent and dependable automation of critical healthcare services using LLM-based code generation. The code is publicly available at: https://github.com/gauriivaidya/alter-automated-healthcare-tasks.

Original language: English
Title of host publication: Proceedings of the 18th International Conference on Agents and Artificial Intelligence
Editors: Ana Paula Rocha, Mattias Wahde, H. Jaap van den Herik
Publisher: Science and Technology Publications, Lda
Pages: 829-840
Number of pages: 12
ISBN (Print): 9789897587962
DOIs
Publication status: Published - 2026
Event: 18th International Conference on Agents and Artificial Intelligence, ICAART 2026 - Marbella, Spain
Duration: 5 Mar 2026 to 8 Mar 2026

Publication series

Name: International Conference on Agents and Artificial Intelligence
Volume: 1
ISSN (Print): 2184-3589
ISSN (Electronic): 2184-433X

Conference

Conference: 18th International Conference on Agents and Artificial Intelligence, ICAART 2026
Country/Territory: Spain
City: Marbella
Period: 5/03/26 to 8/03/26

Keywords

  • Code Generation
  • Healthcare Workflow Automation
  • Large Language Models
  • Operational Efficiency
  • Programming Languages
