Fine-Tuned LLM Assistant for Streamlined Medical Diagnostics


We fine-tuned a general-purpose LLM into an AI-based smart assistant that helps medical staff process diagnostic device reports in less time and with higher-quality results.


Overview

[01]

About the client

Our client is a medical company that performs diagnostics and delivers device-driven diagnostic reports to hospitals and outpatient partners.

[02]

The Challenge

In device-driven diagnostics, patient data is collected by a medical device, such as an imaging machine or a blood pressure monitor. These devices rely on simple algorithms that “translate” signals and sensor readings into raw or minimally processed but human-readable data: a list of figures, a graph, or a medical image that a clinician can then understand and interpret.

Some devices, like automated EKG readers or blood glucose monitors, can use a fixed set of rules to make a basic interpretation, for example: “IF glucose reading > 180 mg/dL, THEN display 'High glucose'.” However, algorithmic diagnostics cannot analyze data in the context of patients’ medical history and individual characteristics, account for edge cases, adapt to environmental variables, or learn from external input. 
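The fixed-rule approach described above can be sketched in a few lines of Python (the thresholds here are illustrative, not the client's actual device logic):

```python
def interpret_glucose(reading_mg_dl: float) -> str:
    """Fixed-rule interpretation of a single glucose reading,
    as a device with built-in logic might perform it.
    Thresholds are illustrative only."""
    if reading_mg_dl > 180:
        return "High glucose"
    if reading_mg_dl < 70:
        return "Low glucose"
    return "Normal glucose"
```

The rules see only the single reading: there is no patient history, no edge-case handling, and no way to learn from feedback, which is exactly the limitation described above.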

As a result, with the increase in patient flow, our client faced several challenges:

  • Medical devices generate complicated, technical reports that are hard to read quickly. This creates a risk that important information gets missed or takes too long to review, potentially delaying patient treatment.
  • Clinicians needed help finding clear, context-specific insights supported by diagnostic evidence, instead of just generic summaries.
  • When multiple devices flag abnormal readings, there's no clear way to know which issues need immediate attention versus which can wait. This lack of prioritization makes it difficult to focus on the most critical patients first.

Therefore, our client wanted a system that would account for these factors and be able to:

  • Summarize device readings into clinician-friendly narratives.
  • Address clinicians’ questions about device results with citations.
  • Triage anomalies for secondary review (leveraging the human-in-the-loop approach).
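The triage requirement can be illustrated with a minimal sketch; the severity scale and field names below are hypothetical, chosen only for illustration:

```python
def triage(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split model-flagged findings into those escalated for immediate
    human review and those queued for routine secondary review.
    The 'severity' field and its scale are hypothetical."""
    urgent = [f for f in findings if f["severity"] >= 2]
    routine = [f for f in findings if f["severity"] < 2]
    # Order urgent findings most-severe-first so human reviewers see
    # the most critical patients at the top of the queue.
    urgent.sort(key=lambda f: f["severity"], reverse=True)
    return urgent, routine
```

Every finding still reaches a human; the split only controls how quickly it does, which is the essence of the human-in-the-loop approach.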

The Solution

Tasks such as writing, summarization, question answering, and pattern recognition over diagnostic device reports can be solved with large language models: AI systems that process and understand unstructured text and generate natural-language responses.

However, most off-the-shelf LLMs are general-purpose models, so they are likely to struggle with specialized tasks such as:

  • Handling the domain-specific terminology found in diagnostic device outputs.
  • Maintaining consistency in clinical language and report structure.
  • Producing answers clear and trustworthy enough for clinical use.

A Fine-Tuned LLM

To overcome these limitations, a general-purpose LLM can be fine-tuned, adapting the model to our client’s needs. With fine-tuning, we can turn a general-purpose LLM into a domain-specific smart assistant for clinicians that can:

  • Learn directly from the client’s device reports and clinicians’ narratives to produce summaries that match the needed style and precision.
  • Adopt diagnostic thresholds and reporting conventions to reduce the risk of report misinterpretation.
  • Generate explainable and evidence-linked outputs, a crucial requirement for compliance in medical contexts.

In our case, we developed a fine-tuned large language model tailored to the client’s diagnostic data and clinical workflows. The process took several steps.

[01]

Data Collection

The crucial precondition for successful model fine-tuning is a large amount of relevant raw data: data that contains the knowledge, terminology, document structures, and styles we want the fine-tuned model to operate with. Our client provided us with a massive volume of device logs, diagnostic reports, medical records, approved medical references, and expert clinician documentation.

[02]

Data Preparation and Cleaning

Before we could use this data to fine-tune the model, it had to be preprocessed. To get high-quality, predictable responses from the model, it is important to “feed” it cleaned, structured, and labeled data. One way to do this is to build a corpus of examples of desired behavior: Q&A pairs that teach the model how it should respond.

In our case, each example consisted of a real-world input (such as a device log entry, a segment of a diagnostic report, or a clinical question) paired with the corresponding expert output (for instance, an annotated interpretation, a standardized diagnosis, or a reference explanation). This format let the model learn the terminology and document structures as well as the reasoning patterns and styles used by clinicians.

From the dataset collected from the client, we generated around 50,000 such examples, a volume generally considered sufficient to teach a general-purpose model a new, domain-specific task. Data preparation also included documentation standardization, error correction, duplicate removal, and exclusion of irrelevant or sensitive data.
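One such example, in the chat-style JSONL format commonly used for fine-tuning OpenAI-family models, looks roughly like this (the system prompt and field contents are illustrative, not the client's actual data):

```python
import json

# Illustrative system prompt, not the client's actual instruction.
SYSTEM_PROMPT = "You are a clinical diagnostics assistant."

def to_training_example(device_input: str, expert_output: str) -> dict:
    """Pair one real-world input with the corresponding expert output
    in the chat format most fine-tuning pipelines expect."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": device_input},
            {"role": "assistant", "content": expert_output},
        ]
    }

def write_jsonl(pairs: list[tuple[str, str]], path: str) -> None:
    """Serialize all (input, output) pairs as one JSON object per line,
    the file layout fine-tuning services typically ingest."""
    with open(path, "w", encoding="utf-8") as f:
        for device_input, expert_output in pairs:
            f.write(json.dumps(to_training_example(device_input, expert_output)) + "\n")
```

Repeating this over the cleaned dataset yields the tens of thousands of examples mentioned above.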

[03]

Fine-Tuning the Model

We used Azure’s AI Foundry service to run a private instance of the LLM, which we fine-tuned with the dataset we had cleaned and prepared. Thanks to the capabilities of the service, this process is completely automated and quite straightforward.

As a result, the fine-tuned model became familiar with our client’s operating domain and was able to:

  • Generate concise, clinically appropriate summaries using domain-specific medical vocabulary and terminology.
  • Cite source data directly in responses, delivering conversational but verifiable outputs for clinicians.
  • Flag potential anomalies (e.g., abnormal ranges, suspicious patterns) in the context of a patient’s medical history, provide reasoning and supporting evidence, and escalate these cases to human reviewers.
  • Avoid generic or off-topic responses, as well as a common LLM problem: hallucinations.
  • Provide output in the form that fits the request, e.g., a summary, a report following a specific template, or an alert.

[04]

Testing and Validation

Once the model was fine-tuned on our dataset, we tested it against the client’s real-life diagnostic scenarios and medical inquiries to check response accuracy, clarity, and compliance. Feedback loops allowed us to refine answers until they were consistent and business-ready.
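Part of such validation can be automated. A minimal sketch, assuming responses cite sources with bracketed IDs like `[rpt-104]` (a convention invented here for illustration):

```python
import re

def citations_valid(response: str, known_source_ids: set[str]) -> bool:
    """Automated check: the response must contain at least one citation,
    and every cited ID must match a real source document.
    The bracketed-ID citation convention is hypothetical."""
    cited = set(re.findall(r"\[([\w-]+)\]", response))
    return bool(cited) and cited <= known_source_ids
```

Checks like this catch uncited or miscited answers cheaply before they reach human reviewers.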

[05]

Deployment and Integration

Finally, we deployed the fine-tuned and tested model in the company’s internal diagnostics automation system. Azure AI’s enterprise-grade security helped us ensure that the sensitive information in the dataset remained private and compliant with data security regulations.

The Result

We helped our healthcare client equip their device-driven diagnostics with a smart, LLM-based assistant for clinicians, streamlining routine workflows. Within several months of deployment, the client achieved:

  • A 60% reduction in the time needed to interpret device readings.
  • A lower manual review load thanks to AI-backed triage, with human oversight preserved for safety.
  • Increased diagnosis accuracy as a result of contextual, evidence-based medical review.
  • Increased response relevance, with references to data sources always present.

Technologies

Language: Python

Cloud Platform: Microsoft Azure

AI/ML Services: Azure AI Foundry

Base Models: Azure OpenAI, GPT-4
