Healthcare · April 10, 2026 · 8 min read

HIPAA and LLMs: what your legal team will actually approve

The three architectural patterns that get a green light from healthcare compliance officers — and the one that gets an immediate "no."

HIPAA predates LLMs by two decades. The rules weren’t written with transformer models in mind, but they apply to them exactly the same way they apply to databases, cloud storage, and every other system that touches Protected Health Information (PHI). The question healthcare engineers ask us most often is: "Can we use LLMs at all?" The answer is yes — with architectural discipline.

This post covers the four patterns we see in the field, from "immediate no" to "already in production at multiple health systems."

Pattern 1 (avoid): Pipe PHI to a public API

Send patient records to OpenAI, Anthropic, or Google via their public APIs. Get back a summary. Display it to clinicians.

This fails HIPAA for a simple reason: without a signed Business Associate Agreement (BAA) from the provider, you cannot legally share PHI with them. Even with a BAA (all three major providers now offer one), your privacy officer still has to reason about:

  • Where the data is processed (jurisdictional risk)
  • Whether the vendor retains it for model training
  • What happens if the vendor has a breach
  • What happens if the vendor terminates the BAA

Most compliance officers will approve this only for specific, narrow use cases where the risk-benefit is obvious. It is the hardest path.

Pattern 2 (good): De-identify, then call the API

Run PHI through a de-identification pipeline (removing names, dates, MRNs, and the other HIPAA Safe Harbor identifiers) before anything touches the LLM. The model never sees PHI, so the BAA discussion is moot.

This pattern works well for:

  • Clinical trial recruitment screening
  • Research analytics
  • Population-health trend summarization

The catch: de-identification is hard. The Safe Harbor method is mechanical but brittle; the Expert Determination method is more robust but requires a qualified expert. Invest properly in the de-identification stage or you will leak PHI in ways you didn’t anticipate.
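To make the "mechanical but brittle" point concrete, here is a minimal Safe Harbor-style sketch. The regex patterns and the `[TAG]` replacement convention are our own illustrations, not a production pipeline: a real one must cover all 18 identifier categories and usually layers an NLP/NER pass on top, because regexes alone miss names and addresses buried in free text.

```python
import re

# Hypothetical patterns for four of the 18 Safe Harbor identifier categories.
PATTERNS = {
    "MRN":   re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Replace matched identifiers with bracketed category tags."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(deidentify("Pt seen 03/14/2026, MRN: 4481920, callback 555-867-5309."))
# -> Pt seen [DATE], [MRN], callback [PHONE].
```

The brittleness shows up immediately: "John Smith, the patient's brother" sails straight through a pattern list like this, which is exactly why the Expert Determination method exists.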

Pattern 3 (better): BAA-covered hosted model

Azure OpenAI and AWS Bedrock offer BAA-eligible hosting for models that are otherwise commercial APIs. The model runs in a tenant you control, under a BAA, with no training on your data.

This works well for use cases where:

  • You need GPT-4-class capability (not always available in open-weight models)
  • You can’t justify GPU hardware for on-prem deployment yet
  • Your privacy officer is comfortable with a major cloud as a business associate (most are)

The tradeoff: you’re still depending on a hyperscaler’s service. Outages affect you. Pricing changes affect you. But for most US-based provider organizations, this is the pragmatic path right now.
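In code, the pattern looks almost identical to calling the public API — the difference is entirely in the tenant, the BAA, and the endpoint. A sketch using the Azure OpenAI Python SDK; the deployment name, endpoint variables, and `api_version` string are placeholders you would replace with your own, and keeping the request builder a pure function makes it easy to audit:

```python
import os

def build_summary_request(note_text: str, deployment: str) -> dict:
    """Build the chat-completion payload as plain data for auditability."""
    return {
        "model": deployment,  # Azure takes your *deployment* name, not the model name
        "messages": [
            {"role": "system",
             "content": "Summarize this clinical note. Cite the source line for every claim."},
            {"role": "user", "content": note_text},
        ],
        "temperature": 0,  # deterministic output simplifies compliance review
    }

# Only reach out to the service if the tenant is actually configured.
if os.environ.get("AZURE_OPENAI_ENDPOINT"):
    from openai import AzureOpenAI
    client = AzureOpenAI(
        api_version="2024-06-01",  # pin and review; versions change
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
    resp = client.chat.completions.create(
        **build_summary_request("...note text...", "my-gpt4-deployment"))
    print(resp.choices[0].message.content)
```

Same SDK, same call shape — which is why Pattern 3 is often the fastest migration path off Pattern 1.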

Pattern 4 (best for sensitive workloads): Fully on-prem open-weight models

Deploy Llama, Mistral, or a biomedical-specific open-weight model on hardware inside your network. No BAA discussion needed because no external entity ever sees the PHI.

This pattern is ideal for:

  • Workloads involving the most sensitive PHI (behavioral health, HIV status, genetic data)
  • Organizations where "data leaves our datacenter" is a hard policy
  • Cases where you need custom fine-tuning on internal data
  • Research institutions with existing GPU infrastructure

Our offline LLM deployment guide covers the architecture in detail. The short version: Llama-3-70B on two H100s handles most production workloads for a mid-sized health system.
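As one illustrative deployment fragment — assuming vLLM as the serving layer, which the guide may or may not use — the two-H100 setup is a single command, with tensor parallelism sharding the 70B model across both GPUs:

```shell
# Illustrative sketch; model name, port, and serving stack are assumptions.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --tensor-parallel-size 2 \
  --host 127.0.0.1 --port 8000   # bind to localhost; put your own auth proxy in front
```

Because this exposes an OpenAI-compatible endpoint, application code written against the hosted pattern usually ports over with little more than a base-URL change.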

The four questions your compliance officer will ask

Regardless of pattern, expect these four questions:

  1. Where does the PHI travel? Have a data-flow diagram ready, not a whiteboard sketch.
  2. Who has access to the raw inputs and outputs? Access matrix, with least-privilege enforced and audited.
  3. What happens in a breach scenario? Incident response plan with AI-specific scenarios (prompt injection, unauthorized output, model malfunction).
  4. How do we prove the model didn’t make it up? Citation requirement on every output, with a log of retrieved source documents.

The fourth question is where most AI projects collapse. Clinicians — correctly — will not act on an LLM summary that can’t show its work. Grounded generation with citations isn’t a nice-to-have in healthcare; it’s the difference between deployable and not.
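One minimal way to satisfy question 4 is to log every retrieved source alongside every answer, and refuse answers that cite nothing. The document schema (`id`/`text` keys) and the `[doc-id]` citation convention below are our own assumptions, not a standard:

```python
import datetime
import hashlib

def audit_record(question: str, retrieved_docs: list, answer: str) -> dict:
    """Build one audit-log entry tying an answer to its retrieved sources."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "question": question,
        "sources": [
            {
                "doc_id": d["id"],
                # Store a hash, not raw text, so the audit log itself holds no PHI.
                "sha256": hashlib.sha256(d["text"].encode()).hexdigest(),
            }
            for d in retrieved_docs
        ],
        "answer": answer,
    }

def answer_is_grounded(answer: str, retrieved_docs: list) -> bool:
    """Reject any answer that cites none of the retrieved documents."""
    return any(f"[{d['id']}]" in answer for d in retrieved_docs)
```

An answer like "BP elevated [note-123]" passes the gate; an uncited "BP elevated" gets bounced back rather than shown to a clinician.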

Five specific HIPAA-LLM use cases that ship

  • Prior-authorization drafting. LLM drafts the justification language; a human physician reviews and signs. Cuts physician drafting time by 30-60%.
  • Clinical note summarization for chart review. Compact bullet summary at the top of a chart. Citation back to the originating note for every claim.
  • Ambient documentation assistance. Transcribe clinical conversations locally, produce a draft note. Physician edits and signs. Nothing leaves the device or facility network in the on-prem variant.
  • Patient-education generation. Personalized discharge instructions based on the patient’s diagnosis and literacy level, using only structured inputs (not raw notes).
  • Internal knowledge search. RAG over institutional policies, billing guides, and clinical protocols. No PHI needed for the corpus.
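The retrieval side of that last use case can start very simply. A keyword-overlap sketch — real deployments use embedding search, and the corpus entries here are invented:

```python
def retrieve(query: str, corpus: dict, k: int = 2) -> list:
    """Rank document ids by naive keyword overlap with the query."""
    q_tokens = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(q_tokens & set(corpus[doc_id].lower().split())),
        reverse=True,
    )
    return ranked[:k]

corpus = {
    "billing-001": "prior authorization billing codes for imaging",
    "policy-007": "institutional policy on phi handling in research",
}
print(retrieve("prior authorization codes", corpus, k=1))
# -> ['billing-001']
```

Since the corpus contains no PHI, this piece can be prototyped and iterated on without any of the compliance machinery the other use cases require.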

Two patterns we advise against for now

  • Direct-to-patient clinical LLMs. The regulatory surface is still shifting and the liability is enormous. Use AI to help clinicians, not replace them in direct patient interaction.
  • "Autonomous" agents acting on EHR data. Anything that writes back to the EHR without a human in the loop. The incident blast radius is too large today.

If you’re starting a healthcare LLM project

Start with Pattern 2 (de-identification + external API) for non-sensitive use cases, or Pattern 4 (fully on-prem) for sensitive ones. Skip Pattern 1 entirely unless you have a very specific reason. Treat Pattern 3 as a pragmatic middle ground when model capability matters more than complete data sovereignty.

We run healthcare-specific Strategy Sprints that cover architecture, HIPAA risk analysis, and a priced build plan in two weeks.