Deploying AI in air-gapped environments
Running modern AI inside a classified or fully disconnected network is a solved problem. Most of the difficulty is logistics, not research. Here’s what the end-to-end deployment actually looks like.
"Air-gapped" means different things to different organizations. For our purposes: a network with no internet egress, where everything installed has to be carried across the boundary as a vetted artifact, and where every update is a controlled event.
Defense, intelligence, critical infrastructure, and some regulated research environments operate this way. AI can work there — it just requires a specific build-and-deploy discipline most teams aren’t used to.
The six things you can’t rely on
- Model downloads. No `huggingface-cli login`. Weights must be staged on removable media and hash-verified.
- Package managers. `pip install` and `apt update` won’t reach anything. You need a private mirror or offline bundles.
- Container registries. No Docker Hub. Every image has to be re-hosted internally.
- Telemetry. No Sentry, Datadog, or Segment calling home. All observability must be local.
- Auto-updates. Nothing updates itself. Updates are scheduled events.
- External API fallbacks. No "if local model fails, call OpenAI." There is no fallback across the boundary.
The deployment pipeline
A clean air-gapped build looks like this:
- Build on a connected network. Pull model weights, package dependencies, container images, Python wheels. Generate a Bill of Materials.
- Vulnerability scan. Scan every dependency and image before packaging. Nothing unscanned crosses the boundary.
- Sign & hash. Every artifact gets a SHA-256 hash and (where policy requires) a detached signature.
- Transfer. Approved removable media, or a one-way data diode if available, with a documented chain of custody.
- Verify on arrival. Re-hash everything, verify signatures, reject anything that doesn’t match.
- Install to private registry. Push images to internal registry, wheels to internal PyPI mirror, weights to artifact store.
- Deploy. Normal MLOps deployment from there, but pinned to your internal artifacts only.
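The hash-and-verify steps in the pipeline above can be scripted with nothing but the standard library, which matters when the verification side has no package manager. A minimal sketch (the manifest layout and function names are illustrative, not a standard):

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB model weights never load into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(artifact_dir: Path, manifest_path: Path) -> None:
    """Connected side: record a hash for every artifact crossing the boundary."""
    manifest = {
        str(p.relative_to(artifact_dir)): sha256_file(p)
        for p in sorted(artifact_dir.rglob("*")) if p.is_file()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))

def verify_manifest(artifact_dir: Path, manifest_path: Path) -> list[str]:
    """On arrival: re-hash everything; return the paths that fail verification."""
    manifest = json.loads(manifest_path.read_text())
    failures = []
    for rel_path, expected in manifest.items():
        target = artifact_dir / rel_path
        if not target.is_file() or sha256_file(target) != expected:
            failures.append(rel_path)
    return failures
```

Anything `verify_manifest` returns gets rejected at the boundary, per step five. Detached signatures on the manifest itself are handled by your signing tooling (GPG, Sigstore, or whatever policy dictates), not sketched here.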
Model choices that actually ship on air-gapped networks
- Llama 3 family (Meta license review required per deployment)
- Mistral / Mixtral (Apache 2.0 — easiest license for government)
- Qwen (check policy on Chinese model sources; some environments prohibit)
- Phi (Microsoft; MIT license)
- BGE / E5 / Nomic for embeddings
Avoid anything that phones home by default. Some commercial "self-hosted" models have license-check callbacks. Read the model card carefully.
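Reading the model card is necessary but not sufficient; callbacks hide in transitive dependencies too. One belt-and-suspenders pattern is an in-process egress guard installed before any model-serving code is imported, so a phone-home attempt fails loudly in testing rather than silently in production. A sketch, assuming a loopback-only allowlist is appropriate for your topology:

```python
import socket

_ALLOWED_HOSTS = {"127.0.0.1", "::1", "localhost"}  # loopback only
_original_connect = socket.socket.connect

def _guarded_connect(self, address):
    """Refuse any connection that isn't local; fail loudly, not silently."""
    # Unix domain sockets (a str/bytes path) are local by definition.
    if isinstance(address, (str, bytes)):
        return _original_connect(self, address)
    host = address[0]
    if host not in _ALLOWED_HOSTS:
        raise ConnectionRefusedError(f"egress blocked: connection to {host!r}")
    return _original_connect(self, address)

def install_egress_guard():
    """Call once at process start, before importing model-serving code."""
    socket.socket.connect = _guarded_connect
```

This only catches Python-level connections in the same process; network policy at the host and switch level remains the real enforcement layer. Treat the guard as a tripwire for CI, not a substitute for the air gap.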
Observability without external tools
Prometheus + Grafana on-prem for metrics. Loki or OpenSearch for logs. All of them run entirely offline; note that Prometheus and OpenSearch are Apache 2.0 while Grafana and Loki are AGPL, so confirm the license terms against your deployment policy.
Build your eval harness as a cron job running against golden Q/A pairs. Report metrics into Prometheus. Dashboards surface regressions before users do.
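That cron-driven harness stays stdlib-only if you emit metrics in the Prometheus text exposition format and let node_exporter's textfile collector scrape the file. A sketch (the metric names, `golden_pairs` shape, and exact-match scoring are assumptions; swap in your task's scorer):

```python
import time
from pathlib import Path

def run_eval(golden_pairs, answer_fn):
    """Score the deployed model against golden Q/A pairs.
    Exact match here; replace with whatever scoring your task needs."""
    correct = sum(1 for q, expected in golden_pairs
                  if answer_fn(q).strip() == expected)
    return correct / len(golden_pairs)

def write_textfile_metrics(accuracy: float, out_path: Path) -> None:
    """Emit Prometheus text exposition format for the node_exporter
    textfile collector. Write to a temp file and rename, so the
    collector never reads a half-written file."""
    body = (
        "# HELP llm_eval_accuracy Fraction of golden Q/A pairs answered correctly.\n"
        "# TYPE llm_eval_accuracy gauge\n"
        f"llm_eval_accuracy {accuracy}\n"
        "# TYPE llm_eval_last_run_timestamp_seconds gauge\n"
        f"llm_eval_last_run_timestamp_seconds {int(time.time())}\n"
    )
    tmp = out_path.with_suffix(".tmp")
    tmp.write_text(body)
    tmp.replace(out_path)
```

A Grafana alert on `llm_eval_accuracy` dropping below your baseline, or on a stale `llm_eval_last_run_timestamp_seconds`, is the "surface regressions before users do" part.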
The update cadence question
A common objection: "but models improve every quarter and we can’t pull updates." True, and usually fine. New model releases post headline gains on broad benchmarks, but those gains rarely transfer to a narrow production use case. What matters is that your deployed model is well-evaluated for your task.
Schedule updates quarterly. Evaluate each candidate before deployment. Keep the previous version available for rollback. A disciplined once-a-quarter upgrade cycle produces more stable systems than a continuous one.
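Keeping the previous version available is easiest if deployments never overwrite anything: each model version gets its own directory, and a `current` symlink is swapped atomically. Rollback is then the same operation with the old version string. A sketch assuming a POSIX filesystem (the directory layout and function name are illustrative):

```python
import os
from pathlib import Path

def activate_version(models_root: Path, version: str) -> None:
    """Point the 'current' symlink at a versioned model directory.
    The rename is atomic on POSIX, so readers always see either the
    old version or the new one, never a missing link."""
    target = models_root / version
    if not target.is_dir():
        raise FileNotFoundError(f"no such model version: {version}")
    tmp_link = models_root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(target, target_is_directory=True)
    os.replace(tmp_link, models_root / "current")
```

The serving stack only ever loads from `current`, so the quarterly upgrade and the emergency rollback are the same one-line, reversible operation.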
The hardest part (people, not tech)
The pattern we see in failed air-gapped AI projects isn’t a technical gap. It’s a coordination gap between the AI team (wants to iterate fast), the security team (won’t approve iteration), and operations (wants to know what "done" looks like).
Three things fix it:
- Get security involved at architecture review, not at deployment. Their sign-off speed collapses when they feel blindsided.
- Publish a quarterly release cadence. Predictability is a form of control.
- Run a tabletop exercise on the first update before you need it. The dry run reveals the process gaps.
What a typical engagement looks like
For air-gapped deployments we usually run a longer engagement than standard — 10-14 weeks rather than 6-10. The extra time is almost entirely about the deploy-verify-approve loop being slower. Budget accordingly.