Sandboxing an AI Agent with KVM, nftables, and Zero DNS

Intro

Claude Code has an official Telegram plugin that connects a Telegram bot to a running Claude Code instance. I’ve been using it on my home server, a personal minicomputer running Manjaro that also hosts Home Assistant and Ollama. I chat with Claude from my phone, ask it to do things on the server, and it works.

The catch: Claude Code needs --dangerously-skip-permissions to run as an always-on agent. Full shell access, no guardrails. On a shared machine with personal services, I’m not comfortable with that out of the box.

I spent the last few weeks learning how to sandbox it. The sandboxing part turned out to be more fun than I expected.

The Setup

Claude Code runs inside a KVM virtual machine on the home server. I interact with it via Telegram and can start/stop the VM remotely over WireGuard VPN. The VM is sandboxed: it can’t reach my LAN or the internet, only the Anthropic and Telegram APIs through a forward proxy I control.

flowchart LR
    Phone["📱 Phone"]
    Phone -->|Telegram| CC["Claude Code\n(inside VM)"]
    CC -->|responds| Phone
    Phone -->|WireGuard VPN| CS["Control Service"]
    CS -->|start / stop| CC

Why a VM and Not a Container

I went with KVM instead of Docker because the threat model is adversarial: assume the agent inside is compromised. Containers share the host kernel, so a single kernel bug, cgroup escape, or filesystem mount trick can lead straight to the host. That attack surface is hard to reason about when your adversary is a creative AI with shell access.

KVM gives you hardware-assisted isolation. The guest runs in its own address space, enforced by the CPU. To shrink what QEMU exposes to the guest, I stripped the device model down to two disks and one network interface. Nothing else.

Is it overkill? Maybe. Lighter sandboxing options like gVisor or Firecracker would work fine for most setups. But I wanted to learn KVM, and this was a good excuse. I like that the security properties are straightforward: the boundary is hardware virtualization, not a shared kernel.

The Architecture

flowchart TD
    subgraph Host["🖥️ Host · Manjaro"]
        direction TB
        CS["Control Service\n(WireGuard)"]
        subgraph VM["VM · Debian 12 · ephemeral overlay · no DNS"]
            CC["Claude Code + Telegram plugin"]
        end
        Proxy["tinyproxy\n─────────────────\n*.anthropic.com ✅\napi.telegram.org ✅\neverything else ❌"]
        FW["nftables · policy drop on forward chain"]
    end
    CS -->|"start / stop"| VM
    CC -->|"only path out"| Proxy
    Proxy --> API["Anthropic API\nTelegram API"]

Four ideas make this work:

Ephemeral boot: each VM start creates a fresh copy-on-write overlay on a read-only base image. Cron jobs, modified binaries, SSH keys: gone on restart. A separate persistent data disk holds working files, and on every boot I reset certain folders and files on it so they match what my infrastructure-as-code (IaC) definitions expect.
Proxy-only egress with zero DNS: the VM has no DNS. No resolver, no port 53, and resolv.conf is empty and immutable. All internet access goes through a tinyproxy forward proxy on the host that allows *.anthropic.com and api.telegram.org and denies everything else. The proxy resolves hostnames on the host side, so the VM only needs the proxy’s IP address. I didn’t bother with DNS filtering. No DNS means nothing to filter.
nftables with policy drop: forwarded traffic between networks is dropped by default. The VM can reach the proxy and nothing else. No LAN, no other host services.
External kill switch: the control service runs on the host, reachable only via WireGuard. virsh destroy kills the QEMU process. The agent can’t prevent or delay it.
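
The allowlist idea above can be modeled in a few lines. This is a minimal sketch of the policy only, not tinyproxy's filter syntax or code; the names ALLOWED_EXACT, ALLOWED_SUFFIXES, and is_allowed are hypothetical, and it assumes that *.anthropic.com means "any subdomain" and does not include the bare anthropic.com.

```python
# Hypothetical model of the proxy's allow/deny decision. It illustrates the
# policy described above; it is not how tinyproxy is configured or implemented.

ALLOWED_EXACT = {"api.telegram.org"}
ALLOWED_SUFFIXES = (".anthropic.com",)  # stands in for *.anthropic.com


def is_allowed(host: str) -> bool:
    """Return True if a request for `host` should be forwarded."""
    # Hostnames are case-insensitive, and a trailing dot is just the
    # fully qualified spelling of the same name.
    name = host.strip().lower().rstrip(".")
    if not name:
        return False
    if name in ALLOWED_EXACT:
        return True
    # The suffix keeps its leading dot, so "evilanthropic.com" and
    # "api.anthropic.com.evil.net" do not match. The length check rejects
    # a name that consists of the suffix alone.
    return any(
        name.endswith(suffix) and len(name) > len(suffix)
        for suffix in ALLOWED_SUFFIXES
    )
```

Matching on the dotted suffix rather than a plain substring is the important design choice here: a substring test would let look-alike domains through.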

The Fun Parts

nftables and libvirt fight each other. libvirt auto-generates its own nftables table with rules that reject traffic to VMs. My rules were correct but powerless: nftables evaluates every base chain registered at the same hook, and an accept in one chain doesn’t override a reject in another. I fixed it with <forward mode='open'/> in the libvirt network config: libvirt still handles DHCP but generates zero firewall rules.
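
To see why a correct accept rule can still lose, here is a toy model of how verdicts combine across base chains on one hook. It is a sketch for intuition, not nftables code: the chain names, priorities, and the forward_verdict helper are all made up, and each chain's rules and policy are collapsed into a single function.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, str]
# A base chain is modeled as (name, priority, rule). The rule returns the
# chain's verdict for a packet: "accept", "drop", or "reject".
Chain = Tuple[str, int, Callable[[Packet], str]]


def forward_verdict(chains: List[Chain], packet: Packet) -> str:
    """Combine the verdicts of all base chains attached to the same hook."""
    # Chains at a hook run in ascending priority order.
    for _name, _priority, rule in sorted(chains, key=lambda c: c[1]):
        verdict = rule(packet)
        if verdict in ("drop", "reject"):
            # Terminal: the packet is gone and later chains never see it.
            return verdict
        # "accept" only ends evaluation of this chain. The next chain at
        # the hook still gets to judge the packet.
    return "accept"


# Hypothetical example: my chain accepts VM -> proxy traffic, while a second
# chain (standing in for the auto-generated one) rejects everything.
my_chain: Chain = ("my_forward", -10, lambda p: "accept" if p["dst"] == "proxy" else "drop")
other_chain: Chain = ("generated_forward", 0, lambda p: "reject")

print(forward_verdict([my_chain, other_chain], {"dst": "proxy"}))  # reject
```

The packet only survives if every chain accepts it, so the order of the two chains does not change the outcome.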

I also had Claude (from inside the VM) run a full isolation assessment: subnet sweeps, port scans, DNS resolution attempts. All blocked except the two whitelisted endpoints. Having the agent try to break out of its own sandbox is a practical test. It’s also entertaining.
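
A check like this is easy to script so it can be rerun after every firewall change. The sketch below is not what the agent ran; it is a minimal TCP-only probe, and tcp_reachable, find_leaks, and the addresses in the usage note are hypothetical placeholders.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]  # (host or IP, TCP port)


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks, and
        # failed name resolution (socket.gaierror is an OSError subclass).
        return False


def find_leaks(
    targets: Iterable[Target],
    expected_open: Iterable[Target],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> List[Target]:
    """Return every target that is reachable but not expected to be."""
    allowed = set(expected_open)
    return [t for t in targets if t not in allowed and probe(t[0], t[1])]
```

Run from inside the VM, a call such as find_leaks(targets, expected_open=[(PROXY_IP, PROXY_PORT)]) should return an empty list, where targets is whatever sweep of LAN and host addresses you want to test and PROXY_IP and PROXY_PORT stand for your own proxy address. Because a hostname that fails to resolve counts as unreachable, passing names instead of IPs also exercises the no-DNS property.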

What It Can and Can’t Do

Can’t do: touch the host filesystem, reach LAN devices, access arbitrary internet endpoints, exfiltrate via DNS, persist anything outside the data disk across restarts, prevent being killed, or tamper with the firewall.

Can still do: send Telegram messages to people who’ve /started the bot, send data to Anthropic or Telegram via their APIs (real exfiltration channels, but the only two, and both operated by known entities), and burn API credits (mitigated by spend limits).

I’m comfortable with these risks. The sandbox makes the realistic threats either impossible or recoverable in minutes.

What’s Next

The sandbox is done, but the VM is still a blank Claude Code instance. Next step: setting up my daily log and note-taking system inside the VM so that the Telegram agent has the same context persistence as my local sessions.

Conclusion

This started as “I want to use Claude Code from my phone” and turned into a KVM, nftables, and network isolation project. I’m still learning. Firecracker or gVisor might suit other setups better. For my homelab, where I want to understand each layer and keep full control, this approach works.

If you’re curious about the nftables config, the ephemeral boot setup, or the proxy whitelist, reach out via email (see below) or Twitter/X. Happy to share configs or go deeper on any part.
