When you use an AI service, you’re handing over your thoughts in plaintext. The operator stores them, trains on them, and will inevitably monetize them. You get a response; they get everything.

Confer works differently. In the previous post, we described how Confer encrypts your chat history with keys that never leave your devices. The remaining piece to consider is inference—the moment your prompt reaches an LLM and a response comes back.

Traditionally, end-to-end encryption works when the endpoints are devices under the control of a conversation’s participants. AI inference, however, requires a server with GPUs to be an endpoint in the conversation. Someone has to run that server, but we want to prevent the people running it (us) from seeing the prompts or the responses.

Confidential computing

This is the domain of confidential computing. Confidential computing uses hardware-enforced isolation to run code in a Trusted Execution Environment (TEE). The host machine provides CPU, memory, and power, but cannot access the TEE’s memory or execution state.

LLMs are fundamentally stateless—input in, output out—which makes them ideal for this environment. For Confer, we run inference inside a confidential VM. Your prompts are encrypted from your device directly into the TEE using Noise Pipes, processed there, and responses are encrypted back. The host never sees plaintext.
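
Concretely, the client’s job per message is small. Here is a minimal sketch of that round trip, where `session` stands in for an established Noise session (attested as described below) and `transport` for a hypothetical byte stream to the inference endpoint; both names are assumptions for illustration, not Confer’s actual API:

```python
# Sketch of the per-message flow. `session` is a hypothetical handle to
# an established Noise session, and `transport` a hypothetical byte
# stream to the inference endpoint inside the TEE.

def ask(session, transport, prompt: str) -> str:
    # Encrypted on the device; the host only ever relays ciphertext.
    transport.send(session.encrypt(prompt.encode()))
    # The TEE decrypts, runs the model, and encrypts the reply.
    return session.decrypt(transport.recv()).decode()
```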

But this raises an obvious concern: even with encrypted pipes in and out of an encrypted environment, what runs inside that environment matters. The client needs assurance that the code running there is actually doing what it claims.

Attestation

Confidential computing solves this with remote attestation. When a confidential VM boots, the hardware generates a signed quote—a cryptographic statement containing hashes of the kernel, initrd, and command line. These hashes are called measurements, and they uniquely identify the code running inside the TEE.
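
To make the rest of this concrete, a quote can be modeled roughly as a signed bundle of measurements. The shape below is an assumption made for these sketches; real quote formats (AMD SEV-SNP reports, Intel TDX quotes) carry more fields and vendor-specific signature chains:

```python
from dataclasses import dataclass

# Illustrative shape of an attestation quote. Field names are
# assumptions for these sketches, not a real quote format.
@dataclass(frozen=True)
class Quote:
    kernel_hash: bytes   # measurement of the kernel image
    initrd_hash: bytes   # measurement of the initrd
    cmdline_hash: bytes  # measurement of the kernel command line
    report_data: bytes   # caller-chosen bytes bound into the quote
    signature: bytes     # signature rooted in the CPU vendor's keys
```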

For Confer, we extend the measurement to cover the entire root filesystem using dm-verity. This builds a Merkle tree over every byte of the filesystem and embeds the Merkle root hash in the kernel command line. Since the command line is measured, any change to any file changes the attestation. The measurement now covers everything.
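
As a toy illustration of the construction, here is a Merkle root over fixed-size blocks, assuming SHA-256 and dm-verity’s default 4 KiB block size; the real on-disk format adds salts, a superblock, and packs many hashes per hash block:

```python
import hashlib

BLOCK_SIZE = 4096  # dm-verity's default data block size

def merkle_root(data: bytes) -> bytes:
    """Toy Merkle root over fixed-size blocks of `data`."""
    # Leaf layer: hash every 4 KiB block of the filesystem image.
    level = [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
             for i in range(0, max(len(data), 1), BLOCK_SIZE)]
    # Interior layers: repeatedly hash pairs until one root remains.
    while len(level) > 1:
        if len(level) % 2:               # duplicate the odd node out
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Any single-byte change in `data` changes the root, and therefore the
# kernel command line, and therefore the attestation measurement.
```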

But a measurement is just a hash. To verify it, someone needs to be able to reproduce it.

Making the attestation verifiable

The Confer proxy and image are built with Nix and mkosi to produce bit-for-bit reproducible outputs. Anyone can clone the repository, run the build, and get the exact same measurements.
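
Verifying reproducibility then reduces to comparing digests. A sketch, assuming for simplicity that the published measurement is a plain SHA-256 of the image file (the actual measurement is the TEE launch digest over kernel, initrd, and command line):

```python
import hashlib
from pathlib import Path

def image_digest(path: str) -> str:
    """SHA-256 of a locally built image, streamed in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Reproducibility means this holds for anyone who runs the build:
#   assert image_digest("my-local-build.img") == published_digest
```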

Each release is also signed and published to a transparency log that is easily searchable. The log is append-only and publicly auditable, which means we can’t quietly publish a different build for different users.
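
A minimal sketch of the client-side log check, assuming the log is fetched as a plain append-only list of release measurements; real transparency logs use signed tree heads, inclusion proofs, and consistency proofs rather than a full download:

```python
# Hypothetical simplification: the log as a list of measurements.
def check_log(measurement: bytes, log: list[bytes], last_seen: int) -> bool:
    # Append-only means the log we saw before must be a prefix of the
    # log we see now; a shorter log would indicate tampering.
    if len(log) < last_seen:
        raise ValueError("transparency log shrank")
    return measurement in log
```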

Now we have an intuition for how these parts can come together.

The connection

When you open Confer and start a conversation, your client initiates a Noise handshake with the inference endpoint. The TEE responds with its attestation quote embedded in the Noise handshake.

Your client verifies the quote’s signature, confirming it came from genuine TEE hardware. It checks that the public key in the quote matches the one used in the handshake; this binds the encrypted channel to the TEE, preventing interception or replay by anything outside it. It extracts the measurements and confirms they match a release in the transparency log.

Once verification succeeds, the handshake completes. Your client now has cryptographic assurance that it’s talking directly to verified code running in hardware-enforced isolation.
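
Put in code order, the client’s checks might look like the sketch below, reusing the hypothetical `Quote` shape from earlier. Binding the handshake key by hashing it into the quote’s `report_data` is a common pattern we assume here; the signature check and log lookup are injected as callables rather than real vendor APIs:

```python
import hashlib
from typing import Callable

def verify_endpoint(quote: "Quote",
                    handshake_public_key: bytes,
                    signature_is_valid: Callable[["Quote"], bool],
                    measurement_in_log: Callable[[bytes, bytes, bytes], bool],
                    ) -> None:
    # 1. The quote must be signed by genuine TEE hardware.
    if not signature_is_valid(quote):
        raise ValueError("quote signature invalid")
    # 2. The handshake key must be bound into the quote, so the
    #    encrypted channel terminates inside this exact TEE.
    if quote.report_data != hashlib.sha256(handshake_public_key).digest():
        raise ValueError("handshake key not bound to quote")
    # 3. The measurements must match a published, reproducible release.
    if not measurement_in_log(quote.kernel_hash, quote.initrd_hash,
                              quote.cmdline_hash):
        raise ValueError("measurements not in transparency log")
    # Only now does the client let the handshake complete.
```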

All traffic to the inference endpoint is then encrypted with ephemeral session keys that provide forward secrecy: even if a long-term key were later compromised, past conversations remain protected because the ephemeral keys no longer exist.
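
The mechanism behind forward secrecy is a fresh Diffie-Hellman exchange per session. A standalone sketch using the `cryptography` package (Noise performs the equivalent inside its handshake, mixed with the attested static key; the `info` label here is arbitrary):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each side generates a fresh keypair per session and discards it
# afterwards; no long-term secret can decrypt old traffic.
client_eph = X25519PrivateKey.generate()
server_eph = X25519PrivateKey.generate()

# Both sides derive the same session key from the ephemeral exchange.
shared = client_eph.exchange(server_eph.public_key())
session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                   salt=None, info=b"confer session").derive(shared)

# Once the ephemeral keys are dropped, session_key cannot be
# re-derived, even by someone holding every long-term key.
del client_eph, server_eph
```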

A different model

Confer combines confidential computing with passkey-derived encryption to ensure your data remains private.

This is different from traditional AI services, where your prompts are transmitted in plaintext to an operator who stores them in plaintext (where they are vulnerable to hackers, employees, and subpoenas), mines them for behavioral data, and trains on them.

We think Confer is how AI should work.