A private LLM that runs on hardware your team already owns.

ClosedMesh turns the unused capacity of your team's laptops, workstations and on-prem boxes into a single peer-to-peer inference mesh. The chat surface runs in your browser. Inference runs on machines you control. Nothing leaves the network.

No third-party LLM API · Peer-to-peer, encrypted · OpenAI-compatible runtime · Mac · Linux · Windows
Architecture

Two layers, one product.

ClosedMesh is split between a thin product surface — the chat UI you're using right now — and a peer-to-peer inference runtime that handles model loading, routing, and distribution across machines. They're shipped and versioned separately.

Product surface
ClosedMesh
The chat UI and local controller.
  • Browser-side chat UI with thread persistence and streaming responses.
  • A local controller that auto-starts at login and proxies to the runtime on the same machine.
  • Fleet visibility — number of nodes online, per-node hardware, currently-loaded models.
  • Hosted at closedmesh.com; the page calls back into the visitor's localhost so prompts never traverse our infrastructure.
Inference runtime · open source
ClosedMesh LLM
The peer-to-peer engine.
  • OpenAI-compatible API at localhost:9337/v1 (see the example after this list).
  • Pipeline parallelism for dense models that don't fit on one machine.
  • Expert sharding for Mixture-of-Experts (MoE) models, with zero cross-node inference traffic.
  • Capability-aware routing: requests only go to nodes that can actually serve them.
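
If a node is reachable, any HTTP client can ask what it serves. A minimal sketch, assuming the runtime implements the standard OpenAI /v1/models listing alongside chat completions:

```ts
// List the models a node can currently serve. Port 9337 is the runtime's
// documented default; the /v1/models route is assumed from OpenAI compatibility.
const res = await fetch("http://localhost:9337/v1/models");
if (!res.ok) throw new Error(`node unreachable: ${res.status}`);
const { data } = await res.json();
for (const model of data) console.log(model.id);
```
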
How it works

The page talks to your laptop. Your laptop talks to the mesh.

When you open this site, the browser doesn't reach our servers for inference. It reaches back into the controller running on your own machine, which routes the request to whichever peer in the mesh is best positioned to serve it.

[Diagram] Browser (chat UI · static, from closedmesh.com) → /api/chat → Local controller (localhost:3000, on your Mac · launchd) → /v1 → ClosedMesh LLM peers (M-series Mac · CUDA 4090 · Vulkan laptop)
01
Browser

The chat UI loads from closedmesh.com. The page itself is static — it never sees a prompt or a token.
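
In code terms, the only network call the page makes is to your own machine. A minimal sketch of that hop, using the /api/chat route and localhost:3000 port from the diagram above; the payload shape is illustrative:

```ts
// Runs in the browser at closedmesh.com. The prompt goes to the local
// controller on the visitor's machine, never to closedmesh.com's servers.
async function sendPrompt(prompt: string): Promise<Response> {
  return fetch("http://localhost:3000/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Message shape is illustrative, not the controller's actual schema.
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
  });
}
```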

02
Local controller

A tiny Next.js service installed on each teammate's machine. CORS-allowed for closedmesh.com. Speaks the OpenAI-compatible API to the runtime.
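
A stripped-down sketch of what such a route handler could look like in Next.js; the real controller does more (thread persistence, fleet queries), and the route name comes from the diagram above:

```ts
// app/api/chat/route.ts — illustrative proxy from the local controller
// to the runtime on the same machine.
const RUNTIME = "http://localhost:9337/v1/chat/completions";

export async function POST(req: Request): Promise<Response> {
  // Forward the OpenAI-compatible request body untouched.
  const upstream = await fetch(RUNTIME, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: await req.text(),
  });
  // Stream the response through, and let the static page at
  // closedmesh.com call this local origin.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "Content-Type": upstream.headers.get("Content-Type") ?? "application/json",
      "Access-Control-Allow-Origin": "https://closedmesh.com",
    },
  });
}
```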

03
Inference mesh

One or more ClosedMesh LLM peers. Capability-matched. Auto-routes around offline nodes. Handles dense and MoE models that don't fit on one box.
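
Conceptually, dispatch is a filter followed by a choice. A toy sketch of the idea (field names and selection policy are ours, not the runtime's):

```ts
interface Peer {
  url: string;
  online: boolean;
  loadedModels: string[]; // what this node can serve right now
  freeMemGb: number;
}

// Only live peers that can actually serve the model are candidates;
// among those, the simplest policy is "most free memory wins".
function pickPeer(peers: Peer[], model: string): Peer {
  const candidates = peers.filter((p) => p.online && p.loadedModels.includes(model));
  if (candidates.length === 0) throw new Error(`no live peer serves ${model}`);
  return candidates.reduce((best, p) => (p.freeMemGb > best.freeMemGb ? p : best));
}
```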

Why a mesh

Spare capacity is everywhere on your team. ClosedMesh just uses it.

No vendor in the loop

Conversations stay on machines you own. No keys to revoke, no per-token bill, no third-party retention policy to read.

Heterogeneous hardware

An M-series Mac, an RTX 4090 box and a Vulkan laptop happily serve the same conversation. Each node advertises its capability; the router only sends a node work it can actually run.
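
What a node's advertisement might carry, sketched as a record; the field names and values are illustrative, not ClosedMesh's wire format:

```ts
// Hypothetical shape of a capability advertisement.
interface Capability {
  backend: "metal" | "cuda" | "rocm" | "vulkan" | "cpu";
  arch: "arm64" | "x86_64";
  memGb: number; // unified memory or VRAM, whichever bounds the model
}

// The three machines from the example above, as they might advertise:
const mac: Capability = { backend: "metal", arch: "arm64", memGb: 64 };
const box4090: Capability = { backend: "cuda", arch: "x86_64", memGb: 24 };
const laptop: Capability = { backend: "vulkan", arch: "x86_64", memGb: 16 };
```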

Models bigger than one box

Dense models split across nodes by layer (pipeline parallelism). MoE models split by expert with zero cross-node inference traffic.
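
For the dense case, a back-of-envelope sketch of how contiguous layer ranges could be sized to each node's memory; this shows the idea, not ClosedMesh's actual partitioner:

```ts
// Split totalLayers into contiguous stages, one per node, proportional
// to each node's memory. MoE models are split differently: whole experts
// are assigned to nodes instead of layer ranges.
function pipelineStages(
  totalLayers: number,
  nodeMemGb: number[],
): Array<{ firstLayer: number; lastLayer: number }> {
  const totalMem = nodeMemGb.reduce((a, b) => a + b, 0);
  const stages: Array<{ firstLayer: number; lastLayer: number }> = [];
  let next = 0;
  nodeMemGb.forEach((mem, i) => {
    const isLast = i === nodeMemGb.length - 1;
    const count = isLast
      ? totalLayers - next // last node takes the remainder
      : Math.max(1, Math.round((mem / totalMem) * totalLayers));
    stages.push({ firstLayer: next, lastLayer: next + count - 1 });
    next += count;
  });
  return stages;
}

// e.g. an 80-layer dense model across 64 GB + 24 GB + 16 GB nodes:
// [{0..48}, {49..66}, {67..79}] — proportional to memory.
console.log(pipelineStages(80, [64, 24, 16]));
```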

OpenAI-compatible

Every node exposes a standard /v1/chat/completions endpoint. Drop-in for any tool that speaks OpenAI — agents, IDE plugins, internal scripts.
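
For example, with the official openai npm client you only change the base URL; the API key is a placeholder (the SDK requires one, and we assume the local runtime ignores it), and the model name is whatever your mesh has loaded:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:9337/v1", // any ClosedMesh node
  apiKey: "unused",                    // required by the SDK, assumed ignored
});

const completion = await client.chat.completions.create({
  model: "your-loaded-model", // placeholder
  messages: [{ role: "user", content: "Summarize this repo's README." }],
});
console.log(completion.choices[0].message.content);
```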

Auto-route around failure

Laptops sleep. Workstations reboot. The mesh keeps serving — requests are dispatched only to live, capability-matched peers.
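
A conceptual sketch of that failover behavior, trying candidates in order and skipping dead ones; the real dispatcher lives inside the runtime:

```ts
// Try each live, capable peer in turn; a sleeping laptop just means the
// next candidate gets the request.
async function dispatch(peerUrls: string[], body: string): Promise<Response> {
  for (const url of peerUrls) {
    try {
      const res = await fetch(`${url}/v1/chat/completions`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body,
        signal: AbortSignal.timeout(5_000), // don't hang on a dead node
      });
      if (res.ok) return res;
    } catch {
      // Node offline or timed out — fall through to the next peer.
    }
  }
  throw new Error("no live peer could serve the request");
}
```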

Single-binary install

One curl command per machine. The runtime drops into ~/.local/bin and registers a launchd / systemd / scheduled-task autostart.

Hardware support

Whatever the team is already running.

The installer detects OS, CPU architecture and GPU vendor, then pulls the matching runtime build. You can also pin a backend explicitly for unusual setups.

OS              Hardware                        Backend
macOS           Apple Silicon                   Metal
Linux           x86_64 · NVIDIA                 CUDA
Linux           x86_64 · AMD                    ROCm
Linux           x86_64 · Intel / other          Vulkan
Linux           x86_64 · CPU-only               CPU
Linux           aarch64                         Vulkan / CPU
Windows 10/11   x86_64 · NVIDIA                 CUDA
Windows 10/11   x86_64 · AMD / Intel / other    Vulkan
WSL2            x86_64 · NVIDIA passthrough     CUDA