The local LLM manager that scales to enterprise. Run LLMs on your laptop or manage them across your GPU server fleet.
Run local LLMs on your own hardware. No cloud dependency, no telemetry. Your data stays yours.
Manage LLMs across multiple GPU servers from a single dashboard. Scale your enterprise LLM infrastructure from one GPU to a fleet.
Built on llama.cpp and open standards. No vendor lock-in. Runs GGUF models straight from HuggingFace.
The easiest local LLM manager for your machine.
Enterprise LLM management infrastructure for your team.
Getting Started
Choose the path that fits your needs.
docker compose up -d        # full stack via Docker
./mvnw spring-boot:run      # backend (Spring Boot), local development
npm run dev                 # frontend dev server
Compare
Both products share the same foundation. Pick what works for you.
Cost
Running Qwen3-Coder-Next on a Hetzner GPU server vs. paying per-token for comparable API models.
Pricing as of March 2026. Blended rate assumes a 50/50 input/output split. Hetzner GEX44: €184/mo + €79 one-time setup. Self-hosted throughput varies with model quantization and context length; API providers offer different model capabilities.
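The comparison above boils down to one question: at what monthly token volume does flat server rent undercut per-token API billing? A minimal sketch of that break-even arithmetic follows. The server rent is the €184/mo figure quoted above (one-time setup fee excluded); the API input/output rates are hypothetical placeholders, not any real provider's prices.

```python
# Break-even sketch: monthly token volume where flat GPU server rent
# equals per-token API spend. API rates below are HYPOTHETICAL
# placeholders -- substitute current provider pricing.

def blended_rate(input_rate: float, output_rate: float) -> float:
    """Blended EUR per million tokens, assuming a 50/50 input/output split."""
    return (input_rate + output_rate) / 2

def breakeven_mtok(server_cost: float, rate: float) -> float:
    """Monthly million-token volume where server rent equals API spend."""
    return server_cost / rate

SERVER_EUR_MO = 184.0          # Hetzner GEX44 monthly rent (setup fee excluded)
api_in, api_out = 0.50, 2.00   # hypothetical EUR per M input / output tokens

rate = blended_rate(api_in, api_out)          # 1.25 EUR per M tokens
volume = breakeven_mtok(SERVER_EUR_MO, rate)
print(f"Break-even at ~{volume:.0f} M tokens/month")
```

Above the break-even volume the fixed-rent server wins on cost alone; below it, per-token APIs are cheaper, though the self-hosting value here also includes the privacy and no-telemetry properties listed above.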