Distributed AI Infrastructure

Your GPUs. One Inference Layer.

Turn heterogeneous hardware into a unified AI compute platform. Run LLM inference across NVIDIA, AMD, Intel, and Apple Silicon — from a single API.

Go Coordinator
Python Workers
React Dashboard

GPU Resources Are Scattered

Your organization has GPUs everywhere — workstations, servers, cloud instances. Different vendors, different capabilities, different machines. Today, each one is an island. What if you could use them all as one?

Fragmented Hardware

NVIDIA here, AMD there, Apple Silicon on laptops. No unified way to use them.

Idle Capacity

GPUs sit unused while teams wait for "the good machine" to free up.

Complex Orchestration

Load balancing, failover, model routing — building this yourself takes months.

Cortex Unifies Your Compute

A lightweight coordinator that turns any GPU into part of your inference cluster. Deploy workers anywhere, route requests intelligently, get results reliably.

Architecture at a glance: your applications (APIs, services, frontends) talk to the Cortex Coordinator through its OpenAI-compatible API. The coordinator handles routing, quorum, caching, and reputation, then dispatches work over gRPC/HTTP to GPU workers on NVIDIA (CUDA), AMD (ROCm), Intel (SYCL), and Apple (Metal).

Built for Real Workloads

Hardware Agnostic

NVIDIA (CUDA), AMD (ROCm/Vulkan), Intel (SYCL), Apple Silicon (Metal). Mix and match freely.

Quorum Validation

2-of-3 consensus cross-checks each response across independent workers. Catch faulty or hallucinated outputs before they reach users.
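A minimal sketch of the idea in Python (illustrative only, not Cortex's internal code): send the same request to multiple workers and accept an answer only when at least two agree.

from collections import Counter

def quorum(responses, required=2):
    # Majority vote over worker responses; None means no quorum was reached.
    answer, votes = Counter(responses).most_common(1)[0]
    return answer if votes >= required else None

print(quorum(["4", "4", "5"]))  # "4"  -> 2-of-3 agreement, response accepted
print(quorum(["4", "5", "6"]))  # None -> no quorum; caller must retry or reject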

Worker Reputation

Automatic quality tracking. Unreliable workers get deprioritized, good ones get more work.
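One common way to implement this kind of tracking is an exponential moving average over recent outcomes. The formula below is illustrative, not necessarily the one Cortex uses.

def update_reputation(score, success, alpha=0.1):
    # Blend the newest outcome into a 0..1 reliability score;
    # recent behavior outweighs old history.
    return (1 - alpha) * score + alpha * (1.0 if success else 0.0)

score = 0.5  # hypothetical neutral starting score
for ok in (True, True, False, True):
    score = update_reputation(score, ok)
print(round(score, 3))  # higher scores attract more requests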

Response Caching

SHA256-based caching for deterministic queries. Don't recompute what you've already answered.
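The pattern looks roughly like this; the exact request fields Cortex hashes are an assumption.

import hashlib
import json

def cache_key(model, messages):
    # Canonicalize the request so identical queries produce identical keys.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_inference():
    return "Hi there!"  # stand-in for dispatching the request to a worker

cache = {}
key = cache_key("llama-3", [{"role": "user", "content": "Hello"}])
if key not in cache:
    cache[key] = run_inference()  # compute once...
response = cache[key]             # ...serve repeats from the cache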

OpenAI-Compatible API

Drop-in replacement for /v1/chat/completions. Your existing code just works.
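If the API is a drop-in replacement as described, the official openai Python client should work by pointing base_url at the coordinator. The api_key value here is a placeholder; authentication details are an assumption.

from openai import OpenAI

client = OpenAI(base_url="http://coordinator:3000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="llama-3",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)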

Real-time Dashboard

Monitor workers, track throughput, analyze performance — all from a web UI.

Up and Running in Minutes

01

Start the Coordinator

Single Go binary. No dependencies. Run it on any machine in your network.

./cortex --port 3000
02

Connect Workers

Point workers at the coordinator. Each worker registers its capabilities.

python worker.py --mothership coordinator:3000 --model llama-3
03

Send Requests

Use the OpenAI-compatible API. Cortex handles routing and consensus.

curl http://coordinator:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3", "messages": [{"role": "user", "content": "Hello"}]}'

Who Uses Cortex?

ML Teams

"We have GPUs on every workstation but no way to share capacity across the team."
  • Pool team hardware into shared inference
  • Stop waiting for "the fast machine"
  • Utilize idle overnight/weekend capacity

Research Labs

"Our cluster has mixed hardware from different grant cycles."
  • Unified API across NVIDIA/AMD/Intel
  • Automatic load balancing
  • Reproducible results via quorum

On-Prem Enterprises

"We can't send data to cloud APIs but need reliable LLM inference."
  • 100% on-premises deployment
  • No data leaves your network
  • Enterprise-grade reliability

Under the Hood

Coordinator

  • Written in Go for performance and easy deployment
  • Single binary, no runtime dependencies
  • Embedded web dashboard
  • gRPC + REST APIs

Workers

  • Python with llama.cpp backend
  • Automatic hardware detection
  • Hot model loading/unloading
  • Health monitoring and heartbeats (see the sketch after this list)
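A heartbeat loop can be as small as the sketch below; the endpoint path and payload are hypothetical, not Cortex's actual wire format.

import time
import requests

def heartbeat_loop(coordinator="http://coordinator:3000", worker_id="w1"):
    # Periodically report liveness; a coordinator stops routing work to
    # any worker whose heartbeats go quiet.
    while True:
        try:
            requests.post(f"{coordinator}/workers/{worker_id}/heartbeat",
                          json={"status": "ok"}, timeout=5)
        except requests.RequestException:
            pass  # transient network error; keep trying
        time.sleep(10)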

Protocols

  • OpenAI-compatible REST API
  • gRPC for worker communication
  • HTTP polling for quorum (firewall-friendly; see the sketch after this list)
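Because polling uses only outbound HTTP requests, no inbound ports need to be opened on worker machines. A polling loop might look like this sketch; the endpoint is hypothetical.

import time
import requests

def poll_for_result(coordinator, request_id, interval=1.0):
    # Poll over plain outbound HTTP, so quorum coordination works
    # through firewalls and NAT.
    while True:
        r = requests.get(f"{coordinator}/quorum/{request_id}", timeout=30)
        if r.status_code == 200:
            return r.json()  # quorum reached; result is ready
        time.sleep(interval)  # not yet; poll again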

Requirements

  • Coordinator: Any machine, minimal resources
  • Workers: GPU with 8GB+ VRAM recommended
  • Network: HTTP connectivity between nodes

Join the Beta

Cortex v0.7.0 is available now for early adopters. We're looking for teams to help shape the roadmap.

What's Ready

  • Core inference routing
  • Quorum validation
  • Worker reputation
  • Response caching
  • Web dashboard
  • OpenAI-compatible API

Coming Soon

  • CLI client
  • Project-based worker pools
  • Hardware-aware model selection
  • Distributed project pools

Ready to Unify Your GPU Fleet?

Get started with Cortex today. Free and open source.

Apache 2.0 License · Self-hosted · No data leaves your network