Enterprise AI Gateway + Compute Platform
One key. Every model, every GPU, every ComfyUI workflow. With the cost controls, audit trail, and reliability your CFO and CTO already asked about.
Used in production by teams shipping image, video, voice, and chat features to millions of end users.
Know what you spend before the invoice arrives.
Every gateway request is priced, attributed, and logged in real time. Per-model dashboards, monthly forecasts, and budget guardrails — out of the box, no Datadog dashboard required.
Cost Dashboard
Daily spend trend, per-model breakdown, top-10 most expensive requests. The view your CFO actually asked for.
Spend Forecast
Trailing burn rate projected to month-end so you can see overruns weeks before they hit your card.
Budget Alerts
Per-key monthly cap. Emails at 80% and 100% with a cooldown so you don't get spammed. Optional auto-pause kills runaway loops dead.
Signed Webhooks
HMAC-signed events for spend thresholds, key created, key revoked, generation failed. Wire them into PagerDuty, Slack, or your own ledger.
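A minimal verification sketch for those webhooks. The header name, hex encoding, and SHA-256 scheme are assumptions for illustration; check your dashboard for the exact signing details.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch: verify an HMAC-SHA256 webhook signature against the raw body.
// The hex encoding and SHA-256 choice are assumptions, not documented values.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  if (received.length !== expected.length) return false;
  // Constant-time compare prevents timing side channels.
  return timingSafeEqual(received, expected);
}
```

Always verify against the raw request body, before any JSON parsing, since re-serialization can change byte order and break the signature.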
Outages happen. Your users shouldn't notice.
Multi-provider failover, regional fallback, and intent-aware routing turn a fragile single-vendor dependency into a redundant, self-healing layer.
Multi-Provider Failover
Configurable per-key timeouts and retry policy. On 5xx or timeout, traffic transparently rolls to the next provider in the chain.
POST /v1/chat/completions
├── primary    → openai/gpt-4.1-mini     [503 in 8s]    ✗
├── fallback 1 → google/gemini-2.5-flash [200 in 612ms] ✓
└── fallback 2 → anthropic/claude-haiku  (skipped)
served 200 OK · upstream: gemini · total 8.6s
Smart Routing
Tell us the intent — fast chat, deep reasoning, image edit, long-form summarization — and we pick the cheapest qualified provider. Pin an exact model when you need to.
Regional Fallback
If a provider's US-East region is degraded, we try US-West, then EU, before failing the request. Region-stickiness is configurable per key.
Per-key controls that satisfy a security review.
Scoped keys, granular rate limits, IP allowlists, immutable audit log, and CSV export. Designed for the questions your CTO and your auditor will both ask.
API Key Scoping
Per-key allow/deny on models, IP allowlist, daily and hourly spend caps. Rotate without redeploying.
Per-Key, Per-Model Rate Limits
RPM and TPM limits scoped to the key and the model. A staging key can't accidentally drain prod's quota.
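When a key does hit its limit, the gateway answers 429 like any HTTP API, so a small client-side backoff keeps batch jobs polite. A sketch, assuming the standard Retry-After semantics (whether the gateway sets that header is an assumption):

```typescript
// Sketch: retry a request that was rate-limited (HTTP 429), honoring a
// server-provided delay when available, else exponential backoff.
type DoRequest = () => Promise<{ status: number; retryAfterMs?: number }>;

async function withRateLimitRetry(doRequest: DoRequest, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Prefer the server's hint; otherwise back off 250ms, 500ms, 1s, ...
    const delayMs = res.retryAfterMs ?? 250 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Because limits are scoped per key and per model, a 429 on one model does not mean another model on the same key is throttled.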
Immutable Audit Log
Every key creation, scope change, budget move, and revocation is recorded with actor, IP, and timestamp. SOC 2-baseline by default.
Searchable Logs + CSV Export
Filter request logs by endpoint, model, status, latency, key. One-click CSV for finance, compliance, or post-mortem.
Compliance posture
- TLS 1.2+ end-to-end. Keys hashed at rest, never logged in plaintext.
- Per-tenant key + budget isolation. No cross-tenant data leakage.
- Configurable log retention. Drop request bodies on demand for high-sensitivity workloads.
- EU and US routing available on request for residency-sensitive deployments.
- SOC 2 controls in scope for 2026. Reach out if you need a current letter from our auditor.
Drop-in for the OpenAI SDK. Swap one base URL.
Hypereal speaks OpenAI Chat Completions, Images, Responses, and Anthropic Messages. Keep your SDK, your prompts, your tool definitions, your retries — change the base URL and the API key, ship.
curl https://api.hypereal.cloud/v1/chat/completions \
-H "Authorization: Bearer $HYPEREAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1-mini",
"messages": [{ "role": "user", "content": "hi" }]
  }'

import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.HYPEREAL_API_KEY,
baseURL: "https://api.hypereal.cloud/v1",
});
const res = await client.chat.completions.create({
model: "gpt-4.1-mini",
messages: [{ role: "user", content: "hi" }],
});

Supported endpoints
- POST /v1/chat/completions — OpenAI-compatible
- POST /v1/messages — Anthropic-compatible
- POST /v1/responses — OpenAI Responses API
- POST /v1/images/generations — OpenAI-compatible
- POST /v1/videos/generate — Hypereal video API
- POST /v1/comfy/{slug} — ComfyUI workflow as API
- POST /v1/gpu/{slug} — Serverless GPU passthrough
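The Anthropic-compatible route works the same way as the OpenAI ones: same base URL, same key. A sketch of building that request; the body follows the public Anthropic Messages schema, and the model slug here is illustrative.

```typescript
// Sketch: build a request for the Anthropic-compatible /v1/messages route.
// Returning { url, init } keeps construction separate from the network call.
function buildMessagesRequest(apiKey: string, prompt: string) {
  return {
    url: "https://api.hypereal.cloud/v1/messages",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // Anthropic Messages schema: model, max_tokens, messages[].
      body: JSON.stringify({
        model: "claude-haiku", // illustrative slug
        max_tokens: 256,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage:
// const { url, init } = buildMessagesRequest(process.env.HYPEREAL_API_KEY!, "hi");
// const res = await fetch(url, init);
```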
Beyond models: compute as a first-class API.
Every team eventually needs more than chat completions — a custom ComfyUI graph, a fine-tune, a one-off GPU job. Hypereal exposes those behind the same key, the same logs, the same budgets.
Serverless GPU Passthrough
Bring your own RunPod handler and call it as POST /v1/gpu/{slug}. We handle auth, metering, retries, and the bill. You write the handler.
ComfyUI Workflow as API
Upload any ComfyUI workflow JSON. We give you a versioned HTTP endpoint with typed inputs and outputs, billed per run. No more pasting graphs in Slack.
ComfyUI Library
A growing catalog of pre-built ComfyUI workflows — face restore, product shot, cinematic upscale — call them like any other model.
LoRA & Asset Repo
Private, versioned storage for LoRAs, checkpoints, embeddings, and reference images. Reference them by handle from any workflow or generation.
POST /v1/comfy/cinematic-upscale
{
"inputs": { "image_url": "https://...", "strength": 0.8 },
"version": "v3"
}
POST /v1/gpu/my-handler
{
"input": { "prompt": "a cat", "steps": 28 }
}

Numbers we publish. Not screenshots in a sales deck.
Live status page, transparent latency, and an incident history you can read without asking us first.
Transparent latency
Rolling p50 and p95 for every gateway endpoint, by region.
Uptime history
Trailing 30/90-day uptime, no marketing math. The number is the number.
Stop running 8 vendor dashboards.
One API key, one bill, one place to see what's happening. Get up and running in under five minutes.

