Hypereal API Documentation
One API key with the ck_ prefix. Compatible with the OpenAI REST API. Use it in Claude Code, Codex CLI, Cursor, the OpenAI SDK, or the Anthropic SDK, or call it directly with curl. Chat, images, video, audio, and coding agents, all under one URL.
Enterprise API uses a separate managed API surface.
This page documents the standard API paths. For managed Enterprise API models, capacity controls, and insurance, use the Enterprise overview and Enterprise API docs.
01 · Get started in 90 seconds
Quick start
Generate a key, point your client at hypereal.cloud, and go. Authentication and request shapes match OpenAI, so most SDKs work after changing only the base URL.
Top up at least $2 (200 credits) and create a key on the /manage-api-keys page. Keys start with ck_.
Base URL: https://hypereal.cloud/api/v1
The authorization header is Authorization: Bearer ck_.... Request bodies are the same OpenAI bodies you already know.
curl https://hypereal.cloud/api/v1/chat/completions \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [{"role": "user", "content": "Say hi in one word."}]
}'
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.HYPEREAL_API_KEY, // ck_...
baseURL: 'https://hypereal.cloud/api/v1',
});
const completion = await client.chat.completions.create({
model: 'gpt-5.5',
messages: [{ role: 'user', content: 'Say hi in one word.' }],
});
console.log(completion.choices[0].message.content);
For coding agents, start with claude-sonnet-4-6 and use Claude Code or another Anthropic-compatible client that sends cache_control. Hypereal supports cache_control caching and Hypereal Cache. Hypereal Cache is on by default and can sharply reduce token consumption for repeated coding-agent context. You can set hypereal.cache to "auto" explicitly, or omit it for the same default.
SDK
Hypereal SDK
Install hypereal-sdk for typed access to chat, responses, image generation, video generation, audio, jobs and storage from Node.js 18+.
Published as hypereal-sdk on npm.
Use client.images.generate(), chat, responses, jobs and storage.
See the full SDK overview at /sdk.
pnpm add hypereal-sdk
import { Hypereal } from 'hypereal-sdk';
const client = new Hypereal({
apiKey: process.env.HYPEREAL_API_KEY!,
});
const image = await client.images.generate({
model: 'gemini-3-1-flash-t2i',
prompt: 'A cinematic portrait in neon light',
aspect_ratio: '16:9',
});
console.log(image);
const object = await client.storage.uploadFile(file, {
filename: 'training-image.png',
contentType: 'image/png',
kind: 'dataset',
});
const listed = await client.storage.list({ kind: 'dataset' });
02
Authentication
Every request requires a key with the ck_ prefix. Three accepted header forms cover all SDKs.
- Authorization: Bearer ck_... is used by the OpenAI SDK, Codex CLI, and Cursor.
- x-api-key: ck_... is used by the Anthropic SDK and Claude Code on /v1/messages.
- x-goog-api-key: ck_... is the Google Gemini SDK / native format, accepted by /v1/gemini. ?key=ck_... also works.
03 · OpenAI compatible
Chat completions
The primary endpoint. OpenAI Chat Completions wire format. Used for GPT, Gemini, Qwen, DeepSeek, GLM, and every other non-Anthropic LLM.
/api/v1/chat/completions
Request body
- model: for Anthropic models, use /v1/messages instead.
- messages: array of messages (role, content).
- stream: defaults to false. SSE stream when true; usage is included in the final chunk.
Pricing
Billed per token at each model's input/output rate. 100 credits = $1.00. The minimum balance to call the endpoint is 200 credits ($2.00).
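Because a streamed request reports usage only in its final chunk, a client that wants to track spend has to hold on to the last usage object it sees. A minimal sketch, assuming chunks follow the OpenAI Chat Completions streaming shape (`choices[0].delta.content` plus an optional `usage` object); the `StreamChunk` type and `collectStream` helper are illustrative names, not part of any SDK:

```typescript
// Minimal shape of a streamed chunk; only the fields used here are modeled.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

interface StreamChunk {
  choices: { delta?: { content?: string | null } }[];
  usage?: Usage | null;
}

// Concatenate the text deltas and keep the last usage block that appears.
function collectStream(chunks: StreamChunk[]): { text: string; usage: Usage | null } {
  let text = '';
  let usage: Usage | null = null;
  for (const chunk of chunks) {
    text += chunk.choices[0]?.delta?.content ?? '';
    if (chunk.usage) usage = chunk.usage;
  }
  return { text, usage };
}

// Hand-written chunks for illustration (not real API output).
const streamed = collectStream([
  { choices: [{ delta: { content: 'Hel' } }] },
  { choices: [{ delta: { content: 'lo' } }] },
  { choices: [], usage: { prompt_tokens: 12, completion_tokens: 2 } },
]);
```

With a live stream, the same loop body can run inside `for await (const chunk of stream)` instead of over an array.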
curl https://hypereal.cloud/api/v1/chat/completions \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [
{"role": "system", "content": "You are a terse assistant."},
{"role": "user", "content": "Two-line haiku about caches."}
],
"stream": true,
"max_tokens": 256
}'
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.HYPEREAL_API_KEY,
baseURL: 'https://hypereal.cloud/api/v1',
});
const stream = await client.chat.completions.create({
model: 'gpt-5.5',
stream: true,
messages: [{ role: 'user', content: 'Stream me a haiku.' }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
OpenAI-compatible and provider models
gpt-5.5, gpt-5.5-instant, gpt-5.4, gpt-5.4-mini, deepseek-v4-pro, deepseek-v4-flash, deepseek-v3.2, kimi-k2.6, kimi-k2.5, glm-5.1, glm-5, qwen3-max, qwen3.5-plus, qwen3.5-flash, MiniMax-M2.5
04 · Anthropic compatible
Messages
Anthropic /v1/messages wire format with extended thinking, multi-upstream failover, and SSE keepalive messages every 15 seconds. Use this for Claude Code, OpenCode, OpenClaw, and the official Anthropic SDK.
/api/v1/messages
Request body
- model: claude-sonnet-4-6, claude-opus-4-6, or claude-haiku-4-5. Older Anthropic IDs (claude-sonnet-4-5-20250929, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022) are automatically aliased to their latest equivalents.
- cache_control: set on system, tools, or text content blocks for Anthropic prompt caching. Hypereal defaults a cache breakpoint when omitted and reports cache usage in response metadata.
- hypereal.cache: set to "auto" to make the default explicit for repeated requests, or false to bypass it for a request.
- thinking: budget_tokens caps the reasoning trace. The endpoint sends SSE pings every 15 seconds to keep proxies from closing long thinking streams.
curl https://api.hypereal.cloud/v1/messages \
-H "x-api-key: ck_..." \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-6",
"max_tokens": 1024,
"system": [{
"type": "text",
"text": "You are a senior TypeScript refactoring assistant.",
"cache_control": {"type": "ephemeral"}
}],
"messages": [
{"role": "user", "content": "Plan a 3-step refactor of a Next.js app."}
],
"hypereal": {"cache": "auto"}
}'
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
apiKey: process.env.HYPEREAL_API_KEY, // ck_...
baseURL: 'https://api.hypereal.cloud',
});
const msg = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 1024,
system: [{
type: 'text',
text: 'You are a senior TypeScript refactoring assistant.',
cache_control: { type: 'ephemeral' },
}],
hypereal: { cache: 'auto' },
messages: [{ role: 'user', content: 'Hello, Claude.' }],
});
console.log(msg.content);
Anthropic models
claude-opus-4-7, claude-sonnet-4-6, managed-claude-opus-4-7-max, managed-claude-sonnet-4-6-max
05 · OpenAI Responses API
Responses
The newer OpenAI Responses API (used by Codex CLI's `wire_api = responses` mode and the OpenAI Agents SDK). Same authentication as chat/completions; the request body uses `input` instead of `messages`.
/api/v1/responses
Notes
- Anthropic models return 400; they belong on /v1/messages.
- Both streaming and non-streaming requests are billed from response.usage.input_tokens / output_tokens.
- Some upstreams always emit SSE. The endpoint detects this and streams transparently, even when stream: false.
- Multi-upstream failover. Set a long client timeout (300s+).
curl https://hypereal.cloud/api/v1/responses \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-codex",
"input": "Write a TypeScript function that debounces a callback.",
"stream": true
}'
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.HYPEREAL_API_KEY,
baseURL: 'https://hypereal.cloud/api/v1',
});
const response = await client.responses.create({
model: 'gpt-5.3-codex',
input: 'Refactor this file into smaller modules.',
});
console.log(response.output_text);
Codex-tuned models
gpt-5.3-codex
06 · Codex CLI / Codex Desktop
Codex CLI
Codex points its `wire_api = responses` provider at /api/v1/responses. The CLI appends `/responses` to the base URL, so configure the base URL as shown.
/api/v1/responses
# ~/.codex/config.toml
model_provider = "hypereal"
model = "gpt-5.3-codex"

[model_providers.hypereal]
name = "Hypereal"
base_url = "https://hypereal.cloud/api/v1"
wire_api = "responses"
env_key = "HYPEREAL_API_KEY"
Then export your key:
export HYPEREAL_API_KEY=ck_...
Run codex as usual. Everything Codex sends (full reasoning streams, tool calls, file edits) is passed through unchanged. Billing is based on the standard input_tokens / output_tokens usage block.
The same setup works for OpenCode, Claude Code (use /v1/messages), Cursor (use /v1/chat/completions), and Gemini CLI (use /v1/gemini).
07
Image generation
OpenAI-compatible /images/generations shape. Synchronous: the endpoint returns image URLs (or base64) once the upstream finishes. Billed per image; `n` is clamped to 1–10.
/api/v1/images/generations
Request body
- ... image, reference_images).
- size: 1024x1024, 1536x1024. Provider-dependent.
- ... creditsPerGeneration × n, the endpoint returns 402.
- ... gpt-image-2, nano_banana_pro, and gemini-3-1-flash-t2i. Use gpt-5.5 only with chat, messages, or responses endpoints.
curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "nano_banana_pro",
"prompt": "isometric studio shot of a tiny cyberpunk apartment, neon rim light",
"n": 1,
"size": "1024x1024"
}'
const res = await fetch('https://hypereal.cloud/api/v1/images/generations', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.HYPEREAL_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nano_banana_pro',
prompt: 'a chrome teapot floating over the ocean at sunset',
n: 1,
}),
});
const { data } = await res.json();
console.log(data[0].url); // or data[0].b64_json depending on the model
GPT Image 2: text-to-image & image-to-image
Use the same /api/v1/images/generations endpoint with "model": "gpt-image-2". Pass an array of public image URLs in reference_images to switch from pure text-to-image to image-conditioned generation (edits, restyles, character consistency).
- size accepts 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), 2048x2048, 4096x4096. 2K and 4K are square only.
- Reference images must be public HTTPS URLs (base64 is not accepted by this model). Up to 4 references per request.
- Pricing is per-tier: 1K, 2K, and 4K each have their own credit cost — see the model table below.
- Synchronous response: the call returns the final image URL (no polling needed). Allow up to ~120 s.
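The constraints in this list are easy to check on the client before spending a request. The sketch below is one way to do that; `GptImage2Request` and `checkGptImage2Request` are hypothetical names introduced for illustration, and the server remains the authority on what it accepts:

```typescript
// Constraints copied from the list above.
const GPT_IMAGE_2_SIZES = ['1024x1024', '1536x1024', '1024x1536', '2048x2048', '4096x4096'];
const MAX_REFERENCE_IMAGES = 4;

interface GptImage2Request {
  model: 'gpt-image-2';
  prompt: string;
  size?: string;
  reference_images?: string[];
}

// Return a list of human-readable problems; an empty list means the request looks valid.
function checkGptImage2Request(req: GptImage2Request): string[] {
  const problems: string[] = [];
  if (req.size !== undefined && !GPT_IMAGE_2_SIZES.includes(req.size)) {
    problems.push(`unsupported size: ${req.size}`);
  }
  const refs = req.reference_images ?? [];
  if (refs.length > MAX_REFERENCE_IMAGES) {
    problems.push(`too many reference images: ${refs.length}`);
  }
  for (const ref of refs) {
    // base64 data URLs are rejected by this model, so only https:// passes.
    if (!ref.startsWith('https://')) {
      problems.push(`reference image is not an HTTPS URL: ${ref}`);
    }
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every issue at once.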
# Text-to-image (1K landscape)
curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-image-2",
"prompt": "a chrome teapot floating over the ocean at sunset",
"size": "1536x1024"
}'
# Image-to-image / edit
curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-image-2",
"prompt": "same character, snowy mountain background, golden hour",
"size": "1024x1024",
"reference_images": [
"https://example.com/source.jpg"
]
}'
NanoBanana 2: image-to-image & multimodal inputs
Model id gemini-3-1-flash-t2i (NanoBanana 2). Pass references in image_urls to switch into image-to-image / multi-reference mode. Up to 4 reference images, blended in prompt order. Use the standard aspect_ratio field — landscape, portrait, and square are all supported at every resolution tier.
- Supported aspect_ratio: 1:1, 3:2, 2:3, 4:3, 3:4, 16:9, 9:16, 21:9.
- Supported resolution: 0.5K, 1K, 2K, 4K.
- Reference images may be public HTTPS URLs or base64 data URLs.
- Multi-reference works with a text prompt: combine, e.g., a character + outfit + scene reference and describe the final composition in the prompt.
# Multimodal: text + multiple reference images
curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-1-flash-t2i",
"prompt": "Place the character (img 1) wearing the jacket (img 2) into the scene from img 3, cinematic light",
"aspect_ratio": "16:9",
"resolution": "2K",
"image_urls": [
"https://example.com/character.png",
"https://example.com/jacket.png",
"https://example.com/scene.png"
]
}'
Image models
gpt-image-2, gpt-4o-image, nano_banana, nano_banana_2, gemini-3.1-flash-image-preview, gemini-2.5-flash-image-preview, flux-kontext-pro, flux-2-pro, doubao-seedream-4-0, doubao-seedream-4-5, doubao-seedream-5-0, gemini-3.1-flash-image-preview-official, flux-kontext-max, gemini-2.5-flash-image-official, nano_banana_pro, gemini-3-pro-image-preview, flux-2-flex, gemini-3-pro-image-preview-official, gemini-3-pro-image-preview-4K, gemini-3.1-fast-imagen, gemini-3.1-thinking-imagen
08 · long-running
Video generation
Synchronous long-polling endpoint: keep the connection open until the clip is ready. Set your HTTP client timeout to 600s. Billing is per second (most models) or per clip (Veo, Vidu, Grok).
/api/v1/videos/generate
Request body
- ... per_second models.
- aspect_ratio: 16:9, 9:16, 1:1. Provider-dependent. Gemini Omni Flash accepts 16:9 or 9:16.
- resolution: ... 720P.
- ... last_image_url or image; see the upstream documentation for that model.
curl https://hypereal.cloud/api/v1/videos/generate \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gemini_omni_flash",
"prompt": "a white cube rotating on a black background, clean product demo",
"duration": 6,
"aspect_ratio": "16:9",
"resolution": "720P",
"image_urls": [
"https://example.com/product-reference.png"
]
}'
const res = await fetch('https://hypereal.cloud/api/v1/videos/generate', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.HYPEREAL_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gemini_omni_flash',
prompt: 'a cat walking on the moon, cinematic, no text',
duration: 6,
aspect_ratio: '16:9',
resolution: '720P',
image_urls: ['https://example.com/cat-reference.png'],
}),
});
const data = await res.json();
console.log(data.jobId, data.pollUrl); // poll /v1/jobs/{id} for the mp4
Video models
happyhorse-1.0, gemini_omni_flash, wan2.6-flash, kling-2-6, MiniMax-Hailuo-02, doubao-seedance-1-0-pro-fast, MiniMax-Hailuo-2.3, wan2.6, kling-video-o1, kling-v3-omni, kling-v3, kling-v3-video, doubao-seedance-1-0-pro-quality, doubao-seedance-2-0, doubao-seedance-2-0-fast, doubao-seedance-1-5-pro, Veo3.1-fast-official, Veo3.1-quality-official, veo3.1-fast, veo3.1-quality, vidu-q3-pro, grok-video-3
09 · Fish Audio
Audio: TTS, voice cloning, ASR
Three model IDs share one endpoint. The body and response shape depend on which one you call. The provider is Fish Audio (called directly, not through ToAPI), billed per request.
/api/v1/audio/generations
- text: ... audio-tts and audio-clone.
- audio: ... audio-asr (input) and audio-clone (reference voice ≥ 10s).
- Response: data: [{ url }] for TTS / cloning, text (+ optional segments, duration) for ASR.
curl https://hypereal.cloud/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-tts",
"text": "Welcome to Hypereal. One key, every model.",
"voice_id": "en_male_calm"
}'
curl https://hypereal.cloud/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-clone",
"text": "This is my cloned voice.",
"audio": "https://example.com/reference-30s.mp3"
}'
curl https://hypereal.cloud/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-asr",
"audio": "https://example.com/recording.mp3"
}'
Audio models
audio-tts, audio-clone, audio-asr
10 · Google-native shape
Gemini
Accepts both Gemini-native shapes (`contents` / `generationConfig` / `systemInstruction`) and OpenAI shapes on the same endpoint. The endpoint converts to the OpenAI format internally before forwarding. For most code, /v1/chat/completions with a Gemini model ID is simpler.
/api/v1/gemini
- generationConfig: ... temperature, maxOutputTokens, etc.
- ... contents.
Auth header: x-goog-api-key: ck_..., ?key=ck_..., or Authorization: Bearer ck_... all work.
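The three credential placements named here (the x-goog-api-key header, the ?key= query parameter, and the Authorization: Bearer header) can be sketched as a small request-preparation helper. `AuthStyle`, `PreparedRequest`, and `withGeminiAuth` are hypothetical names used only for this illustration:

```typescript
type AuthStyle = 'bearer' | 'x-goog-api-key' | 'query';

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
}

// Attach the key using one of the three accepted placements.
function withGeminiAuth(baseUrl: string, apiKey: string, style: AuthStyle): PreparedRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  switch (style) {
    case 'bearer':
      headers['Authorization'] = `Bearer ${apiKey}`;
      return { url: baseUrl, headers };
    case 'x-goog-api-key':
      headers['x-goog-api-key'] = apiKey;
      return { url: baseUrl, headers };
    case 'query':
      // The key travels in the URL here, so it may end up in access logs.
      return { url: `${baseUrl}?key=${encodeURIComponent(apiKey)}`, headers };
  }
}
```

The returned `url` and `headers` can be passed straight to `fetch`. Header-based placement is usually the safer default because URLs tend to be logged.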
curl "https://hypereal.cloud/api/v1/gemini" \
-H "x-goog-api-key: ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3.5-thinking",
"contents": [
{"role": "user", "parts": [{"text": "Outline a launch plan."}]}
],
"generationConfig": {"temperature": 0.6, "maxOutputTokens": 2048}
}'
// The /v1/gemini endpoint accepts both Gemini-native and OpenAI shapes.
// For SDK use, the OpenAI client + /v1/chat/completions is simpler.
const res = await fetch('https://hypereal.cloud/api/v1/gemini', {
method: 'POST',
headers: {
'x-goog-api-key': process.env.HYPEREAL_API_KEY!,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gemini-3.5-fast',
contents: [{ role: 'user', parts: [{ text: 'Hi' }] }],
}),
});
console.log(await res.json());
Gemini models
gemini-3.1-pro-preview, gemini-3-pro-preview, gemini-3-flash-preview
11
Errors and rate limits
All errors are JSON: '{ error: { type, message } }'. Rate limits are evaluated per user, not per key, so multiple keys share the same quota.
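Since every error shares one JSON envelope, a client can narrow an unknown response body to that shape before reading it. A minimal sketch; `HyperealError` and `parseApiError` are illustrative names, and the sample `type` value in the usage line is made up rather than taken from the API:

```typescript
interface HyperealError {
  type: string;
  message: string;
}

// Narrow an unknown JSON body to the documented { error: { type, message } } envelope.
// Returns null when the body does not match, so callers can fall back to a generic error.
function parseApiError(body: unknown): HyperealError | null {
  if (typeof body !== 'object' || body === null) return null;
  const error = (body as { error?: unknown }).error;
  if (typeof error !== 'object' || error === null) return null;
  const { type, message } = error as { type?: unknown; message?: unknown };
  if (typeof type !== 'string' || typeof message !== 'string') return null;
  return { type, message };
}

// Hypothetical body for illustration only.
const parsed = parseApiError({ error: { type: 'example_error', message: 'Something went wrong' } });
```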
- ... ck_), expired or inactive key.
- X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers are returned on rate-limit responses.
- ... model, unknown model ID (the response includes available_models), or the wrong endpoint for the format (e.g. an Anthropic model on /chat/completions).
DEVELOPER
ComfyUI as API
Deploy a ComfyUI container as a Hypereal-managed GPU endpoint. Same per-second billing, auto-scaling, webhook delivery as any other deployment — you control the workflow graph and the model weights.
The /comfy workflow-JSON paster and /v1/comfy/* routes were retired. ComfyUI now ships as a regular Deployment: you bring a Docker image (e.g. runpod/worker-comfyui or your own), we mount it on real GPUs.
/v1/gpu/run/{slug}
Submits a job to your ComfyUI deployment. Async by default; pass "sync": true to wait inline up to 240s.
curl -X POST https://hypereal.cloud/v1/gpu/run/my-comfy-workflow \
-H "Authorization: Bearer $HYPEREAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": {
"prompt": "a cinematic portrait of an astronaut",
"seed": 42,
"workflow_overrides": { "Sampler.steps": 30 }
}
}'
{
"job_id": "K3uA7Pq9xLm4",
"status": "queued",
"provider_job_id": "..."
}
/v1/gpu/jobs/{id}
Poll for status. We live-poll the worker on each request so you see queued → running → succeeded in near real time. On succeeded, credits settle to the actual GPU-seconds; on failed, we refund the hold. Pin a webhookUrl on the deployment to skip polling.
{
"job_id": "K3uA7Pq9xLm4",
"status": "succeeded",
"output": { "images": ["data:image/png;base64,..."] },
"executionMs": 18420,
"creditsCharged": 56
}
# List
curl https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer $HYPEREAL_API_KEY"
# Create (point at any ComfyUI worker image)
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer $HYPEREAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"slug": "my-comfy-workflow",
"name": "My Comfy",
"dockerImage": "runpod/worker-comfyui:dev-cuda12.1.1",
"gpuTypes": "ADA_48_PRO,AMPERE_80"
}'
Open /infra/deployments/new: pick a GPU tier, point at your ComfyUI Docker image (custom builds with your weights and custom nodes pre-baked work fine), set min/max workers and idle timeout. Your endpoint goes live in 60s.
Full Infrastructure docs: /docs/infra — handler spec, pricing, webhook protocol, R2 storage for weights.
ENTERPRISE
Gateway features
Cost visibility, budget guardrails, request logs, multi-provider failover, and smart routing — all built into the same API key. No extra setup, no separate dashboard tier.
Spend, by model, in real time
Per-model pie, daily cost trend, top-10 most expensive requests. Available on every account at /usage. Export the underlying logs to CSV at any time:
GET /api/api-usage/export?days=30
Authorization: session cookie
→ hypereal-usage-2026-05-10.csv
Per-key monthly cap, with email guardrails
Set spendingLimit on any API key. We email at 80% (heads up) and 100% (hard cap). Optional: auto-disable the key on overshoot so a runaway loop never costs you a four-figure invoice.
POST /api/api-keys
{
"name": "prod-eu",
"spendingLimit": 50000 // 500 USD / month
}
Every call, searchable
Every API call is indexed by endpoint, model, status code, latency, and cost. Filter and search at /usage, or pull the JSON directly:
GET /api/api-usage?days=30&limit=1000
{
logs: [...],
costByModel: [...],
topExpensiveRequests: [...]
}
Outages don't reach your users
Every supported model has a fallback chain. On 5xx, timeout, or 429 we transparently retry the next provider with exponential backoff. You always get a result or a single, clean error — never a flap.
primary: seedance-2-0-turbo-t2v (region us-east)
fallback: seedance-2-0-t2v (region us-west)
fallback: seedance-2-0 (region eu-central)
retries: 1 per target, exp backoff
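To make the chain semantics concrete, here is a client-side sketch of the same idea: walk an ordered list of targets, retry each one, and back off exponentially between failed attempts. This is not Hypereal's implementation. It assumes "retries: 1 per target" means one retry after the first attempt, and `callWithFailover`, `baseDelayMs`, and the injectable `sleep` are hypothetical names:

```typescript
// Try each target in order; each gets one initial attempt plus `retriesPerTarget` retries.
async function callWithFailover<T>(
  targets: string[],
  call: (target: string) => Promise<T>,
  retriesPerTarget = 1,
  baseDelayMs = 200,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown = new Error('no targets configured');
  let failures = 0;
  for (const target of targets) {
    for (let attempt = 0; attempt <= retriesPerTarget; attempt++) {
      // Exponential backoff before every attempt that follows a failure:
      // baseDelayMs, then 2x, 4x, and so on.
      if (failures > 0) await sleep(baseDelayMs * 2 ** (failures - 1));
      try {
        return await call(target);
      } catch (err) {
        lastError = err;
        failures++;
      }
    }
  }
  // Every target failed: surface the last error as the single, clean failure.
  throw lastError;
}
```

Called with the three model names from the chain above as `targets`, the caller either gets the first successful result or one error after all attempts are exhausted.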
Pick by intent, we pick the cheapest qualified model
Send intent instead of model and we'll route to the cheapest provider in that capability bucket — without giving up determinism: pin a model whenever you want and we'll honor it exactly.
POST /v1/images/generate
{
"intent": "text-to-image-fast", // ← we'll pick the cheapest qualified model
"prompt": "a quiet sunrise over Mt Fuji"
}
# Or pin explicitly:
{ "model": "nano-banana-t2i", "prompt": "..." }
SERVERLESS
GPU models
Hosted serverless GPU inference at /v1/gpu/{slug}. One API key, credit billing, audit log, and webhooks. Same wallet and dashboard as your LLM calls.
1. Pick a model
Browse the live catalog at /gpu-recommend. Each model lists its slug, per-call or per-second credit cost, and the maximum execution time per call.
2. Sync invocation (small jobs)
Short-running models return the output inline.
curl -X POST https://api.hypereal.cloud/v1/gpu/sdxl \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"input": {"prompt": "a tabby cat astronaut"}}'
→ { "id": "...",
"status": "succeeded",
"outputs": ["https://cdn.hypereal.cloud/gpu/.../out.png"],
"costCredits": 50,
"durationMs": 4210 }
3. Async invocation (long jobs)
Long-running models queue and return a job id immediately with a 202. Poll, or wait for our cron + webhook poller to settle the job.
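A polling loop for this flow might look like the sketch below. It assumes the job object carries `id`, `status`, and `outputs`, and that `succeeded` and `failed` are the terminal states; other terminal states may exist. `waitForJob`, the interval, and the attempt budget are hypothetical choices, and the HTTP call is injected as `fetchJob` so the loop itself stays transport-agnostic:

```typescript
type JobStatus = 'queued' | 'running' | 'succeeded' | 'failed';

interface GpuJob {
  id: string;
  status: JobStatus;
  outputs?: string[];
}

// Poll until the job reaches a terminal state or the attempt budget runs out.
async function waitForJob(
  fetchJob: () => Promise<GpuJob>,
  intervalMs = 2000,
  maxAttempts = 150,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<GpuJob> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob();
    if (job.status === 'succeeded' || job.status === 'failed') return job;
    await sleep(intervalMs);
  }
  throw new Error(`job did not finish after ${maxAttempts} polls`);
}
```

In practice `fetchJob` would wrap a GET to the job's poll URL with the usual Authorization header and parse the JSON body.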
# Submit
POST /v1/gpu/wan-video
{ "input": { "prompt": "drone over Tokyo, neon, rain", "seconds": 5 } }
→ 202 { "id": "abc...", "status": "queued", "pollUrl": "/v1/gpu/jobs/abc..." }
# Poll
GET /v1/gpu/jobs/abc...
→ { "id": "abc...",
"status": "succeeded",
"outputs": ["https://cdn.hypereal.cloud/gpu/.../clip.mp4"],
"costCredits": 312,
"durationMs": 156000 }
Failed and timed-out jobs auto-refund the credit reservation. Per-second billing reconciles on completion using the model's reported execution time, capped at the model's maxSeconds.
ENTERPRISE
Teams, RBAC & SSO
Organizations, five built-in roles, SAML and OIDC single sign-on. Built so security and procurement can sign off without a custom rider.
Org-scoped keys, audit log, billing
Every API key, webhook, ComfyUI workflow, and GPU template can belong to an organization instead of an individual. Teammates share one budget, one audit trail, and one invoice. Personal keys keep working alongside.
POST /api/orgs
{
"name": "Acme Inc"
}
→ { id, slug, role: "owner" }
Owner · Admin · Developer · Billing · Viewer
- Owner — everything, including delete-org
- Admin — manage members, keys, SSO, webhooks
- Developer — create/delete API keys, manage workflows + GPUs
- Billing — view + manage payments and audit log
- Viewer — read-only access to keys, billing, audit
Configure your IdP in 3 steps
- Create a SAML app in Okta / Azure AD / Auth0 / Google.
- Set ACS URL to https://hypereal.cloud/api/auth/sso/<providerId>
- Paste the IdP metadata XML into /settings/organization → SSO.
Set the email-domain claim (e.g. acme.com) and the login form will auto-route corporate emails to your IdP — no password prompt.
Issuer + client credentials
Drop in your issuer URL, client id, and client secret. We fetch the /.well-known/openid-configuration on save and surface a green check when the IdP is reachable.
POST /api/orgs/{id}/sso
{
"type": "oidc",
"issuer": "https://idp.acme.com",
"clientId": "...",
"clientSecret": "...",
"domain": "acme.com"
}
12
Pricing and credits
One unit: 100 credits = 1.00 USD. LLMs are billed per token at each model's input/output rate. Media models are billed per image, per second, or per clip.
LLMs
Tokens × rate per MToken. Streaming requests are billed from the final usage chunk.
Images
Flat fee per generation × the actual number n of images returned.
Video and audio
Per second (most video), per clip (Veo, Vidu, Grok), or per request (Fish Audio).
Claude, GPT, Gemini, and selected image models (GPT Image 2, Nano Banana) are priced directly by the providers. Video, audio, and other media models are billed at standard rates.
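The arithmetic behind these rules can be sketched in a few lines. Only the 100 credits = 1.00 USD conversion comes from this page. Expressing rates in credits per million tokens is an assumption, and every rate in the example is a made-up number, not a published price:

```typescript
const CREDITS_PER_USD = 100; // from the text: 100 credits = 1.00 USD

// Assumed unit: credits per million tokens (MToken).
interface TokenRates {
  inputCreditsPerMTok: number;
  outputCreditsPerMTok: number;
}

// LLMs: tokens × rate per MToken, summed over input and output.
function llmCostCredits(inputTokens: number, outputTokens: number, rates: TokenRates): number {
  return (inputTokens * rates.inputCreditsPerMTok + outputTokens * rates.outputCreditsPerMTok) / 1_000_000;
}

// Images: flat fee per generation × number of images actually returned.
function imageCostCredits(creditsPerGeneration: number, imagesReturned: number): number {
  return creditsPerGeneration * imagesReturned;
}

function creditsToUsd(credits: number): number {
  return credits / CREDITS_PER_USD;
}

// Hypothetical rates, for illustration only.
const exampleRates: TokenRates = { inputCreditsPerMTok: 200, outputCreditsPerMTok: 800 };
// 500k input tokens cost 100 credits, 250k output tokens cost 200 credits.
const exampleCredits = llmCostCredits(500_000, 250_000, exampleRates); // 300 credits
const exampleUsd = creditsToUsd(exampleCredits); // 3 USD
```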

