Hypereal API Reference
One ck_-prefixed API key. OpenAI-compatible REST. Use it as-is with Claude Code, Codex CLI, Cursor, the OpenAI SDK, and the Anthropic SDK, or call it directly with curl. Chat, image, video, audio, and coding agents all sit behind one base URL.
Enterprise API uses a separate managed API surface.
This page documents the standard API paths. For managed Enterprise API models, capacity controls, and insurance, use the Enterprise overview and Enterprise API docs.
01 · Start in 90 seconds
Quick start
Get a key, point your client at hypereal.cloud, and ship. Authentication and the request format are OpenAI-compatible, so most SDKs work after changing only the base URL.
Top up at least $2 (200 credits) and create a key at /manage-api-keys. Keys start with ck_.
Base URL: https://hypereal.cloud/api/v1
The auth header is Authorization: Bearer ck_.... You can reuse the OpenAI request bodies you already know.
curl https://hypereal.cloud/api/v1/chat/completions \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [{"role": "user", "content": "Say hi in one word."}]
}'

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.HYPEREAL_API_KEY, // ck_...
baseURL: 'https://hypereal.cloud/api/v1',
});
const completion = await client.chat.completions.create({
model: 'gpt-5.5',
messages: [{ role: 'user', content: 'Say hi in one word.' }],
});
console.log(completion.choices[0].message.content);

For coding agents, start with claude-sonnet-4-6 and use Claude Code or another Anthropic-compatible client that sends cache_control. Hypereal supports cache_control caching and Hypereal Cache. Hypereal Cache is on by default and can sharply reduce token consumption for repeated coding-agent context. You can set hypereal.cache to "auto" explicitly, or omit it for the same default.
SDK
Hypereal SDK
Install hypereal-sdk for typed access to chat, responses, image generation, video generation, audio, jobs and storage from Node.js 18+.
Published as hypereal-sdk on npm.
Use client.images.generate(), chat, responses, jobs and storage.
See the full SDK overview at /sdk.
pnpm add hypereal-sdk
import { Hypereal } from 'hypereal-sdk';
const client = new Hypereal({
apiKey: process.env.HYPEREAL_API_KEY!,
});
const image = await client.images.generate({
model: 'gemini-3-1-flash-t2i',
prompt: 'A cinematic portrait in neon light',
aspect_ratio: '16:9',
});
console.log(image);

const object = await client.storage.uploadFile(file, {
filename: 'training-image.png',
contentType: 'image/png',
kind: 'dataset',
});
const listed = await client.storage.list({ kind: 'dataset' });

02
Authentication
Every request needs a ck_-prefixed key. Three header formats are supported so that every SDK is covered.
- Authorization: Bearer ck_... is used by the OpenAI SDK, Codex CLI, and Cursor.
- x-api-key: ck_... is used by the Anthropic SDK and Claude Code (on /v1/messages).
- x-goog-api-key: ck_... is used by the Google Gemini SDK / native format (on /v1/gemini; ?key=ck_... also works).

03 · OpenAI-compatible
Chat Completions
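The flagship endpoint. It uses the OpenAI Chat Completions wire format. Use it for GPT, Gemini, Qwen, DeepSeek, GLM, and every LLM other than Anthropic's.
/api/v1/chat/completions
Request body
- model: the model ID. For Anthropic models, use /v1/messages instead.
- messages: array of messages (role, content).
- stream: defaults to false. When true, the response is an SSE stream; usage is included in the final chunk.
Pricing
Billed per token at each model's input/output rate. 100 credits = $1.00. The minimum balance required to call the endpoint is 200 credits ($2.00).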
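The streaming behaviour described above (SSE when stream is true, with usage in the final chunk) can be consumed without an SDK. The sketch below is illustrative only: it assumes the stream follows the usual OpenAI convention of `data: <json>` lines terminated by `data: [DONE]`, and it works on a fully buffered body rather than on partial network reads. The function name collectStream and the sample payload are hypothetical, not taken from a real Hypereal response.

```typescript
// Minimal parser for an OpenAI-style chat completion SSE body.
// Assumption: events are `data: <json>` lines and the stream ends with `data: [DONE]`.

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface StreamResult {
  text: string;         // concatenated delta.content pieces
  usage: Usage | null;  // taken from whichever chunk carries `usage` (normally the last)
}

function collectStream(raw: string): StreamResult {
  let text = '';
  let usage: Usage | null = null;

  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank lines and keepalive comments
    const payload = line.slice('data:'.length).trim();
    if (payload === '' || payload === '[DONE]') continue;

    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? '';
    if (chunk.usage) usage = chunk.usage as Usage;
  }
  return { text, usage };
}

// Hand-written sample stream (hypothetical, for illustration only).
const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  '',
  'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":2,"total_tokens":7}}',
  '',
  'data: [DONE]',
].join('\n');

const result = collectStream(sample);
console.log(result.text, result.usage?.total_tokens);
```

In a real client you would feed decoded network chunks into a line buffer first; the OpenAI SDK example on this page already does that for you.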
curl https://hypereal.cloud/api/v1/chat/completions \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [
{"role": "system", "content": "You are a terse assistant."},
{"role": "user", "content": "Two-line haiku about caches."}
],
"stream": true,
"max_tokens": 256
}'

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.HYPEREAL_API_KEY,
baseURL: 'https://hypereal.cloud/api/v1',
});
const stream = await client.chat.completions.create({
model: 'gpt-5.5',
stream: true,
messages: [{ role: 'user', content: 'Stream me a haiku.' }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

OpenAI and provider-compatible models
gpt-5.5, gpt-5.5-instant, gpt-5.5-pro, gpt-5.4, gpt-5.4-mini, deepseek-v4-pro, deepseek-v4-flash, deepseek-v3.2, kimi-k2.6, kimi-k2.5, glm-5.1, glm-5, qwen3-max, qwen3.5-plus, qwen3.5-flash, MiniMax-M2.5

04 · Anthropic-compatible
Messages
The Anthropic /v1/messages wire format, with support for extended thinking, multi-upstream failover, and a 15-second SSE keepalive. Use it with Claude Code, OpenCode, OpenClaw, and the official Anthropic SDK.
/api/v1/messages
Request body
- model: claude-sonnet-4-6, claude-opus-4-6, or claude-haiku-4-5. Older Anthropic IDs (claude-sonnet-4-5-20250929, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022) are mapped automatically to the latest equivalent model.
- cache_control: add to system, tools, or text content blocks for Anthropic prompt caching. Hypereal defaults a cache breakpoint when omitted and reports cache usage in response metadata.
- hypereal.cache: set to "auto" to make the default explicit for repeated requests, or false to bypass it for a request.
- thinking: budget_tokens caps the reasoning trace. The endpoint sends an SSE ping every 15 seconds so that proxies do not drop the connection during long thinking streams.

curl https://api.hypereal.cloud/v1/messages \
-H "x-api-key: ck_..." \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-6",
"max_tokens": 1024,
"system": [{
"type": "text",
"text": "You are a senior TypeScript refactoring assistant.",
"cache_control": {"type": "ephemeral"}
}],
"messages": [
{"role": "user", "content": "Plan a 3-step refactor of a Next.js app."}
],
"hypereal": {"cache": "auto"}
}'

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
apiKey: process.env.HYPEREAL_API_KEY, // ck_...
baseURL: 'https://api.hypereal.cloud',
});
const msg = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 1024,
system: [{
type: 'text',
text: 'You are a senior TypeScript refactoring assistant.',
cache_control: { type: 'ephemeral' },
}],
hypereal: { cache: 'auto' },
messages: [{ role: 'user', content: 'Hello, Claude.' }],
});
console.log(msg.content);

Anthropic models
claude-opus-4-7, claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, managed-claude-opus-4-7-max, managed-claude-opus-4-6-max, managed-claude-opus-4-5-max, managed-claude-sonnet-4-6-max, managed-claude-sonnet-4-5-max, managed-claude-haiku-4-5-max

05 · OpenAI Responses API
Responses
OpenAI's newer Responses API (used by Codex CLI's `wire_api = responses` mode and by the OpenAI Agents SDK). It uses the same authentication as chat/completions, and the request body takes `input` instead of `messages`.
/api/v1/responses
Notes
- Anthropic models return 400; they belong on /v1/messages.
- Both streaming and non-streaming requests are billed from response.usage.input_tokens / output_tokens.
- Some upstreams always emit SSE. The endpoint detects this and streams transparently even when stream:false.
- Multi-upstream failover is supported. Set a long client timeout (300 seconds or more).
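Because Anthropic models are rejected with a 400 on the Responses and Chat Completions endpoints, a client that accepts arbitrary model IDs may want to pick the endpoint up front. The sketch below is one possible way to do that. It assumes Anthropic models can be recognised by a "claude-" or "managed-claude-" prefix, which matches the IDs listed on this page but is not a documented rule; the helper names are hypothetical.

```typescript
// Relative endpoint names under the API base URL.
type Endpoint = 'messages' | 'responses' | 'chat/completions';

// Assumption: Anthropic model IDs start with "claude-" or "managed-claude-".
function isAnthropicModel(model: string): boolean {
  return model.startsWith('claude-') || model.startsWith('managed-claude-');
}

// Anthropic models always go to `messages`; everything else goes to
// `responses` when the caller prefers it, otherwise to `chat/completions`.
function endpointFor(model: string, preferResponses = false): Endpoint {
  if (isAnthropicModel(model)) return 'messages';
  return preferResponses ? 'responses' : 'chat/completions';
}

console.log(endpointFor('claude-sonnet-4-6', true));
console.log(endpointFor('gpt-5.3-codex', true));
console.log(endpointFor('gpt-5.5'));
```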
curl https://hypereal.cloud/api/v1/responses \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-codex",
"input": "Write a TypeScript function that debounces a callback.",
"stream": true
}'

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.HYPEREAL_API_KEY,
baseURL: 'https://hypereal.cloud/api/v1',
});
const response = await client.responses.create({
model: 'gpt-5.3-codex',
input: 'Refactor this file into smaller modules.',
});
console.log(response.output_text);

Codex-optimized models
gpt-5.3-codex, gpt-5.3-codex-spark

06 · Codex CLI / Codex Desktop
Codex CLI
Point a Codex `wire_api = responses` provider at /api/v1/responses. The CLI appends `/responses` to the base URL automatically, so set the base URL exactly as shown.
/api/v1/responses

# ~/.codex/config.toml
model_provider = "hypereal"
model = "gpt-5.3-codex"

[model_providers.hypereal]
name = "Hypereal"
base_url = "https://hypereal.cloud/api/v1"
wire_api = "responses"
env_key = "HYPEREAL_API_KEY"

Then export your key:
export HYPEREAL_API_KEY=ck_...

Run codex as usual. Everything Codex sends (the full reasoning stream, tool calls, file edits) is proxied unchanged. Billing is based on the standard input_tokens / output_tokens usage block.
The same setup also works with OpenCode, Claude Code (uses /v1/messages), Cursor (uses /v1/chat/completions), and Gemini CLI (uses /v1/gemini).
07
Image generation
OpenAI-compatible /images/generations format. The call is synchronous: once the upstream finishes, the endpoint returns an image URL (or base64). Billed per image; `n` is clamped to 1–10.
/api/v1/images/generations
Request body
- Reference images (image, reference_images).
- size: for example 1024x1024 or 1536x1024. Varies by provider.
- n: if your balance cannot cover creditsPerGeneration × n, the endpoint returns 402.
- model: an image model such as gpt-image-2, nano_banana_pro, or gemini-3-1-flash-t2i. Use gpt-5.5 only with the chat, messages, or responses endpoints.

curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "nano_banana_pro",
"prompt": "isometric studio shot of a tiny cyberpunk apartment, neon rim light",
"n": 1,
"size": "1024x1024"
}'

const res = await fetch('https://hypereal.cloud/api/v1/images/generations', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.HYPEREAL_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nano_banana_pro',
prompt: 'a chrome teapot floating over the ocean at sunset',
n: 1,
}),
});
const { data } = await res.json();
console.log(data[0].url); // or data[0].b64_json depending on the model

GPT Image 2: text-to-image & image-to-image
Use the same /api/v1/images/generations endpoint with "model": "gpt-image-2". Pass an array of public image URLs in reference_images to switch from pure text-to-image to image-conditioned generation (edits, restyles, character consistency).
- size accepts 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), 2048x2048, 4096x4096. 2K and 4K are square only.
- Reference images must be public HTTPS URLs (base64 is not accepted by this model). Up to 4 references per request.
- Pricing is per-tier: 1K, 2K, and 4K each have their own credit cost — see the model table below.
- Synchronous response: the call returns the final image URL (no polling needed). Allow up to ~120 s.
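The constraints listed above are easy to check on the client before spending credits. The following is a minimal sketch of such a pre-flight check for gpt-image-2 requests; the function name validateGptImage2 and the wording of the messages are hypothetical, and the server remains the authority on what it accepts.

```typescript
// Sizes and reference limit as listed for gpt-image-2 on this page.
const GPT_IMAGE_2_SIZES: readonly string[] = [
  '1024x1024', '1536x1024', '1024x1536', '2048x2048', '4096x4096',
];
const MAX_REFERENCES = 4;

interface GptImage2Request {
  model: 'gpt-image-2';
  prompt: string;
  size?: string;
  reference_images?: string[];
}

// Returns a list of human-readable problems; an empty list means the request looks fine.
function validateGptImage2(req: GptImage2Request): string[] {
  const problems: string[] = [];

  if (req.size !== undefined && !GPT_IMAGE_2_SIZES.includes(req.size)) {
    problems.push(`unsupported size: ${req.size}`);
  }

  const refs = req.reference_images ?? [];
  if (refs.length > MAX_REFERENCES) {
    problems.push(`too many reference images: ${refs.length}`);
  }
  for (const ref of refs) {
    // base64 / data URLs are not accepted by this model, only public HTTPS URLs.
    if (!ref.startsWith('https://')) {
      problems.push('reference image is not an HTTPS URL');
    }
  }
  return problems;
}

const ok = validateGptImage2({
  model: 'gpt-image-2',
  prompt: 'a chrome teapot floating over the ocean at sunset',
  size: '1536x1024',
});
const bad = validateGptImage2({
  model: 'gpt-image-2',
  prompt: 'same character, snowy mountain background',
  size: '2048x1024',
  reference_images: ['data:image/png;base64,AAAA'],
});
console.log(ok, bad);
```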
# Text-to-image (1K landscape)
curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-image-2",
"prompt": "a chrome teapot floating over the ocean at sunset",
"size": "1536x1024"
}'
# Image-to-image / edit
curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-image-2",
"prompt": "same character, snowy mountain background, golden hour",
"size": "1024x1024",
"reference_images": [
"https://example.com/source.jpg"
]
}'

NanoBanana 2: image-to-image & multimodal inputs
Model id gemini-3-1-flash-t2i (NanoBanana 2). Pass references in image_urls to switch into image-to-image / multi-reference mode. Up to 4 reference images, blended in prompt order. Use the standard aspect_ratio field — landscape, portrait, and square are all supported at every resolution tier.
- Supported aspect_ratio: 1:1, 3:2, 2:3, 4:3, 3:4, 16:9, 9:16, 21:9.
- Supported resolution: 0.5K, 1K, 2K, 4K.
- Reference images may be public HTTPS URLs or base64 data URLs.
- Multi-reference works with a text prompt — combine, e.g., a character + outfit + scene reference and describe the final composition in the prompt.
# Multimodal: text + multiple reference images
curl https://hypereal.cloud/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-1-flash-t2i",
"prompt": "Place the character (img 1) wearing the jacket (img 2) into the scene from img 3, cinematic light",
"aspect_ratio": "16:9",
"resolution": "2K",
"image_urls": [
"https://example.com/character.png",
"https://example.com/jacket.png",
"https://example.com/scene.png"
]
}'

Image models
gpt-image-2, gpt-4o-image, nano_banana, nano_banana_2, gemini-3.1-flash-image-preview, gemini-2.5-flash-image-preview, flux-kontext-pro, flux-2-pro, doubao-seedream-4-0, doubao-seedream-4-5, doubao-seedream-5-0, gemini-3.1-flash-image-preview-official, flux-kontext-max, gemini-2.5-flash-image-official, nano_banana_pro, gemini-3-pro-image-preview, flux-2-flex, gemini-3-pro-image-preview-official, gemini-3-pro-image-preview-4K, gemini-3.1-fast-imagen, gemini-3.1-thinking-imagen

08 · Long-running
Video generation
A synchronous long-poll endpoint: keep the connection open until the clip is ready. Set your HTTP client timeout to 600 seconds. Billing is per second (most models) or per clip (Veo, Vidu, Grok).
/api/v1/videos/generate
Request body
- duration: applies to per_second models.
- aspect_ratio: 16:9, 9:16, 1:1. Varies by provider. Gemini Omni Flash accepts 16:9 or 9:16.
- resolution: 720P.
- last_image_url or image: see the upstream documentation for the model in question.

curl https://hypereal.cloud/api/v1/videos/generate \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gemini_omni_flash",
"prompt": "a white cube rotating on a black background, clean product demo",
"duration": 6,
"aspect_ratio": "16:9",
"resolution": "720P",
"image_urls": [
"https://example.com/product-reference.png"
]
}'

const res = await fetch('https://hypereal.cloud/api/v1/videos/generate', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.HYPEREAL_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gemini_omni_flash',
prompt: 'a cat walking on the moon, cinematic, no text',
duration: 6,
aspect_ratio: '16:9',
resolution: '720P',
image_urls: ['https://example.com/cat-reference.png'],
}),
});
const data = await res.json();
console.log(data.jobId, data.pollUrl); // poll /v1/jobs/{id} for the mp4

Video models
happyhorse-1.0, gemini_omni_flash, wan2.6-flash, kling-2-6, MiniMax-Hailuo-02, doubao-seedance-1-0-pro-fast, MiniMax-Hailuo-2.3, wan2.6, kling-video-o1, kling-v3-omni, kling-v3, kling-v3-video, doubao-seedance-1-0-pro-quality, doubao-seedance-2-0, doubao-seedance-2-0-fast, doubao-seedance-1-5-pro, Veo3.1-fast-official, Veo3.1-quality-official, veo3.1-fast, veo3.1-quality, vidu-q3-pro, grok-video-3

09 · Fish Audio
Audio: TTS, voice cloning, ASR
Three model IDs share one endpoint. The body and response shape depend on the model you call. The provider is Fish Audio (called directly, not through ToAPI), and billing is per request.
/api/v1/audio/generations
- text: used by audio-tts and audio-clone.
- audio: used by audio-asr (input) and audio-clone (reference voice, 10 seconds or longer).
- Response: data: [{ url }] for TTS / cloning; text (plus optional segments, duration) for ASR.

curl https://hypereal.cloud/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-tts",
"text": "Welcome to Hypereal. One key, every model.",
"voice_id": "en_male_calm"
}'

curl https://hypereal.cloud/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-clone",
"text": "This is my cloned voice.",
"audio": "https://example.com/reference-30s.mp3"
}'

curl https://hypereal.cloud/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-asr",
"audio": "https://example.com/recording.mp3"
}'

Audio models
audio-tts, audio-clone, audio-asr

10 · Google native format
Gemini
The same endpoint accepts both the Gemini-native shape (`contents` / `generationConfig` / `systemInstruction`) and the OpenAI shape. The endpoint converts the request to the OpenAI shape internally before forwarding it. For most code, using /v1/chat/completions with a Gemini model ID is simpler.
/api/v1/gemini
- generationConfig: temperature, maxOutputTokens, and so on.
- contents.
Auth headers: x-goog-api-key: ck_..., ?key=ck_..., or Authorization: Bearer ck_... all work.
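Since the same ck_ key is sent under a different header depending on the wire format, a small helper can keep that choice in one place. This is a sketch under stated assumptions: the header names are the ones shown in the curl examples on this page (Authorization: Bearer, x-api-key, x-goog-api-key), while the AuthStyle type and the authHeaders function are hypothetical names introduced for illustration.

```typescript
// Which client convention the request follows.
type AuthStyle = 'openai' | 'anthropic' | 'gemini';

// Builds the auth header for the given style. Throws early on keys without the
// ck_ prefix, since the API expects that prefix on every key.
function authHeaders(style: AuthStyle, apiKey: string): Record<string, string> {
  if (!apiKey.startsWith('ck_')) {
    throw new Error('Hypereal keys are expected to start with ck_');
  }
  switch (style) {
    case 'openai':
      return { Authorization: `Bearer ${apiKey}` };
    case 'anthropic':
      return { 'x-api-key': apiKey };
    case 'gemini':
      return { 'x-goog-api-key': apiKey };
  }
}

// Placeholder key for illustration only.
const demoKey = 'ck_example';
console.log(authHeaders('openai', demoKey));
console.log(authHeaders('anthropic', demoKey));
console.log(authHeaders('gemini', demoKey));
```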
curl "https://hypereal.cloud/api/v1/gemini" \
-H "x-goog-api-key: ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3.5-thinking",
"contents": [
{"role": "user", "parts": [{"text": "Outline a launch plan."}]}
],
"generationConfig": {"temperature": 0.6, "maxOutputTokens": 2048}
}'

// The /v1/gemini endpoint accepts both Gemini-native and OpenAI shapes.
// For SDK use, the OpenAI client + /v1/chat/completions is simpler.
const res = await fetch('https://hypereal.cloud/api/v1/gemini', {
method: 'POST',
headers: {
'x-goog-api-key': process.env.HYPEREAL_API_KEY!,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gemini-3.5-fast',
contents: [{ role: 'user', parts: [{ text: 'Hi' }] }],
}),
});
console.log(await res.json());

Gemini models
gemini-3.5-thinking, gemini-3.5-fast, gemini-3.1-pro-preview, gemini-3-pro-preview, gemini-3-flash-preview

11
Errors and rate limits
Every error is JSON shaped as '{ error: { type, message } }'. Rate limits are evaluated per user, not per key, so multiple keys share the same quota.
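A client can rely on the error envelope described above to turn failed responses into typed values. The sketch below shows one way to narrow an unknown JSON body into that shape. The interface and function names are hypothetical, and the example type string "rate_limit_error" is a placeholder rather than a documented value.

```typescript
// The documented envelope is { error: { type, message } }.
interface ApiError {
  type: string;
  message: string;
}

// Returns the error payload if `body` matches the envelope, otherwise null.
function parseApiError(body: unknown): ApiError | null {
  if (typeof body !== 'object' || body === null) return null;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== 'object' || err === null) return null;

  const { type, message } = err as { type?: unknown; message?: unknown };
  if (typeof type !== 'string' || typeof message !== 'string') return null;
  return { type, message };
}

// Hypothetical payloads for illustration.
const parsed = parseApiError(
  JSON.parse('{"error":{"type":"rate_limit_error","message":"slow down"}}'),
);
const notAnError = parseApiError({ data: [] });
console.log(parsed, notAnError);
```

Checking the shape before reading fields avoids crashing on proxy-generated HTML or empty bodies, which do not follow the envelope.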
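- Authentication errors: the key is malformed (no ck_ prefix), expired, or disabled.
- Rate limiting: the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers are returned on rate-limit responses.
- Bad requests: a missing model, an unknown model ID (the response includes available_models), or the wrong endpoint for the format (for example, an Anthropic model sent to /chat/completions).

DEVELOPER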
ComfyUI as API
Deploy a ComfyUI container as a Hypereal-managed GPU endpoint. Same per-second billing, auto-scaling, webhook delivery as any other deployment — you control the workflow graph and the model weights.
The /comfy workflow-JSON paster and /v1/comfy/* routes were retired. ComfyUI now ships as a regular Deployment: you bring a Docker image (e.g. runpod/worker-comfyui or your own), and we mount it on real GPUs.
/v1/gpu/run/{slug}
Submits a job to your ComfyUI deployment. Async by default; pass "sync": true to wait inline up to 240s.
curl -X POST https://hypereal.cloud/v1/gpu/run/my-comfy-workflow \
-H "Authorization: Bearer $HYPEREAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": {
"prompt": "a cinematic portrait of an astronaut",
"seed": 42,
"workflow_overrides": { "Sampler.steps": 30 }
}
}'

{
"job_id": "K3uA7Pq9xLm4",
"status": "queued",
"provider_job_id": "..."
}

/v1/gpu/jobs/{id}
Poll for status. We live-poll the worker on each request so you see queued → running → succeeded in near real time. On succeeded, credits settle to the actual GPU-seconds; on failed, we refund the hold. Pin a webhookUrl on the deployment to skip polling.
{
"job_id": "K3uA7Pq9xLm4",
"status": "succeeded",
"output": { "images": ["data:image/png;base64,..."] },
"executionMs": 18420,
"creditsCharged": 56
}

# List
curl https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer $HYPEREAL_API_KEY"
# Create (point at any ComfyUI worker image)
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer $HYPEREAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"slug": "my-comfy-workflow",
"name": "My Comfy",
"dockerImage": "runpod/worker-comfyui:dev-cuda12.1.1",
"gpuTypes": "ADA_48_PRO,AMPERE_80"
}'

Open /infra/deployments/new: pick a GPU tier, point at your ComfyUI Docker image (custom builds with your weights and custom nodes pre-baked work fine), set min/max workers and idle timeout. Your endpoint goes live in 60s.
Full Infrastructure docs: /docs/infra — handler spec, pricing, webhook protocol, R2 storage for weights.
ENTERPRISE
Gateway features
Cost visibility, budget guardrails, request logs, multi-provider failover, and smart routing — all built into the same API key. No extra setup, no separate dashboard tier.
Spend, by model, in real time
Per-model pie, daily cost trend, top-10 most expensive requests. Available on every account at /usage. Export the underlying logs to CSV at any time:
GET /api/api-usage/export?days=30
Authorization: session cookie
→ hypereal-usage-2026-05-10.csv
Per-key monthly cap, with email guardrails
Set spendingLimit on any API key. We email at 80% (heads up) and 100% (hard cap). Optional: auto-disable the key on overshoot so a runaway loop never costs you a four-figure invoice.
POST /api/api-keys
{
"name": "prod-eu",
"spendingLimit": 50000 // 500 USD / month
}

Every call, searchable
Every API call is indexed by endpoint, model, status code, latency, and cost. Filter and search at /usage, or pull the JSON directly:
GET /api/api-usage?days=30&limit=1000
{
logs: [...],
costByModel: [...],
topExpensiveRequests: [...]
}

Outages don't reach your users
Every supported model has a fallback chain. On 5xx, timeout, or 429 we transparently retry the next provider with exponential backoff. You always get a result or a single, clean error — never a flap.
primary: seedance-2-0-turbo-t2v (region us-east)
fallback: seedance-2-0-t2v (region us-west)
fallback: seedance-2-0 (region eu-central)
retries: 1 per target, exp backoff
Pick by intent, we pick the cheapest qualified model
Send intent instead of model and we'll route to the cheapest provider in that capability bucket — without giving up determinism: pin a model whenever you want and we'll honor it exactly.
POST /v1/images/generate
{
"intent": "text-to-image-fast", // ← we'll pick the cheapest qualified model
"prompt": "a quiet sunrise over Mt Fuji"
}
# Or pin explicitly:
{ "model": "nano-banana-t2i", "prompt": "..." }

SERVERLESS
GPU models
Hosted serverless GPU inference at /v1/gpu/{slug}. One API key, credit billing, audit log, and webhooks. Same wallet and dashboard as your LLM calls.
1. Pick a model
Browse the live catalog at /gpu-recommend. Each model lists its slug, per-call or per-second credit cost, and the maximum execution time per call.
2. Sync invocation (small jobs)
Short-running models return the output inline.
curl -X POST https://api.hypereal.cloud/v1/gpu/sdxl \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"input": {"prompt": "a tabby cat astronaut"}}'
→ { "id": "...",
"status": "succeeded",
"outputs": ["https://cdn.hypereal.cloud/gpu/.../out.png"],
"costCredits": 50,
"durationMs": 4210 }

3. Async invocation (long jobs)
Long-running models queue and return a job id immediately with a 202. Poll, or wait for our cron + webhook poller to settle the job.
# Submit
POST /v1/gpu/wan-video
{ "input": { "prompt": "drone over Tokyo, neon, rain", "seconds": 5 } }
→ 202 { "id": "abc...", "status": "queued", "pollUrl": "/v1/gpu/jobs/abc..." }
# Poll
GET /v1/gpu/jobs/abc...
→ { "id": "abc...",
"status": "succeeded",
"outputs": ["https://cdn.hypereal.cloud/gpu/.../clip.mp4"],
"costCredits": 312,
"durationMs": 156000 }

Failed and timed-out jobs auto-refund the credit reservation. Per-second billing reconciles on completion using the model's reported execution time, capped at the model's maxSeconds.
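The submit-then-poll flow for long jobs can be wrapped in a small loop. The sketch below is illustrative and makes several assumptions: the status names queued, running, succeeded, and failed are the ones mentioned on this page (the exact status reported for a timed-out job is not specified here), the backoff schedule is arbitrary, and the caller supplies fetchJob, a hypothetical function that performs the GET on the job's pollUrl.

```typescript
type JobStatus = 'queued' | 'running' | 'succeeded' | 'failed';

interface GpuJob {
  id: string;
  status: JobStatus;
  outputs?: string[];
  costCredits?: number;
}

// A job is finished once it has either succeeded or failed.
function isTerminal(status: JobStatus): boolean {
  return status === 'succeeded' || status === 'failed';
}

// Exponential backoff between polls: 1s, 2s, 4s, ... capped at maxMs.
function pollDelayMs(attempt: number, baseMs = 1000, maxMs = 15000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the job reaches a terminal status or maxAttempts is exhausted.
// `fetchJob` is supplied by the caller (for example a fetch on the pollUrl).
async function waitForJob(
  fetchJob: () => Promise<GpuJob>,
  maxAttempts = 30,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<GpuJob> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob();
    if (isTerminal(job.status)) return job;
    await sleep(pollDelayMs(attempt));
  }
  throw new Error(`job did not finish after ${maxAttempts} polls`);
}
```

Injecting fetchJob and sleep keeps the loop testable without network access; a webhook, where available, avoids polling altogether.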
ENTERPRISE
Teams, RBAC & SSO
Organizations, five built-in roles, SAML and OIDC single sign-on. Built so security and procurement can sign off without a custom rider.
Org-scoped keys, audit log, billing
Every API key, webhook, ComfyUI workflow, and GPU template can belong to an organization instead of an individual. Teammates share one budget, one audit trail, and one invoice. Personal keys keep working alongside.
POST /api/orgs
{
"name": "Acme Inc"
}
→ { id, slug, role: "owner" }

Owner · Admin · Developer · Billing · Viewer
- Owner — everything, including delete-org
- Admin — manage members, keys, SSO, webhooks
- Developer — create/delete API keys, manage workflows + GPUs
- Billing — view + manage payments and audit log
- Viewer — read-only access to keys, billing, audit
Configure your IdP in 3 steps
- Create a SAML app in Okta / Azure AD / Auth0 / Google.
- Set ACS URL to https://hypereal.cloud/api/auth/sso/<providerId>
- Paste the IdP metadata XML into /settings/organization → SSO.
Set the email-domain claim (e.g. acme.com) and the login form will auto-route corporate emails to your IdP — no password prompt.
Issuer + client credentials
Drop in your issuer URL, client id, and client secret. We fetch the /.well-known/openid-configuration on save and surface a green check when the IdP is reachable.
POST /api/orgs/{id}/sso
{
"type": "oidc",
"issuer": "https://idp.acme.com",
"clientId": "...",
"clientSecret": "...",
"domain": "acme.com"
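}

12
Pricing and credits
A single unit: 100 credits = $1.00 USD. LLMs are billed per token at each model's input / output rate. Media models are billed per image, per second, or per clip.
LLMs
Tokens × the per-MTok rate. Streaming requests are billed from the final usage chunk.
Images
A flat rate per generation × the n actually returned.
Video and audio
Billed per second (most video), per clip (Veo, Vidu, Grok), or per request (Fish Audio).
Claude, GPT, Gemini, and some image models (GPT Image 2, Nano Banana) are offered at lower prices than buying directly from the provider. Video, audio, and other media models are billed at standard rates.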
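The billing rules above reduce to a few lines of arithmetic. The sketch below is a rough estimator, not the billing implementation: the 100 credits = $1.00 conversion and the 1–10 clamp on n come from this page, while the rate values passed in are hypothetical placeholders, and the sketch assumes per-MTok rates are expressed in credits.

```typescript
const CREDITS_PER_USD = 100; // 100 credits = $1.00

function creditsToUsd(credits: number): number {
  return credits / CREDITS_PER_USD;
}

// Assumption: rates are credits per million tokens. Real rates vary per model.
interface TokenRates {
  inputPerMTok: number;
  outputPerMTok: number;
}

// LLM cost: tokens × the per-MTok rate, summed over input and output.
function llmCostCredits(inputTokens: number, outputTokens: number, rates: TokenRates): number {
  return (inputTokens / 1_000_000) * rates.inputPerMTok
    + (outputTokens / 1_000_000) * rates.outputPerMTok;
}

// Image cost: flat rate per generation × n, with n clamped to 1–10.
function imageCostCredits(creditsPerGeneration: number, n: number): number {
  const clamped = Math.min(10, Math.max(1, Math.trunc(n)));
  return creditsPerGeneration * clamped;
}

// Hypothetical rates, for illustration only.
const exampleRates: TokenRates = { inputPerMTok: 100, outputPerMTok: 400 };
console.log(creditsToUsd(200));
console.log(llmCostCredits(1_000_000, 500_000, exampleRates));
console.log(imageCostCredits(5, 12));
```

Actual charges depend on the usage block and on the number of images actually returned, so treat these numbers as an upper-level estimate before a call rather than as an invoice.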

