01 · Get started in 90s
Quickstart
Mint a key, point your client at hypereal.build, ship. Auth and request shapes are OpenAI-compatible — most SDKs work by changing only the base URL.
Top up at least $2 (200 credits) and create a key at /manage-api-keys. Keys start with ck_.
Base URL: https://hypereal.build/api/v1
Auth header is Authorization: Bearer ck_.... Same OpenAI request bodies you already know.
curl https://hypereal.build/api/v1/chat/completions \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [{"role": "user", "content": "Say hi in one word."}]
}'

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.NEOCLOUD_API_KEY, // ck_...
baseURL: 'https://hypereal.build/api/v1',
});
const completion = await client.chat.completions.create({
model: 'gpt-5.5',
messages: [{ role: 'user', content: 'Say hi in one word.' }],
});
console.log(completion.choices[0].message.content);

02
Authentication
Every request needs a ck_-prefixed key. Three accepted header forms cover all SDKs.
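For clients that hit more than one endpoint family, a small helper can pick the matching header. A sketch, using the header names this section lists:

```typescript
// Sketch: select the accepted auth header for each Hypereal endpoint family.
// Header names are the ones documented on this page.
type AuthHeader = { name: string; value: string };

function authHeaderFor(path: string, key: string): AuthHeader {
  if (path.startsWith('/v1/messages')) {
    return { name: 'x-api-key', value: key }; // Anthropic shape
  }
  if (path.startsWith('/v1/gemini')) {
    return { name: 'x-goog-api-key', value: key }; // Gemini native shape
  }
  return { name: 'Authorization', value: `Bearer ${key}` }; // OpenAI shape (default)
}

// Usage: const h = authHeaderFor('/v1/chat/completions', process.env.NEOCLOUD_API_KEY!);
```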
- Authorization: Bearer ck_... — used by the OpenAI SDK, Codex CLI, and Cursor.
- x-api-key: ck_... — used by the Anthropic SDK and Claude Code on /v1/messages.
- x-goog-api-key: ck_... — Google Gemini SDK / native shape, accepted by /v1/gemini. ?key=ck_... also works.

03 · OpenAI-compatible
Chat Completions
The workhorse endpoint. OpenAI Chat Completions wire format. Used for GPT, Gemini, Qwen, DeepSeek, GLM, and every other non-Anthropic LLM.
/api/v1/chat/completions

Request body
- model — any non-Anthropic model ID. Anthropic models use /v1/messages instead.
- messages — array of chat messages (role, content).
- stream — defaults to false. SSE stream when true; usage is included in the final chunk.

Pricing
Billed per token using each model's input/output rate. 100 credits = $1.00. The minimum balance to call the endpoint is 200 credits ($2.00).
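The conversion from a usage block to credits is mechanical. A sketch of the arithmetic; the per-MTok rates below are placeholders, not Hypereal's actual prices:

```typescript
// Sketch of the billing arithmetic: 100 credits = $1.00, tokens billed at each
// model's input/output per-MTok rate. Rates here are placeholders.
const CREDITS_PER_USD = 100;

function creditsForUsage(
  usage: { prompt_tokens: number; completion_tokens: number },
  usdPerMTokIn: number,  // hypothetical input rate, $/1M tokens
  usdPerMTokOut: number, // hypothetical output rate, $/1M tokens
): number {
  const usd =
    (usage.prompt_tokens / 1_000_000) * usdPerMTokIn +
    (usage.completion_tokens / 1_000_000) * usdPerMTokOut;
  return usd * CREDITS_PER_USD;
}

// e.g. 10k input / 2k output at $1.00 / $3.00 per MTok: $0.016, i.e. about 1.6 credits
```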
curl https://hypereal.build/api/v1/chat/completions \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [
{"role": "system", "content": "You are a terse assistant."},
{"role": "user", "content": "Two-line haiku about caches."}
],
"stream": true,
"max_tokens": 256
}'

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.NEOCLOUD_API_KEY,
baseURL: 'https://hypereal.build/api/v1',
});
const stream = await client.chat.completions.create({
model: 'gpt-5.5',
stream: true,
messages: [{ role: 'user', content: 'Stream me a haiku.' }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

OpenAI & provider-compatible models
gpt-5, gpt-5.1, gpt-5.2, gpt-5.3, gpt-5.4, gpt-5.5, gpt-5.5-instant, gpt-5.5-pro, gpt-5.4-mini, gpt-5.4-nano, gpt-5.4-official, gpt-5.4-pro-official, gpt-5.2-official, gpt-5-pro-official, gpt-realtime-1.5-official, gpt-audio-1.5-official, glm-5, qwen3.5-plus, qwen3.5-flash, qwen3-max, deepseek-v3.2, kimi-k2.5, MiniMax-M2.5, nano-banana-2

04 · Anthropic-compatible
Messages
Anthropic /v1/messages wire format with extended thinking, multi-upstream failover, and 15-second SSE keepalives. Use this for Claude Code, OpenCode, OpenClaw, and the official Anthropic SDK.
/api/v1/messages

Request body
- model — claude-opus-4-6, claude-sonnet-4-6, or claude-haiku-4-5. Older Anthropic IDs (claude-sonnet-4-5-20250929, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022) auto-alias to the latest equivalents.
- thinking — budget_tokens caps the reasoning trace. The endpoint sends 15s SSE pings to keep proxies from closing long thinking streams.

curl https://hypereal.build/api/v1/messages \
-H "x-api-key: ck_..." \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-6",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Plan a 3-step refactor of a Next.js app."}
],
"thinking": {"type": "enabled", "budget_tokens": 4000}
}'

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
apiKey: process.env.NEOCLOUD_API_KEY, // ck_...
baseURL: 'https://hypereal.build/api/v1',
});
const msg = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Hello, Claude.' }],
});
console.log(msg.content);

Anthropic models
claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5

05 · OpenAI Responses API
Responses
OpenAI's newer Responses API (used by Codex CLI's `wire_api = responses` mode and the OpenAI Agents SDK). Same auth as chat/completions; the request body uses `input` instead of `messages`.
/api/v1/responses

Notes
- Anthropic models return a 400 — they belong on /v1/messages.
- Streaming and non-streaming requests are both billed off response.usage.input_tokens / output_tokens.
- Some upstreams always emit SSE — the endpoint detects this and streams through transparently even if stream: false.
- Multi-upstream failover. Set a long client timeout (300s+).
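Because the endpoint may stream SSE even when the request set stream: false, a client can branch on the response Content-Type. A minimal sketch:

```typescript
// Sketch: branch on the response Content-Type, since the endpoint may stream
// SSE even when the request set stream: false.
function isEventStream(contentType: string | null): boolean {
  return contentType !== null && contentType.toLowerCase().includes('text/event-stream');
}

async function readResponse(res: Response): Promise<unknown> {
  if (!isEventStream(res.headers.get('content-type'))) return res.json();
  // Minimal SSE accumulation; production code would parse events incrementally.
  const text = await res.text();
  return text
    .split('\n')
    .filter((line) => line.startsWith('data: ') && !line.includes('[DONE]'))
    .map((line) => JSON.parse(line.slice('data: '.length)));
}
```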
curl https://hypereal.build/api/v1/responses \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-codex",
"input": "Write a TypeScript function that debounces a callback.",
"stream": true
}'

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.NEOCLOUD_API_KEY,
baseURL: 'https://hypereal.build/api/v1',
});
const response = await client.responses.create({
model: 'gpt-5-codex',
input: 'Refactor this file into smaller modules.',
});
console.log(response.output_text);

Codex-tuned models
gpt-5-codex, gpt-5-codex-mini, gpt-5.1-codex, gpt-5.1-codex-mini, gpt-5.1-codex-max, gpt-5.2-codex, gpt-5.3-codex, gpt-5.3-codex-spark, gpt-5.3-codex-official

06 · Codex CLI / Codex Desktop
Codex CLI
Codex points its `wire_api = responses` provider at /api/v1/codex/responses. The CLI appends `/responses` to the base URL, so configure the base URL as shown.
/api/v1/codex/responses

# ~/.codex/config.toml
model_provider = "hypereal"
model = "gpt-5-codex"

[model_providers.hypereal]
name = "Hypereal"
base_url = "https://hypereal.build/api/v1/codex"
wire_api = "responses"
env_key = "NEOCLOUD_API_KEY"
Then export your key:

export NEOCLOUD_API_KEY=ck_...
Run codex as usual. Anything Codex sends — full reasoning streams, tool calls, file edits — proxies through unchanged. Billing keys off the standard input_tokens / output_tokens usage block.
Same setup works for OpenCode, Claude Code (use /v1/messages), Cursor (use /v1/chat/completions), and the Gemini CLI (use /v1/gemini).
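Claude Code, for instance, reads its endpoint and key from environment variables. A sketch, assuming the client joins /v1/messages onto the base URL itself; verify the exact path joining against your Claude Code version:

```shell
# Sketch: point Claude Code at the /v1/messages endpoint.
# Claude Code is assumed to append /v1/messages itself, so the base stops
# at /api (confirm against your client version).
export ANTHROPIC_BASE_URL="https://hypereal.build/api"
export ANTHROPIC_AUTH_TOKEN="ck_..."
```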
07
Image generation
OpenAI-compatible /images/generations shape. Synchronous — the endpoint returns image URLs (or base64) when the upstream finishes. Billed per image; `n` is clamped to 1–10.
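Because some models return a hosted URL and others inline base64, a client can normalize both shapes into raw bytes. A sketch, assuming a Node runtime (Buffer, fetch):

```typescript
// Sketch: normalize the two documented response shapes, a hosted URL or
// inline base64, into raw image bytes. Node runtime assumed.
type ImageItem = { url?: string; b64_json?: string };

function decodeB64Image(b64: string): Uint8Array {
  return Uint8Array.from(Buffer.from(b64, 'base64'));
}

async function imageBytes(item: ImageItem): Promise<Uint8Array> {
  if (item.b64_json) return decodeB64Image(item.b64_json);
  if (item.url) return new Uint8Array(await (await fetch(item.url)).arrayBuffer());
  throw new Error('image item has neither url nor b64_json');
}
```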
/api/v1/images/generations

Request body
- prompt — text description of the image to generate.
- Reference-image inputs where supported (image, reference_images).
- size — e.g. 1024x1024, 1536x1024. Provider-dependent.
- If your balance is below creditsPerGeneration × n, the endpoint returns 402.

curl https://hypereal.build/api/v1/images/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "nano_banana_pro",
"prompt": "isometric studio shot of a tiny cyberpunk apartment, neon rim light",
"n": 1,
"size": "1024x1024"
}'

const res = await fetch('https://hypereal.build/api/v1/images/generations', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.NEOCLOUD_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gemini-3-pro-image-preview',
prompt: 'a chrome teapot floating over the ocean at sunset',
n: 1,
}),
});
const { data } = await res.json();
console.log(data[0].url); // or data[0].b64_json depending on the model

Image models
gpt-image-2, gpt-4o-image, nano_banana, nano_banana_2, gemini-3.1-flash-image-preview, gemini-2.5-flash-image-preview, flux-kontext-pro, flux-2-pro, doubao-seedream-4-0, doubao-seedream-4-5, doubao-seedream-5-0, gemini-3.1-flash-image-preview-official, flux-kontext-max, gemini-2.5-flash-image-official, nano_banana_pro, gemini-3-pro-image-preview, flux-2-flex, gemini-3-pro-image-preview-official, gemini-3-pro-image-preview-4K, gemini-3.1-fast-imagen, gemini-3.1-thinking-imagen

08 · long-running
Video generation
Synchronous long-poll endpoint — keep the connection open until the clip is ready. Set your HTTP client timeout to 600s. Billing is per second (most models) or per clip (Veo, Vidu, Grok).
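The long-poll call can be assembled ahead of time, with the 600s timeout wired in via AbortSignal. A sketch that builds the request without sending it; field names follow the example below, and AbortSignal.timeout needs Node 17.3+:

```typescript
// Sketch: assemble the long-poll request with the documented 600s timeout,
// without firing it. Field names follow the curl example on this page.
type VideoBody = {
  model: string;
  prompt: string;
  duration?: number;
  aspect_ratio?: string;
  image_url?: string;
};

function buildVideoRequest(key: string, body: VideoBody) {
  return {
    url: 'https://hypereal.build/api/v1/video/generations',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${key}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(600_000), // keep the connection open up to 10 minutes
    },
  };
}

// Usage: const { url, init } = buildVideoRequest(key, { model: 'kling-v3', prompt: '...' });
//        const clip = await (await fetch(url, init)).json();
```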
/api/v1/video/generations

Request body
- duration — clip length in seconds, for per_second models.
- aspect_ratio — 16:9, 9:16, 1:1. Provider-dependent.
- image_url — optional keyframe. Some models take last_image_url or image — see the upstream docs for that model.

curl https://hypereal.build/api/v1/video/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2-0",
"prompt": "drone shot flying over a foggy forest at dawn, cinematic",
"duration": 5,
"aspect_ratio": "16:9",
"image_url": "https://example.com/keyframe.jpg"
}'

const res = await fetch('https://hypereal.build/api/v1/video/generations', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.NEOCLOUD_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'kling-v3',
prompt: 'a cat walking on the moon',
duration: 5,
aspect_ratio: '16:9',
}),
});
// Long-running: connection stays open until the upstream returns the clip.
// Set a generous timeout (600 seconds).
const data = await res.json();
console.log(data); // contains url(s) to the rendered mp4

Video models
wan2.6-flash, kling-2-6, MiniMax-Hailuo-02, doubao-seedance-1-0-pro-fast, MiniMax-Hailuo-2.3, wan2.6, kling-video-o1, kling-v3-omni, kling-v3, kling-v3-video, doubao-seedance-1-0-pro-quality, doubao-seedance-2-0, doubao-seedance-2-0-fast, doubao-seedance-1-5-pro, Veo3.1-fast-official, Veo3.1-quality-official, veo3.1-fast, veo3.1-quality, vidu-q3-pro, grok-video-3

09 · Fish Audio
Audio — TTS, voice cloning, ASR
Three model IDs share one endpoint. The shape of the body and response depends on which one you call. The provider is Fish Audio (called directly, not via ToAPI), billed per request.
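Since each model ID expects different fields, a small validator can catch a malformed body before the request goes out. A sketch, keyed off the requirements listed below:

```typescript
// Sketch: one endpoint, three body shapes. Validate required fields per
// model ID before POSTing to /api/v1/audio/generations.
type AudioRequest = { model?: string; text?: string; audio?: string };

function missingAudioFields(req: AudioRequest): string[] {
  const missing: string[] = [];
  switch (req.model) {
    case 'audio-tts': // speech from text
      if (!req.text) missing.push('text');
      break;
    case 'audio-clone': // text plus a reference voice URL (>= 10s)
      if (!req.text) missing.push('text');
      if (!req.audio) missing.push('audio');
      break;
    case 'audio-asr': // recording URL to transcribe
      if (!req.audio) missing.push('audio');
      break;
    default:
      missing.push('model');
  }
  return missing;
}
```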
/api/v1/audio/generations

Request body
- text — required for audio-tts and audio-clone.
- audio — audio URL: the input for audio-asr and the reference voice for audio-clone (reference voice ≥ 10s).
- Response: data: [{ url }] for TTS / clone, text (+ optional segments, duration) for ASR.

curl https://hypereal.build/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-tts",
"text": "Welcome to Hypereal. One key, every model.",
"voice_id": "en_male_calm"
}'

curl https://hypereal.build/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-clone",
"text": "This is my cloned voice.",
"audio": "https://example.com/reference-30s.mp3"
}'

curl https://hypereal.build/api/v1/audio/generations \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "audio-asr",
"audio": "https://example.com/recording.mp3"
}'

Audio models
audio-tts, audio-clone, audio-asr

10 · Google native shape
Gemini
Accepts both Gemini-native (`contents` / `generationConfig` / `systemInstruction`) and OpenAI shapes on the same endpoint. The endpoint converts to OpenAI internally before forwarding. For most code, /v1/chat/completions with a Gemini model ID is simpler.
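To make the shape difference concrete, here is an illustrative converter from OpenAI-style messages to the Gemini-native body. The endpoint performs the reverse mapping internally; this sketch only shows how the two formats line up:

```typescript
// Sketch of the shape difference between OpenAI-style messages and
// Gemini-native contents. Illustrative only; the endpoint does the
// reverse conversion server-side.
type OpenAIMsg = { role: 'system' | 'user' | 'assistant'; content: string };
type GeminiBody = {
  systemInstruction?: { parts: { text: string }[] };
  contents: { role: 'user' | 'model'; parts: { text: string }[] }[];
};

function toGeminiShape(messages: OpenAIMsg[]): GeminiBody {
  const system = messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n');
  const contents = messages
    .filter((m) => m.role !== 'system')
    .map((m) => ({
      role: m.role === 'assistant' ? ('model' as const) : ('user' as const), // Gemini says "model", not "assistant"
      parts: [{ text: m.content }],
    }));
  return system ? { systemInstruction: { parts: [{ text: system }] }, contents } : { contents };
}
```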
/api/v1/gemini

Request body
- contents — Gemini-native message array.
- generationConfig — temperature, maxOutputTokens, etc.
- systemInstruction — merged into contents.

Auth header: x-goog-api-key: ck_..., ?key=ck_..., or Authorization: Bearer ck_... all work.
curl "https://hypereal.build/api/v1/gemini" \
-H "x-goog-api-key: ck_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3.1-pro",
"contents": [
{"role": "user", "parts": [{"text": "Outline a launch plan."}]}
],
"generationConfig": {"temperature": 0.6, "maxOutputTokens": 2048}
}'

// The /v1/gemini endpoint accepts both Gemini-native and OpenAI shapes.
// For SDK use, the OpenAI client + /v1/chat/completions is simpler.
const res = await fetch('https://hypereal.build/api/v1/gemini', {
method: 'POST',
headers: {
'x-goog-api-key': process.env.NEOCLOUD_API_KEY!,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gemini-3.1-fast',
contents: [{ role: 'user', parts: [{ text: 'Hi' }] }],
}),
});
console.log(await res.json());

Gemini models
gemini-3-pro-official, gemini-3-pro-preview-official, gemini-3-flash-official, gemini-3-flash-preview-official, gemini-3.1-pro, gemini-3.1-pro-preview-official, gemini-3.1-fast, gemini-3.1-thinking, gemini-3.1-flash-lite-preview-official, gemini-2.5-pro-official, gemini-2.5-flash-official, gemini-2.5-flash-lite-official, gemini-2.0-flash-official, gemini-2.0-flash-lite-official, gemini-2.0-flash-vip, gemini-2.5-flash-vip, gemini-2.5-pro-vip, gemini-3-flash-preview-vip

11
Errors & rate limits
All errors are JSON of the form { error: { type, message } }. Rate limits are evaluated per user, not per key — multiple keys share the same quota.
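The rate-limit headers can drive a retry delay. A sketch, assuming X-RateLimit-Reset is a Unix timestamp in seconds; confirm that against a live rate-limit response before relying on it:

```typescript
// Sketch: derive a retry delay from the documented rate-limit headers.
// Assumes X-RateLimit-Reset is a Unix timestamp in seconds (an assumption,
// verify against live responses).
function retryDelayMs(headers: Headers, nowMs: number): number | null {
  const remaining = headers.get('x-ratelimit-remaining');
  if (remaining === null || Number(remaining) > 0) return null; // quota left: no wait needed
  const reset = headers.get('x-ratelimit-reset');
  if (reset === null) return 1_000; // no reset hint: short fallback backoff
  return Math.max(0, Number(reset) * 1000 - nowMs);
}
```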
- Invalid key — missing the ck_ prefix, expired, or inactive key.
- Rate limited — X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers are returned on rate limit responses.
- Bad request — missing model, unknown model ID (the response includes available_models), or wrong endpoint for the format (e.g. an Anthropic model on /chat/completions).

12
Pricing & credits
One unit: 100 credits = $1.00 USD. LLMs bill per token using each model's input / output rate. Media models bill per image, per second, or per clip.
LLMs
Tokens × per-MTok rate. Streaming requests are billed off the final usage chunk.
Images
Flat per generation × actual n returned.
Video & audio
Per second (most video), per clip (Veo, Vidu, Grok), or per request (Fish Audio).
Claude, GPT, Gemini, and select image models (GPT Image 2, Nano Banana) are priced at direct-provider rates. Video, audio, and other media models are billed at standard rates.

