Infrastructure

GPU Clusters and Infrastructure

Create and operate high-performance GPU clusters, serverless endpoints, dedicated pods, storage, and training jobs with one API key and one bill. Mercury is the cluster control plane that turns many GPU cards into one scheduled compute resource.

Core surfaces

  • Mercury: the GPU cluster control plane. It plans topology, placement groups, gang scheduling policy, RDMA/NCCL hints, storage mounts, and capacity guardrails. Endpoint: POST /v1/gpu/mercury/plan
  • Deployments: provide a Docker image, get a managed auto-scaling GPU endpoint. Pay per second of execution. Endpoint: POST /v1/deployments
  • GPU Pods: dedicated hourly instances (H100, A100, L40S, A6000) with SSH, public IP, and persistent disk. Pricing is upstream GPU cost plus a 15% Hypereal margin. Endpoint: POST /v1/gpu/pods
  • GPU Clusters: self-serve multi-node GPU capacity workflow for distributed training and large-scale inference. Quote topology, record the request, and inspect Mercury runtime hints from the dashboard or the API. Endpoints: POST /v1/gpu/clusters/quote, POST /v1/gpu/clusters
  • Training: one-click single-node pretraining pods and LoRA post-training jobs with owned datasets, R2 outputs, cancel/refund paths, 15% margin pricing, and a machine-readable capability map. Endpoints: GET /v1/training/capabilities, POST /v1/training/pretrain, POST /v1/training/jobs
  • Storage: S3-compatible object store for model weights, LoRAs, datasets, and generated outputs, plus network volumes that can be attached to GPU Pods and Serverless workers. Endpoints: POST /v1/storage/upload, POST /v1/gpu/volumes
  • Jobs: every invocation is logged with latency, output, and credit cost. Runs are either async with webhook delivery or sync.

Authentication

All Infrastructure endpoints accept either a session cookie (dashboard) or a Bearer API key for SDK / CLI use.

curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/gpu/pods/types

Generate keys at /manage-api-keys.
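
The same Bearer header can be attached from Python. This is a minimal sketch using only the standard library; the helper name build_request and the placeholder key are illustrative, not part of any Hypereal SDK. It builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://hypereal.cloud"  # host used in the curl examples


def build_request(path, api_key, method="GET", payload=None):
    """Return an unsent urllib Request that carries the Bearer API key."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # JSON bodies need an explicit content type.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode()
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# "ck_example" is a placeholder, not a real key.
req = build_request("/v1/gpu/pods/types", "ck_example")
```

To send it, pass req to urllib.request.urlopen and read the response body.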

Mercury

Mercury is the layer above raw GPU rentals. It models the high-performance network, allocates placement groups, gang-schedules multi-node jobs, and emits runtime configuration so thousands of GPU cards can behave like one coherent GPU cluster.

curl -X POST https://hypereal.cloud/v1/gpu/mercury/plan \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "gpuCount": 1024,
    "gpuSku": "h100-sxm-80gb",
    "network": "nvlink-ib400",
    "workload": "training",
    "modelParamsB": 405,
    "sequenceLength": 32768
  }'
  • Topology: nodes, racks, network islands, rails, memory, bisection and collective bandwidth.
  • Scheduler: placement groups, admission control, checkpoint-aware preemption, and gang scheduling.
  • Runtime: NCCL environment, torchrun rendezvous, storage mount, and topology-aware data plane hints.

Example response:

{
  "object": "gpu.mercury.plan",
  "plan": {
    "topology": {
      "gpuCount": 1024,
      "nodes": 128,
      "racks": 8,
      "islands": 1,
      "railCount": 1024,
      "bisectionGbps": 409600
    },
    "placement": {
      "tensorParallel": 8,
      "pipelineParallel": 16,
      "dataParallel": 8
    },
    "runtime": {
      "nccl": {
        "NCCL_IB_DISABLE": "0",
        "NCCL_ALGO": "Ring,Tree,NVLS"
      }
    }
  }
}
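
A quick sanity check on a returned plan can catch a mismatched request before anything is launched. The sketch below assumes the usual 3D-parallel convention that tensor, pipeline, and data parallel degrees multiply to the GPU count. The docs do not state that Mercury guarantees this, although the sample response above satisfies it. The function name check_plan is illustrative.

```python
def check_plan(plan):
    """Summarise a gpu.mercury.plan body and flag inconsistent parallelism."""
    topo = plan["topology"]
    place = plan["placement"]
    ranks = (
        place["tensorParallel"]
        * place["pipelineParallel"]
        * place["dataParallel"]
    )
    return {
        "ranks": ranks,
        # Assumed convention: one rank per GPU.
        "matchesGpuCount": ranks == topo["gpuCount"],
        "gpusPerNode": topo["gpuCount"] // topo["nodes"],
        "nodesPerRack": topo["nodes"] // topo["racks"],
    }


# Values copied from the sample response above.
sample_plan = {
    "topology": {"gpuCount": 1024, "nodes": 128, "racks": 8},
    "placement": {"tensorParallel": 8, "pipelineParallel": 16, "dataParallel": 8},
}
summary = check_plan(sample_plan)
```

For the sample plan this gives 1024 ranks, 8 GPUs per node, and 16 nodes per rack, so tensor parallelism of 8 fits inside a single node.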

GPU Clusters

The GPU Cluster dashboard is the visual control surface for Mercury. Use it when you want multiple GPUs across multiple nodes to behave like one planned cluster instead of a single pod. Today this is a quote and capacity workflow; one-click physical multi-node provisioning is not exposed until provider reconciliation is wired.

  1. Open Clusters. Go to Infrastructure → GPU Clusters, or click Request capacity from the cluster workspace.
  2. Configure. Choose GPU type, count, network, workload, orchestrator, region, storage, and budget guardrail.
  3. Request and inspect. The cluster resource shows topology, node inventory, scheduler, runtime hints, quote, and capacity events.

Dashboard:
  /infra              GPU Cluster workspace
  /infra/clusters     List and manage clusters
  /infra/clusters/new Request GPU cluster capacity

API:
  POST   /v1/gpu/clusters/quote
  POST   /v1/gpu/clusters
  GET    /v1/gpu/clusters
  GET    /v1/gpu/clusters/:id
  DELETE /v1/gpu/clusters/:id

Training

Training has two production-ready one-click paths today: single-node pretraining pods and managed LoRA post-training jobs. Multi-node pretraining stays on the GPU Cluster capacity workflow until physical cluster provisioning is wired end to end.

For the standalone LoRA training API overview, pricing, and curl examples, open /lora-training-api.

  • Single-node pretraining: Launch a real GPU pod with SSH, TensorBoard, API, notebook ports, optional dataset URL, and hourly pod billing.
  • LoRA post-training: Train Flux, Qwen Image, and Wan 2.2 LoRAs on owned datasets; poll, cancel, refund, and download private outputs. Pricing is base trainer cost plus a 15% Hypereal margin.
  • Capability map: Read the machine-readable contract before building UI, automation, or SDK wrappers around the training surface.

Beginner onboarding

If you are new, use the dashboard first and move to the API after one successful run.

  1. Pick a path. Use LoRA when you want downloadable .safetensors weights for a style, character, product, or motion adapter. Use a pretraining pod when you need SSH, TensorBoard, notebook access, custom Docker, or your own training scripts.
  2. Prepare data. For LoRA, upload one zip in /infra/storage and set kind=dataset. For pretraining, attach a dataset at launch or copy data in after the pod starts.
  3. Launch. Open /infra/training. Click Start training for LoRA, or Deploy pretraining pod for a GPU machine.
  4. Finish safely. Download the .safetensors output after LoRA completion. For pretraining pods, save checkpoints and stop or terminate the pod when training is done so hourly billing stops.

Multi-node pretraining clusters remain a capacity workflow: they create quote, topology, and reservation records, but are not a one-click physical cluster launch today.

Trainer              Use case                              Price   Credits held
flux-dev-lora        Photoreal style, character, product   $1.15   115
qwen-image-lora      Illustration and character LoRA       $1.15   115
wan-2.2-image-lora   High-resolution image LoRA            $3.45   345
wan-2.2-i2v-lora     Image-to-video motion LoRA            $5.75   575
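
The price and credit columns are consistent with "base trainer cost plus 15%" and with 100 credits per US dollar. The sketch below reproduces them under the assumption that the base costs are $1.00, $3.00, and $5.00; those figures are inferred from the table, not published, and the rounding rule is also an assumption.

```python
from decimal import Decimal

MARGIN = Decimal("1.15")   # base cost plus the 15% Hypereal margin
CREDITS_PER_USD = 100      # 100 credits = $1 USD


def training_price(base_cost_usd):
    """Return (price in USD, credits held) for an assumed base trainer cost."""
    # Rounding to whole cents is an assumption; the docs do not specify it.
    price = (Decimal(base_cost_usd) * MARGIN).quantize(Decimal("0.01"))
    return price, int(price * CREDITS_PER_USD)


flux_price, flux_credits = training_price("1.00")
i2v_price, i2v_credits = training_price("5.00")
```

With these inputs the function returns $1.15 and 115 credits for the assumed $1.00 base, and $5.75 and 575 credits for the assumed $5.00 base, matching the table rows.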
# Capability map
curl https://hypereal.cloud/v1/training/capabilities

# One-click single-node pretraining pod
curl -X POST https://hypereal.cloud/v1/training/pretrain \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "continued-pretrain-01",
    "gpuTypeId": "NVIDIA H100 80GB HBM3",
    "gpuCount": 1,
    "dockerImage": "nvcr.io/nvidia/pytorch:24.10-py3",
    "scenario": "continued-pretraining",
    "framework": "pytorch-fsdp",
    "precision": "bf16",
    "datasetObjectId": "sto_..."
  }'

# One-click LoRA post-training
curl -X POST https://hypereal.cloud/v1/training/jobs \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "brand-character-lora",
    "baseModel": "flux-dev-lora",
    "datasetObjectId": "sto_...",
    "hyperparams": {
      "triggerWord": "ohwx",
      "steps": 1000,
      "learningRate": 0.0004,
      "loraRank": 16
    }
  }'

# The response returns a local job id and the credits held.
→ {
  "id": "job_...",
  "baseModel": "flux-dev-lora",
  "status": "queued",
  "creditsCharged": 115
}

# List, poll, and cancel
# GET /v1/training/jobs
curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/training/jobs
# GET /v1/training/jobs/{id}
curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/training/jobs/job_...
# DELETE /v1/training/jobs/{id}
curl -X DELETE -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/training/jobs/job_...

Pricing

Custom deployments (per-second)

Billed per GPU-second while a worker is actively executing your handler. Idle workers (kept warm to reduce cold starts) are not billed beyond your idleTimeoutSeconds setting.

GPU tier                  USD / sec   Credits / sec   USD / hr
A4000 / A5000 (budget)    $0.0001     0.01            $0.36
A6000 / 6000 Ada (mid)    $0.0003     0.03            $1.08
A100 80GB                 $0.0008     0.08            $2.88
H100 80GB                 $0.0014     0.14            $5.04

100 credits = $1 USD. Pricing uses managed GPU capacity and may change as upstream supply changes. Custom commit pricing available above $5k/month — contact us.
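
To estimate what a single invocation costs, multiply its execution time by the per-second rate of the tier. The sketch below uses the rates from the table and assumes billing is linear in execution time with no minimum charge or rounding, which the docs do not confirm. The tier keys are illustrative labels.

```python
from decimal import Decimal

USD_PER_SECOND = {
    "A4000 / A5000": Decimal("0.0001"),
    "A6000 / 6000 Ada": Decimal("0.0003"),
    "A100 80GB": Decimal("0.0008"),
    "H100 80GB": Decimal("0.0014"),
}
CREDITS_PER_USD = 100  # 100 credits = $1 USD


def execution_cost(tier, execution_ms):
    """Return (usd, credits) for one run, assuming linear per-second billing."""
    seconds = Decimal(execution_ms) / 1000
    usd = seconds * USD_PER_SECOND[tier]
    return usd, usd * CREDITS_PER_USD


# 1840 ms is the executionMs value used in the job examples.
usd, credits = execution_cost("H100 80GB", 1840)
```

Under these assumptions a 1.84 second run on H100 80GB costs $0.002576, or about 0.26 credits.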

Dedicated GPU pods (hourly)

Live pricing — current rates are shown on /infra/pods/new. The sell price is the live upstream GPU hourly rate multiplied by 1.15. First hour is pre-charged; auto-bills hourly while running. Terminate to stop the meter.

GPU clusters

Clusters are quote-first capacity resources. Requests stay visible as cluster resources with capacity state, topology, scheduler plan, node inventory, and event log while placement is coordinated.

curl -X POST https://hypereal.cloud/v1/gpu/clusters/quote \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-mercury-01",
    "gpuSku": "h100-sxm-80gb",
    "gpuCount": 64,
    "network": "nvlink-ib400",
    "workload": "training",
    "orchestrator": "slurm",
    "storageGb": 4096,
    "modelParamsB": 405,
    "sequenceLength": 32768
  }'
  • POST /v1/gpu/clusters/quote returns the exact topology, placement, scheduler policy, NCCL variables, torchrun command, launch runbook, readiness checks, and price before any cluster resource is created.
curl -X POST https://hypereal.cloud/v1/gpu/clusters \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-mercury-01",
    "gpuSku": "h100-sxm-80gb",
    "gpuCount": 8,
    "network": "nvlink-ib400",
    "workload": "training",
    "orchestrator": "slurm",
    "storageGb": 1024
  }'
  • GET /v1/gpu/clusters lists cluster resources.
  • GET /v1/gpu/clusters/:id returns topology, nodes, scheduler, runtime, quote, and events.
  • DELETE /v1/gpu/clusters/:id cancels or terminates the cluster request and marks nodes terminated.

Storage

  • Egress within Hypereal: free (deployments → storage → jobs)
  • Egress to public internet: $0.02 / GB above 10 GB free per month
  • At rest: $0.015 / GB / month, no minimums
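
A rough monthly storage estimate follows directly from the three rates above. This sketch assumes stored_gb is the average amount stored over the month and that internal egress is excluded by the caller. Proration and rounding are not documented, so treat the result as an estimate.

```python
from decimal import Decimal

AT_REST_PER_GB = Decimal("0.015")   # USD per GB per month
EGRESS_PER_GB = Decimal("0.02")     # USD per GB to the public internet
FREE_EGRESS_GB = 10                 # free public egress per month


def monthly_storage_cost(stored_gb, public_egress_gb):
    """Estimate one month of storage cost in USD."""
    billable_egress = max(Decimal(public_egress_gb) - FREE_EGRESS_GB, Decimal(0))
    return Decimal(stored_gb) * AT_REST_PER_GB + billable_egress * EGRESS_PER_GB


estimate = monthly_storage_cost(stored_gb=500, public_egress_gb=60)
```

For 500 GB at rest and 60 GB of public egress, the estimate is $7.50 plus $1.00, or $8.50.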

Network volumes

Network volumes are persistent GPU-side storage. They mount at /workspace on Pods and /runpod-volume in Serverless workers. Create one on the Storage page, then attach it when creating a Pod or Deployment.

curl -X POST https://hypereal.cloud/v1/gpu/volumes \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "models-us-wa-1",
    "sizeGb": 100,
    "dataCenterId": "US-WA-1"
  }'
  • GET /v1/gpu/volumes lists network volumes.
  • PATCH /v1/gpu/volumes/:id renames or expands a volume. Volumes cannot be shrunk.
  • DELETE /v1/gpu/volumes/:id deletes the upstream persistent volume.
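
Because volumes cannot be shrunk, a client can reject an invalid resize before calling PATCH. The sketch below assumes the PATCH body reuses the name and sizeGb fields from the create request; the docs do not show the PATCH body, so confirm the field names before relying on this.

```python
def volume_patch_body(current_size_gb, new_size_gb=None, new_name=None):
    """Build a PATCH body for a rename and/or expansion (field names assumed)."""
    body = {}
    if new_name is not None:
        body["name"] = new_name
    if new_size_gb is not None:
        if new_size_gb < current_size_gb:
            # Mirrors the documented rule: volumes cannot be shrunk.
            raise ValueError("volumes cannot be shrunk")
        if new_size_gb > current_size_gb:
            body["sizeGb"] = new_size_gb
    return body


expand = volume_patch_body(current_size_gb=100, new_size_gb=200)
```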

Handler spec

Your Docker image must expose an HTTP handler on the port declared in ports (default 8000). Requests arrive as:

POST /run HTTP/1.1
Content-Type: application/json

{ "input": { ...your payload... } }

Respond synchronously with JSON:

{ "output": { ...your result... } }
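
The request and response shapes above can be served with the Python standard library alone. This is a minimal sketch of that contract, not the platform's reference worker. The run_model placeholder, the 400 error body, and the 404 for other paths are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(payload):
    """Placeholder for real model code; here it just echoes the input."""
    return {"echo": payload}


def build_response(raw_body):
    """Map a raw POST /run body to (status_code, response_bytes)."""
    try:
        payload = json.loads(raw_body)["input"]
    except (ValueError, KeyError, TypeError):
        # Error shape is an assumption; the spec only shows the success case.
        return 400, json.dumps({"error": "expected an 'input' field"}).encode()
    return 200, json.dumps({"output": run_model(payload)}).encode()


class RunHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = build_response(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Listen on the declared port (8000 is the documented default)."""
    HTTPServer(("0.0.0.0", port), RunHandler).serve_forever()
```

Calling serve() as the container entrypoint starts the listener. Keeping build_response separate from the HTTP class makes the payload logic easy to test without opening a socket.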

Minimal echo handler (Python)

10-line worker that returns whatever you sent. Useful for verifying your deployment connects end-to-end before swapping in a real model.

# handler.py
import runpod

def handler(event):
    return {"echo": event.get("input", {})}

runpod.serverless.start({"handler": handler})

# Dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir runpod
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]

Build, push to any container registry (GHCR / Docker Hub / GCR / ECR), then point a new deployment at the image:

docker build -t ghcr.io/you/hypereal-echo:v1 .
docker push ghcr.io/you/hypereal-echo:v1

curl -X POST https://hypereal.cloud/v1/deployments \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "echo",
    "name": "Echo worker",
    "dockerImage": "ghcr.io/you/hypereal-echo:v1",
    "gpuTypes": "AMPERE_16"
  }'

Anything more elaborate (vLLM, ComfyUI, Flux LoRA) follows the same shape — replace the handler body with your model code. Runpod-compatible handlers are supported, so existing serverless workers can be moved over with minimal changes.

Quick start

1. Provision a deployment

curl -X POST https://hypereal.cloud/v1/deployments \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "my-flux-lora",
    "name": "My Flux LoRA",
    "dockerImage": "ghcr.io/me/flux-lora-handler:latest",
    "gpuTypes": "ADA_48_PRO,AMPERE_80",
    "workersMin": 0,
    "workersMax": 3,
    "idleTimeoutSeconds": 5
  }'

# gpuTypes accepts one or more comma-separated GPU pool IDs:
#   AMPERE_16, AMPERE_24, ADA_24, AMPERE_48, ADA_48_PRO,
#   AMPERE_80, ADA_80_PRO, HOPPER_141, ADA_32_PRO,
#   BLACKWELL_96, BLACKWELL_180
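
A typo in gpuTypes is easy to make, so it can help to validate the string client-side before creating a deployment. The sketch below checks a comma-separated value against the pool IDs listed above. The function name is illustrative, and the server may accept IDs added after this list was written.

```python
KNOWN_GPU_POOLS = {
    "AMPERE_16", "AMPERE_24", "ADA_24", "AMPERE_48", "ADA_48_PRO",
    "AMPERE_80", "ADA_80_PRO", "HOPPER_141", "ADA_32_PRO",
    "BLACKWELL_96", "BLACKWELL_180",
}


def parse_gpu_types(value):
    """Split a gpuTypes string and reject empty or unknown pool IDs."""
    ids = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [pool for pool in ids if pool not in KNOWN_GPU_POOLS]
    if not ids:
        raise ValueError("gpuTypes must name at least one pool")
    if unknown:
        raise ValueError(f"unknown GPU pool IDs: {', '.join(unknown)}")
    return ids


pools = parse_gpu_types("ADA_48_PRO,AMPERE_80")
```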

2. Run a job (async, with webhook)

curl -X POST https://hypereal.cloud/v1/gpu/run/my-flux-lora \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "a cat astronaut"}}'

# => { "job_id": "abc...", "status": "queued" }

3. Poll, or wait for the webhook

curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/gpu/jobs/abc...

# => { "status": "succeeded", "output": {...}, "executionMs": 1840 }
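
Polling can be wrapped in a small loop with a timeout. In this sketch fetch_status is a caller-supplied function that performs the GET shown above and returns the parsed JSON. Only "queued" and "succeeded" appear in these docs; "failed" and "cancelled" are assumed terminal states.

```python
import time

# "succeeded" is documented; "failed" and "cancelled" are assumptions.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status, job_id, interval_s=2.0, timeout_s=600.0,
                 sleep=time.sleep):
    """Poll fetch_status(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r}")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop testable. For long-running jobs the webhook is usually the better choice, since it avoids repeated requests.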

4. Rent a dedicated GPU pod

curl https://hypereal.cloud/v1/gpu/pods/types \
  -H "Authorization: Bearer ck_..."
# pick an "id" from the response, then:

curl -X POST https://hypereal.cloud/v1/gpu/pods \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-rig-01",
    "gpuTypeId": "NVIDIA H100 80GB HBM3",
    "dockerImage": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
    "containerDiskGb": 80
  }'

5. Deploy training

# Single-node pretraining pod
curl -X POST https://hypereal.cloud/v1/training/pretrain \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"name":"continued-pretrain-01","gpuTypeId":"NVIDIA H100 80GB HBM3","dockerImage":"nvcr.io/nvidia/pytorch:24.10-py3"}'

# LoRA post-training
curl -X POST https://hypereal.cloud/v1/training/jobs \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"name":"brand-lora","baseModel":"flux-dev-lora","datasetObjectId":"sto_..."}'

6. Request GPU cluster capacity

mailto:sales[at]hypereal.cloud

Subject: GPU Cluster Capacity

Include GPU type/count, preferred region, expected runtime,
networking needs, and whether you need Slurm, Kubernetes,
or a private endpoint.

(replace [at] with @ — obfuscated to slow down email scrapers.)

7. Upload model weights

# Step A: get a signed PUT URL
curl -X POST https://hypereal.cloud/v1/storage/upload \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"filename": "lora.safetensors", "contentType": "application/octet-stream", "kind": "lora"}'

# => { "id": "...", "uploadUrl": "https://...", "publicUrl": "..." }

# Step B: PUT directly to the signed URL
curl -X PUT "<uploadUrl>" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @lora.safetensors

# Step C: confirm
curl -X POST https://hypereal.cloud/v1/storage/commit \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"id": "..."}'
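
Steps A to C can be chained in one Python function. This is a sketch with the standard library; the function names are illustrative, and it assumes the step A response carries the id and uploadUrl fields shown above. The opener parameter exists so the flow can be exercised without network access.

```python
import json
import os
import urllib.request

BASE_URL = "https://hypereal.cloud"  # host used in the curl examples


def _post_json(path, api_key, payload):
    """Build (but do not send) an authenticated JSON POST."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def upload(api_key, path, kind, opener=urllib.request.urlopen):
    """Run steps A, B, and C for one local file; return the step A response."""
    content_type = "application/octet-stream"
    with open(path, "rb") as fh:
        data = fh.read()  # whole file in memory: acceptable for a sketch only

    # Step A: request a signed PUT URL.
    sign = _post_json("/v1/storage/upload", api_key, {
        "filename": os.path.basename(path),
        "contentType": content_type,
        "kind": kind,
    })
    with opener(sign) as resp:
        signed = json.load(resp)

    # Step B: PUT the bytes to the signed URL (the curl example sends no API key).
    put = urllib.request.Request(
        signed["uploadUrl"], data=data,
        headers={"Content-Type": content_type}, method="PUT",
    )
    with opener(put):
        pass

    # Step C: confirm the upload.
    with opener(_post_json("/v1/storage/commit", api_key, {"id": signed["id"]})):
        pass
    return signed
```

A call such as upload("ck_...", "lora.safetensors", "lora") mirrors the curl sequence. Large weight files would need a streamed body rather than a single read.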

Webhooks

When you submit a job in async mode, Hypereal calls your webhook_url (if provided in the request body) on terminal status. The webhook delivers:

POST <your_webhook_url>?secret=<hypereal_signed_secret>
Content-Type: application/json

{
  "job_id": "abc...",
  "status": "succeeded",
  "output": { ... },
  "executionMs": 1840
}
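
On the receiving side, a handler should check the secret query parameter before trusting the body. The sketch below assumes the secret is a value your service already knows and can compare directly. How Hypereal derives the signed secret is not documented here, so adapt the check to the real scheme.

```python
import hmac
import json
from urllib.parse import parse_qs, urlsplit


def parse_webhook(request_target, body, expected_secret):
    """Validate the ?secret= parameter and return (job_id, status, output)."""
    query = parse_qs(urlsplit(request_target).query)
    received = query.get("secret", [""])[0]
    # Constant-time comparison; direct equality with a known value is assumed.
    if not hmac.compare_digest(received.encode(), expected_secret.encode()):
        raise PermissionError("webhook secret mismatch")
    event = json.loads(body)
    return event["job_id"], event["status"], event.get("output")
```

request_target is the path and query string your server received, for example "/hooks/hypereal?secret=..." where the path is whatever you registered as webhook_url.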
Need 100+ GPUs, custom regions, SLA, SAML?
We do enterprise commits with dedicated capacity, private network volumes, and committed-use discounts of 30–60%. Email sales[at]hypereal.cloud and we'll set it up.
© Copyright 2026. All Rights Reserved.