GPU Clusters and Infrastructure
Create and operate high-performance GPU clusters, serverless endpoints, dedicated pods, storage, and training jobs with one API key and one bill. Mercury is the cluster control plane that turns many GPU cards into one scheduled compute resource.
Core surfaces
- Mercury: the GPU cluster control plane. It plans topology, placement groups, gang scheduling policy, RDMA/NCCL hints, storage mounts, and capacity guardrails.
  POST /v1/gpu/mercury/plan
- Deployments: provide a Docker image, get a managed auto-scaling GPU endpoint. Pay per second of execution.
  POST /v1/deployments
- GPU Pods: dedicated hourly instances (H100, A100, L40S, A6000). SSH, public IP, persistent disk. Pricing is upstream GPU cost plus a 15% Hypereal margin.
  POST /v1/gpu/pods
- GPU Clusters: a self-serve multi-node GPU capacity workflow for distributed training and large-scale inference. Quote topology, record the request, and inspect Mercury runtime hints from the dashboard or API.
  POST /v1/gpu/clusters/quote
  POST /v1/gpu/clusters
- Training: one-click single-node pretraining pods and LoRA post-training jobs with owned datasets, R2 outputs, cancel/refund paths, and a machine-readable capability map.
  GET /v1/training/capabilities
  POST /v1/training/pretrain
  POST /v1/training/jobs
- Storage: S3-compatible object store for model weights, LoRAs, datasets, and generated outputs, plus network volumes that can be attached to GPU Pods and Serverless workers.
  POST /v1/storage/upload
  POST /v1/gpu/volumes
- Jobs: every invocation is logged with latency, output, and credit cost. Async webhook delivery or sync runs.
Authentication
All Infrastructure endpoints accept either a session cookie (dashboard) or a Bearer API key for SDK / CLI use.
curl -H "Authorization: Bearer ck_..." \
https://hypereal.cloud/v1/gpu/pods/types
Generate keys at /manage-api-keys.
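For SDK-style use, the same Bearer header can be attached from Python's standard library. This is a minimal sketch: the helper name `build_request` is hypothetical, the key stays elided exactly as in the curl example, and nothing is sent until the request is opened.

```python
import json
import urllib.request

BASE_URL = "https://hypereal.cloud"  # host used by the curl examples in this document


def build_request(path, api_key, payload=None):
    """Build (but do not send) an authenticated request.

    Hypothetical helper: GET when payload is None, otherwise a JSON POST.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    method = "GET" if payload is None else "POST"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request("/v1/gpu/pods/types", "ck_...")
# Send with urllib.request.urlopen(req) once a real key is in place.
```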
Mercury
Mercury is the layer above raw GPU rentals. It models the high-performance network, allocates placement groups, gang-schedules multi-node jobs, and emits runtime configuration so thousands of GPU cards can behave like one coherent GPU cluster.
curl -X POST https://hypereal.cloud/v1/gpu/mercury/plan \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"gpuCount": 1024,
"gpuSku": "h100-sxm-80gb",
"network": "nvlink-ib400",
"workload": "training",
"modelParamsB": 405,
"sequenceLength": 32768
}'
Example response:
{
"object": "gpu.mercury.plan",
"plan": {
"topology": {
"gpuCount": 1024,
"nodes": 128,
"racks": 8,
"islands": 1,
"railCount": 1024,
"bisectionGbps": 409600
},
"placement": {
"tensorParallel": 8,
"pipelineParallel": 16,
"dataParallel": 8
},
"runtime": {
"nccl": {
"NCCL_IB_DISABLE": "0",
"NCCL_ALGO": "Ring,Tree,NVLS"
}
}
}
}
GPU Clusters
The GPU Cluster dashboard is the visual control surface for Mercury. Use it when you want multiple GPUs across multiple nodes to behave like one planned cluster instead of a single pod. Today this is a quote and capacity workflow; one-click physical multi-node provisioning is not exposed until provider reconciliation is wired.
Dashboard:
/infra  GPU Cluster workspace
/infra/clusters  List and manage clusters
/infra/clusters/new  Request GPU cluster capacity
API:
POST /v1/gpu/clusters/quote
POST /v1/gpu/clusters
GET /v1/gpu/clusters
GET /v1/gpu/clusters/:id
DELETE /v1/gpu/clusters/:id
Training
Training has two production-ready one-click paths today: single-node pretraining pods and managed LoRA post-training jobs. Multi-node pretraining stays on the GPU Cluster capacity workflow until physical cluster provisioning is wired end to end.
Beginner onboarding
If you are new, use the dashboard first and move to the API after one successful run.
1. Pick a path. Use LoRA when you want downloadable .safetensors weights for a style, character, product, or motion adapter. Use a pretraining pod when you need SSH, TensorBoard, notebook access, custom Docker, or your own training scripts.
2. Prepare data. For LoRA, upload one zip in /infra/storage and set kind=dataset. For pretraining, attach a dataset at launch or copy data in after the pod starts.
3. Launch. Open /infra/training. Click Start training for LoRA, or Deploy pretraining pod for a GPU machine.
4. Finish safely. Download the .safetensors output after LoRA completion. For pretraining pods, save checkpoints and stop or terminate the pod when training is done so hourly billing stops.
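The data-preparation step for LoRA asks for a single zip. The sketch below packs a local folder into one archive with Python's standard library. The helper name `zip_dataset` is hypothetical, and the layout inside the archive (paths relative to the folder root) is an assumption, since the layout the trainer expects is not specified here.

```python
import zipfile
from pathlib import Path


def zip_dataset(src_dir, out_zip):
    """Pack every file under src_dir into one zip and return the file count.

    ASSUMPTION: entries are stored with paths relative to src_dir.
    """
    src = Path(src_dir)
    out = Path(out_zip)
    count = 0
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself if it lives inside src_dir.
            if path.is_file() and path.resolve() != out.resolve():
                zf.write(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count
```

For example, `zip_dataset("my_images", "dataset.zip")` produces the one file to upload with kind=dataset.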
Multi-node pretraining clusters remain a capacity workflow: they create quote, topology, and reservation records, but are not a one-click physical cluster launch today.
# Capability map
curl https://hypereal.cloud/v1/training/capabilities
# One-click single-node pretraining pod
curl -X POST https://hypereal.cloud/v1/training/pretrain \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "continued-pretrain-01",
"gpuTypeId": "NVIDIA H100 80GB HBM3",
"gpuCount": 1,
"dockerImage": "nvcr.io/nvidia/pytorch:24.10-py3",
"scenario": "continued-pretraining",
"framework": "pytorch-fsdp",
"precision": "bf16",
"datasetObjectId": "sto_..."
}'
# One-click LoRA post-training
curl -X POST https://hypereal.cloud/v1/training/jobs \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "brand-character-lora",
"baseModel": "flux-dev-lora",
"datasetObjectId": "sto_...",
"hyperparams": {
"triggerWord": "ohwx",
"steps": 1000,
"learningRate": 0.0004,
"loraRank": 16
}
}'
# List, poll, and cancel
# GET /v1/training/jobs
curl -H "Authorization: Bearer ck_..." \
https://hypereal.cloud/v1/training/jobs
# GET /v1/training/jobs/{id}
curl -H "Authorization: Bearer ck_..." \
https://hypereal.cloud/v1/training/jobs/job_...
# DELETE /v1/training/jobs/{id}
curl -X DELETE -H "Authorization: Bearer ck_..." \
https://hypereal.cloud/v1/training/jobs/job_...
Pricing
Custom deployments (per-second)
Charged by GPU-second while a worker is actively executing your handler. Idle workers (kept warm to reduce cold-start) are not billed beyond your idleTimeout setting.
| GPU tier | USD / sec | Credits / sec | USD / hr |
|---|---|---|---|
| A4000 / A5000 (budget) | $0.0001 | 0.01 | $0.36 |
| A6000 / 6000 Ada (mid) | $0.0003 | 0.03 | $1.08 |
| A100 80GB | $0.0008 | 0.08 | $2.88 |
| H100 80GB | $0.0014 | 0.14 | $5.04 |
100 credits = $1 USD. Pricing uses managed GPU capacity and may change as upstream supply changes. Custom commit pricing available above $5k/month — contact us.
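To make the per-second model concrete, the sketch below turns a job's execution time into USD and credits using the rates in the table above. It assumes a single GPU and applies no rounding, since billing granularity below one second is not specified here. The function name `execution_cost` is hypothetical.

```python
from decimal import Decimal

# USD per GPU-second, copied from the table above.
USD_PER_SECOND = {
    "A4000 / A5000": Decimal("0.0001"),
    "A6000 / 6000 Ada": Decimal("0.0003"),
    "A100 80GB": Decimal("0.0008"),
    "H100 80GB": Decimal("0.0014"),
}
CREDITS_PER_USD = 100  # 100 credits = $1 USD


def execution_cost(tier, execution_ms):
    """Return (usd, credits) for one run billed by GPU-second of execution."""
    seconds = Decimal(execution_ms) / Decimal(1000)
    usd = USD_PER_SECOND[tier] * seconds
    return usd, usd * CREDITS_PER_USD


# A 1840 ms run on the H100 tier.
usd, credits = execution_cost("H100 80GB", 1840)
```

Decimal arithmetic is used so that small per-second rates do not pick up floating-point noise when summed over many jobs.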
Dedicated GPU pods (hourly)
Live pricing — current rates are shown on /infra/pods/new. The sell price is the live upstream GPU hourly rate multiplied by 1.15. First hour is pre-charged; auto-bills hourly while running. Terminate to stop the meter.
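The pod price rule can be sketched as a small calculation. The multiplier and the pre-charged first hour come from the text above. How a partial hour is billed is not stated, so the estimate below assumes every started hour is charged in full. The upstream rate in the example is a made-up figure, and both function names are hypothetical.

```python
import math
from decimal import Decimal

MARGIN_MULTIPLIER = Decimal("1.15")  # sell price = live upstream hourly rate x 1.15


def pod_hourly_price(upstream_hourly_usd):
    """Sell price per hour for a given upstream hourly rate."""
    return Decimal(str(upstream_hourly_usd)) * MARGIN_MULTIPLIER


def estimated_pod_charge(upstream_hourly_usd, hours_running):
    """Estimate the total charge for a pod.

    ASSUMPTION: each started hour is billed in full. The first hour is
    pre-charged, so the minimum is one hour.
    """
    billed_hours = max(1, math.ceil(hours_running))
    return pod_hourly_price(upstream_hourly_usd) * billed_hours


# Hypothetical upstream rate of $2.00/hr, pod kept for 2.5 hours.
price = pod_hourly_price(2.0)
total = estimated_pod_charge(2.0, 2.5)
```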
GPU clusters
Clusters are quote-first capacity resources. Requests stay visible as cluster resources with capacity state, topology, scheduler plan, node inventory, and event log while placement is coordinated.
curl -X POST https://hypereal.cloud/v1/gpu/clusters/quote \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "training-mercury-01",
"gpuSku": "h100-sxm-80gb",
"gpuCount": 64,
"network": "nvlink-ib400",
"workload": "training",
"orchestrator": "slurm",
"storageGb": 4096,
"modelParamsB": 405,
"sequenceLength": 32768
}'
POST /v1/gpu/clusters/quote returns the exact topology, placement, scheduler policy, NCCL variables, torchrun command, launch runbook, readiness checks, and price before any cluster resource is created.
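Before acting on a quote, it can help to check that the returned placement is internally consistent. The sketch below assumes the quote carries `topology` and `placement` objects with the same field names as the Mercury plan example earlier in this document, which is an assumption about the response shape. It checks that the tensor, pipeline, and data parallel degrees multiply to the GPU count, as they do in that example (8 x 16 x 8 = 1024). A plan that uses further parallelism dimensions would need a different check.

```python
def check_plan(plan):
    """Return a list of inconsistencies found in a plan dict (empty if none).

    ASSUMPTION: field names follow the Mercury plan example in this document.
    """
    topo = plan["topology"]
    place = plan["placement"]
    problems = []
    ranks = (
        place["tensorParallel"] * place["pipelineParallel"] * place["dataParallel"]
    )
    if ranks != topo["gpuCount"]:
        problems.append(
            f"parallel degrees multiply to {ranks}, expected {topo['gpuCount']}"
        )
    if topo["gpuCount"] % topo["nodes"] != 0:
        problems.append("gpuCount is not evenly divisible by nodes")
    return problems


# Values taken from the Mercury plan example.
example_plan = {
    "topology": {"gpuCount": 1024, "nodes": 128},
    "placement": {"tensorParallel": 8, "pipelineParallel": 16, "dataParallel": 8},
}
problems = check_plan(example_plan)
```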
curl -X POST https://hypereal.cloud/v1/gpu/clusters \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "training-mercury-01",
"gpuSku": "h100-sxm-80gb",
"gpuCount": 8,
"network": "nvlink-ib400",
"workload": "training",
"orchestrator": "slurm",
"storageGb": 1024
}'
GET /v1/gpu/clusters lists cluster resources.
GET /v1/gpu/clusters/:id returns topology, nodes, scheduler, runtime, quote, and events.
DELETE /v1/gpu/clusters/:id cancels or terminates the cluster request and marks nodes terminated.
Storage
- Egress within Hypereal: free (deployments → storage → jobs)
- Egress to public internet: $0.02 / GB above 10 GB free per month
- At rest: $0.015 / GB / month, no minimums
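As a worked example of the storage rates listed above, the sketch below estimates one month's bill. It assumes `stored_gb` is the amount held for the whole month, since how at-rest charges are prorated is not specified here. The function name is hypothetical.

```python
from decimal import Decimal

AT_REST_USD_PER_GB_MONTH = Decimal("0.015")
PUBLIC_EGRESS_USD_PER_GB = Decimal("0.02")
FREE_PUBLIC_EGRESS_GB = Decimal("10")  # free public egress per month


def monthly_storage_cost(stored_gb, public_egress_gb):
    """Estimate one month's storage bill in USD.

    Egress within Hypereal is free, so only public-internet egress counts.
    """
    stored = Decimal(str(stored_gb))
    egress = Decimal(str(public_egress_gb))
    billable_egress = max(egress - FREE_PUBLIC_EGRESS_GB, Decimal("0"))
    return stored * AT_REST_USD_PER_GB_MONTH + billable_egress * PUBLIC_EGRESS_USD_PER_GB


# 500 GB at rest plus 60 GB of public egress: 7.50 + (50 x 0.02) = 8.50 USD.
cost = monthly_storage_cost(500, 60)
```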
Network volumes
Network volumes are persistent GPU-side storage. They mount at /workspace on Pods and /runpod-volume in Serverless workers. Create one on the Storage page, then attach it when creating a Pod or Deployment.
curl -X POST https://hypereal.cloud/v1/gpu/volumes \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "models-us-wa-1",
"sizeGb": 100,
"dataCenterId": "US-WA-1"
}'
GET /v1/gpu/volumes lists network volumes.
PATCH /v1/gpu/volumes/:id renames or expands a volume. Volumes cannot be shrunk.
DELETE /v1/gpu/volumes/:id deletes the upstream persistent volume.
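Because volumes can only grow, a client can reject a shrink before calling PATCH. The sketch below builds a request body under the assumption that PATCH accepts the same `name` and `sizeGb` fields as the create call. The PATCH body is not documented here, so treat the field names and the helper name as hypothetical.

```python
def build_volume_patch(current_size_gb, new_name=None, new_size_gb=None):
    """Build a PATCH body for a rename and/or expansion.

    ASSUMPTION: PATCH uses the same field names as POST /v1/gpu/volumes.
    """
    body = {}
    if new_name is not None:
        body["name"] = new_name
    if new_size_gb is not None:
        if new_size_gb < current_size_gb:
            raise ValueError("volumes cannot be shrunk")
        body["sizeGb"] = new_size_gb
    if not body:
        raise ValueError("nothing to change")
    return body


# Expand a 100 GB volume to 200 GB.
patch = build_volume_patch(100, new_size_gb=200)
```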
Handler spec
Your Docker image must expose an HTTP handler on the port declared in ports (default 8000). Requests arrive as:
POST /run HTTP/1.1
Content-Type: application/json
{ "input": { ...your payload... } }
Respond synchronously with JSON:
{ "output": { ...your result... } }
Minimal echo handler (Python)
A 10-line worker that returns whatever you send it. It is useful for verifying that your deployment connects end to end before swapping in a real model.
# handler.py
import runpod

def handler(event):
    return {"echo": event.get("input", {})}

runpod.serverless.start({"handler": handler})

# Dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir runpod
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
Build, push to any container registry (GHCR / Docker Hub / GCR / ECR), then point a new deployment at the image:
docker build -t ghcr.io/you/hypereal-echo:v1 .
docker push ghcr.io/you/hypereal-echo:v1
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"slug": "echo",
"name": "Echo worker",
"dockerImage": "ghcr.io/you/hypereal-echo:v1",
"gpuTypes": "AMPERE_16"
}'
Anything more elaborate (vLLM, ComfyUI, Flux LoRA) follows the same shape: replace the handler body with your model code. Runpod-compatible handlers are supported, so existing serverless workers can be moved over with minimal changes.
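For an image that does not use the Runpod SDK, the request and response shapes at the top of this section can be served by a plain HTTP server. The following is a sketch using only Python's standard library, assuming the default port 8000 and the POST /run path shown above. The `{"error": ...}` body returned for malformed input is an assumption, because the error format is not specified here, and `run_model` is a placeholder name.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(payload):
    """Placeholder for model code; here it just echoes the input."""
    return {"echo": payload}


def handle_run(raw_body):
    """Pure request logic: request bytes in, (status, response bytes) out."""
    try:
        body = json.loads(raw_body or b"{}")
        if not isinstance(body, dict):
            raise ValueError("body must be a JSON object")
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        # ASSUMPTION: error shape is illustrative only.
        return 400, json.dumps({"error": str(exc)}).encode("utf-8")
    output = run_model(body.get("input", {}))
    return 200, json.dumps({"output": output}).encode("utf-8")


class RunHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_run(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Create (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer(("0.0.0.0", port), RunHandler)
```

Keeping the request logic in `handle_run` lets it be tested without opening a socket. In a container, the entry point would call `make_server().serve_forever()`.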
Quick start
1. Provision a deployment
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"slug": "my-flux-lora",
"name": "My Flux LoRA",
"dockerImage": "ghcr.io/me/flux-lora-handler:latest",
"gpuTypes": "ADA_48_PRO,AMPERE_80",
"workersMin": 0,
"workersMax": 3,
"idleTimeoutSeconds": 5
}'
# gpuTypes accepts one or more comma-separated GPU pool IDs:
# AMPERE_16, AMPERE_24, ADA_24, AMPERE_48, ADA_48_PRO,
# AMPERE_80, ADA_80_PRO, HOPPER_141, ADA_32_PRO,
# BLACKWELL_96, BLACKWELL_180
2. Run a job (async, with webhook)
curl -X POST https://hypereal.cloud/v1/gpu/run/my-flux-lora \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"input": {"prompt": "a cat astronaut"}}'
# => { "job_id": "abc...", "status": "queued" }
3. Poll, or wait for the webhook
curl -H "Authorization: Bearer ck_..." \
https://hypereal.cloud/v1/gpu/jobs/abc...
# => { "status": "succeeded", "output": {...}, "executionMs": 1840 }
4. Rent a dedicated GPU pod
curl https://hypereal.cloud/v1/gpu/pods/types \
-H "Authorization: Bearer ck_..."
# pick an "id" from the response, then:
curl -X POST https://hypereal.cloud/v1/gpu/pods \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "training-rig-01",
"gpuTypeId": "NVIDIA H100 80GB HBM3",
"dockerImage": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
"containerDiskGb": 80
}'
5. Deploy training
# Single-node pretraining pod
curl -X POST https://hypereal.cloud/v1/training/pretrain \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"name":"continued-pretrain-01","gpuTypeId":"NVIDIA H100 80GB HBM3","dockerImage":"nvcr.io/nvidia/pytorch:24.10-py3"}'
# LoRA post-training
curl -X POST https://hypereal.cloud/v1/training/jobs \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"name":"brand-lora","baseModel":"flux-dev-lora","datasetObjectId":"sto_..."}'
6. Request GPU cluster capacity
mailto:sales[at]hypereal.cloud
Subject: GPU Cluster Capacity
Include GPU type/count, preferred region, expected runtime, networking needs, and whether you need Slurm, Kubernetes, or a private endpoint.
(Replace [at] with @. The address is obfuscated to slow down email scrapers.)
7. Upload model weights
# Step A: get a signed PUT URL
curl -X POST https://hypereal.cloud/v1/storage/upload \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"filename": "lora.safetensors", "contentType": "application/octet-stream", "kind": "lora"}'
# => { "id": "...", "uploadUrl": "https://...", "publicUrl": "..." }
# Step B: PUT directly to the signed URL
curl -X PUT "<uploadUrl>" \
-H "Content-Type: application/octet-stream" \
--data-binary @lora.safetensors
# Step C: confirm
curl -X POST https://hypereal.cloud/v1/storage/commit \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"id": "..."}'
Webhooks
When you submit a job in async mode, Hypereal calls your webhook_url (if provided in the request body) on terminal status. The webhook delivers:
POST <your_webhook_url>?secret=<hypereal_signed_secret>
Content-Type: application/json
{
"job_id": "abc...",
"status": "succeeded",
"output": { ... },
"executionMs": 1840
}
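On the receiving side, a delivery can be split into its secret and job fields before any further handling. The sketch below uses only the standard library and the field names shown in the payload above. How the secret is generated or should be validated is not described here, so the function only extracts it. The function name and the example path `/hooks/hypereal` are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def parse_webhook(request_target, raw_body):
    """Split a webhook delivery into its secret and job fields.

    request_target is the path plus query string the POST arrived on.
    """
    query = parse_qs(urlsplit(request_target).query)
    secret = query.get("secret", [None])[0]
    event = json.loads(raw_body)
    missing = [key for key in ("job_id", "status") if key not in event]
    if missing:
        raise ValueError(f"webhook payload missing fields: {missing}")
    return {
        "secret": secret,  # validate against your own records before trusting
        "job_id": event["job_id"],
        "status": event["status"],
        "output": event.get("output"),
        "execution_ms": event.get("executionMs"),
    }


delivery = parse_webhook(
    "/hooks/hypereal?secret=example-secret",
    b'{"job_id": "abc", "status": "succeeded", "output": {"k": 1}, "executionMs": 1840}',
)
```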