GPU Clusters and Infrastructure

Create and operate high-performance GPU clusters, serverless endpoints, dedicated pods, storage, and training jobs with one API key and one bill. Mercury is the cluster control plane that turns many GPU cards into one scheduled compute resource.

Core surfaces

· Mercury: the GPU cluster control plane. It plans topology, placement groups, gang scheduling policy, RDMA/NCCL hints, storage mounts, and capacity guardrails. Endpoint: POST /v1/gpu/mercury/plan
· Deployments: provide a Docker image and get a managed auto-scaling GPU endpoint. Pay per second of execution. Endpoint: POST /v1/deployments
· GPU Pods: dedicated hourly instances (H100, A100, L40S, A6000) with SSH, a public IP, and a persistent disk. Pricing is upstream GPU cost plus a 15% Hypereal margin. Endpoint: POST /v1/gpu/pods
· GPU Clusters: a self-serve multi-node GPU capacity workflow for distributed training and large-scale inference. Quote a topology, record the request, and inspect Mercury runtime hints from the dashboard or the API. Endpoints: POST /v1/gpu/clusters/quote, POST /v1/gpu/clusters
· Training: one-click single-node pretraining pods and LoRA post-training jobs with owned datasets, R2 outputs, cancel/refund paths, and a machine-readable capability map. Endpoints: GET /v1/training/capabilities, POST /v1/training/pretrain, POST /v1/training/jobs
· Storage: an S3-compatible object store for model weights, LoRAs, datasets, and generated outputs, plus network volumes that can be attached to GPU Pods and Serverless workers. Endpoints: POST /v1/storage/upload, POST /v1/gpu/volumes
· Jobs: every invocation is logged with latency, output, and credit cost. Results are delivered by async webhook or returned from sync runs.

Authentication

All Infrastructure endpoints accept either a session cookie (dashboard) or a Bearer API key for SDK / CLI use.

curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/gpu/pods/types

Generate keys at /manage-api-keys.
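
For SDK-style use, the same header can be attached from Python. This is a minimal sketch using only the standard library; `build_request` and `BASE_URL` are illustrative names, not part of an official Hypereal SDK.

```python
import urllib.request

BASE_URL = "https://hypereal.cloud"  # host taken from the curl example above


def build_request(api_key: str, path: str) -> urllib.request.Request:
    """Return a GET request carrying the Bearer API key (illustrative helper)."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = build_request("ck_...", "/v1/gpu/pods/types")
# To actually send it: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to reuse the same helper for every Infrastructure endpoint.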

Mercury

Mercury is the layer above raw GPU rentals. It models the high-performance network, allocates placement groups, gang-schedules multi-node jobs, and emits runtime configuration so thousands of GPU cards can behave like one coherent GPU cluster.

curl -X POST https://hypereal.cloud/v1/gpu/mercury/plan \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "gpuCount": 1024,
    "gpuSku": "h100-sxm-80gb",
    "network": "nvlink-ib400",
    "workload": "training",
    "modelParamsB": 405,
    "sequenceLength": 32768
  }'

Topology

Nodes, racks, network islands, rails, memory, bisection and collective bandwidth.

Scheduler

Placement groups, admission control, checkpoint-aware preemption, and gang scheduling.

Runtime

NCCL environment, torchrun rendezvous, storage mount, and topology-aware data plane hints.

{
  "object": "gpu.mercury.plan",
  "plan": {
    "topology": {
      "gpuCount": 1024,
      "nodes": 128,
      "racks": 8,
      "islands": 1,
      "railCount": 1024,
      "bisectionGbps": 409600
    },
    "placement": {
      "tensorParallel": 8,
      "pipelineParallel": 16,
      "dataParallel": 8
    },
    "runtime": {
      "nccl": {
        "NCCL_IB_DISABLE": "0",
        "NCCL_ALGO": "Ring,Tree,NVLS"
      }
    }
  }
}
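
A quick sanity check on a plan is to confirm that the parallel layout accounts for every GPU. The sketch below assumes the common 3D-parallel convention that tensorParallel × pipelineParallel × dataParallel equals gpuCount; the response format does not state this guarantee, and `check_plan` is a hypothetical helper.

```python
import json

# Excerpt of the sample response above; only the fields used here are kept.
plan_json = """
{
  "plan": {
    "topology": {"gpuCount": 1024, "nodes": 128},
    "placement": {"tensorParallel": 8, "pipelineParallel": 16, "dataParallel": 8}
  }
}
"""


def check_plan(plan: dict) -> dict:
    """Derive GPUs per node and check that the layout uses every GPU."""
    topo = plan["topology"]
    place = plan["placement"]
    world_size = (
        place["tensorParallel"] * place["pipelineParallel"] * place["dataParallel"]
    )
    return {
        "gpusPerNode": topo["gpuCount"] // topo["nodes"],
        "worldSize": world_size,
        "layoutMatchesGpuCount": world_size == topo["gpuCount"],
    }


summary = check_plan(json.loads(plan_json)["plan"])
print(summary)
```

For the sample plan, 8 × 16 × 8 = 1024 and each node holds 8 GPUs, so one tensor-parallel group fits inside a single node.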

GPU Clusters

The GPU Cluster dashboard is the visual control surface for Mercury. Use it when you want multiple GPUs across multiple nodes to behave like one planned cluster instead of a single pod. Today this is a quote and capacity workflow; one-click physical multi-node provisioning is not exposed until provider reconciliation is wired.

1. Open Clusters

Go to Infrastructure → GPU Clusters, or click Request capacity from the cluster workspace.

2. Configure

Choose GPU type, count, network, workload, orchestrator, region, storage, and budget guardrail.

3. Request and inspect

The cluster resource shows topology, node inventory, scheduler, runtime hints, quote, and capacity events.

Dashboard:
  /infra              GPU Cluster workspace
  /infra/clusters     List and manage clusters
  /infra/clusters/new Request GPU cluster capacity

API:
  POST   /v1/gpu/clusters/quote
  POST   /v1/gpu/clusters
  GET    /v1/gpu/clusters
  GET    /v1/gpu/clusters/:id
  DELETE /v1/gpu/clusters/:id

Training

Training has two production-ready one-click paths today: single-node pretraining pods and managed LoRA post-training jobs. Multi-node pretraining stays on the GPU Cluster capacity workflow until physical cluster provisioning is wired end to end.

Single-node pretraining

Launch a real GPU pod with SSH, TensorBoard, API, notebook ports, optional dataset URL, and hourly pod billing.

LoRA post-training

Train Flux, Qwen Image, and Wan 2.2 LoRAs on owned datasets; poll, cancel, refund, and download private outputs.

Capability map

Read the machine-readable contract before building UI, automation, or SDK wrappers around the training surface.

Beginner onboarding

If you are new, use the dashboard first and move to the API after one successful run.

1. Pick a path. Use LoRA when you want downloadable .safetensors weights for a style, character, product, or motion adapter. Use a pretraining pod when you need SSH, TensorBoard, notebook access, custom Docker, or your own training scripts.
2. Prepare data. For LoRA, upload one zip in /infra/storage and set kind=dataset. For pretraining, attach a dataset at launch or copy data in after the pod starts.
3. Launch. Open /infra/training. Click Start training for LoRA, or Deploy pretraining pod for a GPU machine.
4. Finish safely. Download the .safetensors output after LoRA completion. For pretraining pods, save checkpoints and stop or terminate the pod when training is done so hourly billing stops.

Multi-node pretraining clusters remain a capacity workflow: they create quote, topology, and reservation records, but are not a one-click physical cluster launch today.

# Capability map
curl https://hypereal.cloud/v1/training/capabilities

# One-click single-node pretraining pod
curl -X POST https://hypereal.cloud/v1/training/pretrain \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "continued-pretrain-01",
    "gpuTypeId": "NVIDIA H100 80GB HBM3",
    "gpuCount": 1,
    "dockerImage": "nvcr.io/nvidia/pytorch:24.10-py3",
    "scenario": "continued-pretraining",
    "framework": "pytorch-fsdp",
    "precision": "bf16",
    "datasetObjectId": "sto_..."
  }'

# One-click LoRA post-training
curl -X POST https://hypereal.cloud/v1/training/jobs \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "brand-character-lora",
    "baseModel": "flux-dev-lora",
    "datasetObjectId": "sto_...",
    "hyperparams": {
      "triggerWord": "ohwx",
      "steps": 1000,
      "learningRate": 0.0004,
      "loraRank": 16
    }
  }'

# List jobs: GET /v1/training/jobs
curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/training/jobs

# Poll one job: GET /v1/training/jobs/{id}
curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/training/jobs/job_...

# Cancel a job: DELETE /v1/training/jobs/{id}
curl -X DELETE -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/training/jobs/job_...

Pricing

Custom deployments (per-second)

Charged per GPU-second while a worker is actively executing your handler. Workers kept warm to reduce cold starts are not billed beyond the idle timeout you configure (idleTimeoutSeconds on the deployment).

GPU tier	USD / sec	Credits / sec	USD / hr
A4000 / A5000 (budget)	$0.0001	0.01	$0.36
A6000 / 6000 Ada (mid)	$0.0003	0.03	$1.08
A100 80GB	$0.0008	0.08	$2.88
H100 80GB	$0.0014	0.14	$5.04

100 credits = $1 USD. Pricing uses managed GPU capacity and may change as upstream supply changes. Custom commit pricing available above $5k/month — contact us.
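
To estimate a bill from the table, multiply the per-second rate by the seconds of active execution and convert at 100 credits per dollar. The sketch below hard-codes the rates listed above, which may change; `execution_cost` is an illustrative helper, and `Decimal` is used to avoid floating-point rounding in money arithmetic.

```python
from decimal import Decimal

# USD per GPU-second, copied from the table above.
USD_PER_SECOND = {
    "A4000 / A5000": Decimal("0.0001"),
    "A6000 / 6000 Ada": Decimal("0.0003"),
    "A100 80GB": Decimal("0.0008"),
    "H100 80GB": Decimal("0.0014"),
}
CREDITS_PER_USD = 100  # 100 credits = $1 USD


def execution_cost(tier: str, seconds: int) -> tuple[Decimal, Decimal]:
    """Return (usd, credits) for `seconds` of active execution on one GPU."""
    usd = USD_PER_SECOND[tier] * seconds
    return usd, usd * CREDITS_PER_USD


usd, credits = execution_cost("H100 80GB", 3600)
print(usd, credits)
```

One hour of continuous H100 execution comes to $5.04, or 504 credits, matching the USD / hr column.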

Dedicated GPU pods (hourly)

Live pricing: current rates are shown on /infra/pods/new. The sell price is the live upstream GPU hourly rate multiplied by 1.15. The first hour is pre-charged, and the pod is then billed hourly while it runs. Terminate the pod to stop the meter.
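
The markup is simple enough to check by hand. The sketch below applies the 1.15 multiplier to a hypothetical upstream rate; the $2.00 figure and the round-to-cents rule are assumptions for illustration, and `pod_hourly_price` is not an official helper.

```python
from decimal import Decimal, ROUND_HALF_UP

MARKUP = Decimal("1.15")  # upstream hourly rate multiplied by 1.15


def pod_hourly_price(upstream_usd_per_hour: str) -> Decimal:
    """Sell price per hour, rounded to cents (the rounding rule is an assumption)."""
    price = Decimal(upstream_usd_per_hour) * MARKUP
    return price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical upstream rate of $2.00/hr; real rates are shown on /infra/pods/new.
print(pod_hourly_price("2.00"))
```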

GPU clusters

Clusters are quote-first capacity resources. Requests stay visible as cluster resources with capacity state, topology, scheduler plan, node inventory, and event log while placement is coordinated.

curl -X POST https://hypereal.cloud/v1/gpu/clusters/quote \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-mercury-01",
    "gpuSku": "h100-sxm-80gb",
    "gpuCount": 64,
    "network": "nvlink-ib400",
    "workload": "training",
    "orchestrator": "slurm",
    "storageGb": 4096,
    "modelParamsB": 405,
    "sequenceLength": 32768
  }'

POST /v1/gpu/clusters/quote returns the exact topology, placement, scheduler policy, NCCL variables, torchrun command, launch runbook, readiness checks, and price before any cluster resource is created.

curl -X POST https://hypereal.cloud/v1/gpu/clusters \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-mercury-01",
    "gpuSku": "h100-sxm-80gb",
    "gpuCount": 8,
    "network": "nvlink-ib400",
    "workload": "training",
    "orchestrator": "slurm",
    "storageGb": 1024
  }'

GET /v1/gpu/clusters lists cluster resources.
GET /v1/gpu/clusters/:id returns topology, nodes, scheduler, runtime, quote, and events.
DELETE /v1/gpu/clusters/:id cancels or terminates the cluster request and marks nodes terminated.

Storage

Egress within Hypereal: free (deployments → storage → jobs)
Egress to public internet: $0.02 / GB above 10 GB free per month
At rest: $0.015 / GB / month, no minimums
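
A monthly storage estimate combines the at-rest charge with public egress above the free 10 GB. The sketch below assumes the stored amount is constant for the whole month and ignores free internal egress; `monthly_storage_cost` is an illustrative helper.

```python
from decimal import Decimal

AT_REST_USD_PER_GB_MONTH = Decimal("0.015")
PUBLIC_EGRESS_USD_PER_GB = Decimal("0.02")
FREE_PUBLIC_EGRESS_GB = Decimal("10")


def monthly_storage_cost(stored_gb: str, public_egress_gb: str) -> Decimal:
    """USD per month for data at rest plus public egress above the free tier."""
    at_rest = Decimal(stored_gb) * AT_REST_USD_PER_GB_MONTH
    billable_egress = max(
        Decimal(public_egress_gb) - FREE_PUBLIC_EGRESS_GB, Decimal("0")
    )
    return at_rest + billable_egress * PUBLIC_EGRESS_USD_PER_GB


# 500 GB stored, 60 GB sent to the public internet in the month.
print(monthly_storage_cost("500", "60"))
```

Here 500 GB at rest costs $7.50 and the 50 billable GB of egress cost $1.00, for $8.50 in total.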

Network volumes

Network volumes are persistent GPU-side storage. They mount at /workspace on Pods and /runpod-volume in Serverless workers. Create one on the Storage page, then attach it when creating a Pod or Deployment.

curl -X POST https://hypereal.cloud/v1/gpu/volumes \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "models-us-wa-1",
    "sizeGb": 100,
    "dataCenterId": "US-WA-1"
  }'

GET /v1/gpu/volumes lists network volumes.
PATCH /v1/gpu/volumes/:id renames or expands a volume. Volumes cannot be shrunk.
DELETE /v1/gpu/volumes/:id deletes the upstream persistent volume.

Handler spec

Your Docker image must expose an HTTP handler on the port declared in ports (default 8000). Requests arrive as:

POST /run HTTP/1.1
Content-Type: application/json

{ "input": { ...your payload... } }

Respond synchronously with JSON:

{ "output": { ...your result... } }
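
As a rough illustration of this contract, the sketch below serves POST /run with Python's standard library and wraps the result in an "output" key. It assumes the request and response shapes are exactly as shown above; `handle`, `RunHandler`, and `make_server` are hypothetical names, and a production worker would need error handling and concurrency that are omitted here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle(payload: dict) -> dict:
    """Replace this body with model code; here it echoes the input back."""
    return {"echo": payload}


class RunHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        # Read the JSON body and pull out the "input" object.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        result = json.dumps({"output": handle(body.get("input", {}))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove to restore request logging


def make_server(port: int = 8000) -> HTTPServer:
    """Bind the handler to all interfaces on `port` (8000 is the stated default)."""
    return HTTPServer(("0.0.0.0", port), RunHandler)
```

Calling `make_server().serve_forever()` as the container entrypoint would start the listener on port 8000.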

Minimal echo handler (Python)

A 10-line worker that returns whatever you send it. Use it to verify that your deployment works end to end before swapping in a real model.

# handler.py
import runpod

def handler(event):
    return {"echo": event.get("input", {})}

runpod.serverless.start({"handler": handler})

# Dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir runpod
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]

Build, push to any container registry (GHCR / Docker Hub / GCR / ECR), then point a new deployment at the image:

docker build -t ghcr.io/you/hypereal-echo:v1 .
docker push ghcr.io/you/hypereal-echo:v1

curl -X POST https://hypereal.cloud/v1/deployments \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "echo",
    "name": "Echo worker",
    "dockerImage": "ghcr.io/you/hypereal-echo:v1",
    "gpuTypes": "AMPERE_16"
  }'

Anything more elaborate (vLLM, ComfyUI, Flux LoRA) follows the same shape — replace the handler body with your model code. Runpod-compatible handlers are supported, so existing serverless workers can be moved over with minimal changes.

Quick start

1. Provision a deployment

curl -X POST https://hypereal.cloud/v1/deployments \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "my-flux-lora",
    "name": "My Flux LoRA",
    "dockerImage": "ghcr.io/me/flux-lora-handler:latest",
    "gpuTypes": "ADA_48_PRO,AMPERE_80",
    "workersMin": 0,
    "workersMax": 3,
    "idleTimeoutSeconds": 5
  }'

# gpuTypes accepts one or more comma-separated GPU pool IDs:
#   AMPERE_16, AMPERE_24, ADA_24, AMPERE_48, ADA_48_PRO,
#   AMPERE_80, ADA_80_PRO, HOPPER_141, ADA_32_PRO,
#   BLACKWELL_96, BLACKWELL_180

2. Run a job (async, with webhook)

curl -X POST https://hypereal.cloud/v1/gpu/run/my-flux-lora \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "a cat astronaut"}}'

# => { "job_id": "abc...", "status": "queued" }

3. Poll, or wait for the webhook

curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/gpu/jobs/abc...

# => { "status": "succeeded", "output": {...}, "executionMs": 1840 }
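
Polling is usually wrapped in a loop with an interval and a timeout. The sketch below takes the fetch function as a parameter so it stays independent of any HTTP client. Only "queued" and "succeeded" appear in the responses above; the "failed" and "cancelled" status names, and the helper `wait_for_job`, are assumptions.

```python
import time
from typing import Callable

# "succeeded" appears in the response above; the other names are assumptions.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_job(job_id) until the status is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r}")
        sleep(interval_s)
```

In practice `fetch_job` would issue GET /v1/gpu/jobs/{id} with the Bearer header and return the parsed JSON.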

4. Rent a dedicated GPU pod

curl https://hypereal.cloud/v1/gpu/pods/types \
  -H "Authorization: Bearer ck_..."
# pick an "id" from the response, then:

curl -X POST https://hypereal.cloud/v1/gpu/pods \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-rig-01",
    "gpuTypeId": "NVIDIA H100 80GB HBM3",
    "dockerImage": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
    "containerDiskGb": 80
  }'

5. Deploy training

# Single-node pretraining pod
curl -X POST https://hypereal.cloud/v1/training/pretrain \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"name":"continued-pretrain-01","gpuTypeId":"NVIDIA H100 80GB HBM3","dockerImage":"nvcr.io/nvidia/pytorch:24.10-py3"}'

# LoRA post-training
curl -X POST https://hypereal.cloud/v1/training/jobs \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"name":"brand-lora","baseModel":"flux-dev-lora","datasetObjectId":"sto_..."}'

6. Request GPU cluster capacity

mailto:sales[at]hypereal.cloud

Subject: GPU Cluster Capacity

Include GPU type/count, preferred region, expected runtime,
networking needs, and whether you need Slurm, Kubernetes,
or a private endpoint.

(replace [at] with @ — obfuscated to slow down email scrapers.)

7. Upload model weights

# Step A: get a signed PUT URL
curl -X POST https://hypereal.cloud/v1/storage/upload \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"filename": "lora.safetensors", "contentType": "application/octet-stream", "kind": "lora"}'

# => { "id": "...", "uploadUrl": "https://...", "publicUrl": "..." }

# Step B: PUT directly to the signed URL
curl -X PUT "<uploadUrl>" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @lora.safetensors

# Step C: confirm
curl -X POST https://hypereal.cloud/v1/storage/commit \
  -H "Authorization: Bearer ck_..." \
  -H "Content-Type: application/json" \
  -d '{"id": "..."}'
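
The three steps can be chained in a script. The sketch below expresses the flow against an injected `send` callable so it can be run without a network; `send`, `Send`, and `upload_weights` are hypothetical names, and the response fields used ("id", "uploadUrl") are the ones shown in Step A.

```python
import json
from typing import Callable, Optional

API = "https://hypereal.cloud"

# send(method, url, headers, body) -> response bytes. Injected so the flow can be
# exercised offline; in practice wrap urllib.request or another HTTP client.
Send = Callable[[str, str, dict, Optional[bytes]], bytes]


def upload_weights(
    send: Send, api_key: str, filename: str, data: bytes, kind: str = "lora"
) -> dict:
    """Run steps A to C and return the Step A response (id, uploadUrl, publicUrl)."""
    auth = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    content_type = "application/octet-stream"

    # Step A: ask for a signed PUT URL.
    body = json.dumps({"filename": filename, "contentType": content_type, "kind": kind})
    ticket = json.loads(send("POST", f"{API}/v1/storage/upload", auth, body.encode()))

    # Step B: PUT the bytes straight to the signed URL (no API key on this call).
    send("PUT", ticket["uploadUrl"], {"Content-Type": content_type}, data)

    # Step C: confirm the upload.
    commit = json.dumps({"id": ticket["id"]}).encode()
    send("POST", f"{API}/v1/storage/commit", auth, commit)
    return ticket
```

Passing the transport in keeps the ordering of the three calls easy to test and leaves the choice of HTTP library open.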

Webhooks

When you submit a job in async mode, Hypereal calls your webhook_url (if provided in the request body) on terminal status. The webhook delivers:

POST <your_webhook_url>?secret=<hypereal_signed_secret>
Content-Type: application/json

{
  "job_id": "abc...",
  "status": "succeeded",
  "output": { ... },
  "executionMs": 1840
}
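
On the receiving side, a handler might check the secret query parameter before trusting the payload. How the secret should be verified is not specified here, so the sketch below assumes it is a shared value you already know and compares it in constant time; `parse_webhook` and the `/hooks/hypereal` path are hypothetical.

```python
import hmac
import json
from urllib.parse import parse_qs, urlsplit


def parse_webhook(request_target: str, body: bytes, expected_secret: str) -> dict:
    """Return the job payload, or raise PermissionError if the secret does not match."""
    query = parse_qs(urlsplit(request_target).query)
    supplied = query.get("secret", [""])[0]
    # Assumption: the secret is a shared value compared directly, not a signature.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        raise PermissionError("webhook secret mismatch")
    return json.loads(body)


payload = parse_webhook(
    "/hooks/hypereal?secret=s3cret",
    b'{"job_id": "abc", "status": "succeeded", "executionMs": 1840}',
    "s3cret",
)
print(payload["status"])
```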

Need 100+ GPUs, custom regions, SLA, SAML?

We do enterprise commits with dedicated capacity, private network volumes, and committed-use discounts of 30–60%. Email sales[at]hypereal.cloud and we'll set it up.