Hypereal Infrastructure
Deploy any model on real GPUs. Rent dedicated H100/A100 by the hour. Store weights and datasets. One API key, one bill, no DevOps.
Core surfaces
- Deployments — provide a Docker image, get a Hypereal-managed auto-scaling GPU endpoint. Pay per second of compute. (POST /v1/deployments)
- GPU Pods — dedicated hourly instances (H100, A100, L40S, A6000). SSH access, public IP, persistent disk. (POST /v1/gpu/pods)
- Storage — S3-compatible object store for model weights, LoRAs, datasets, and generated outputs. Signed direct PUT, no body-size limits. (POST /v1/storage/upload)
- Jobs — every invocation is logged with latency, output, and credit cost. Async webhook delivery or sync runs.
Authentication
All Infrastructure endpoints accept either a session cookie (dashboard) or a Bearer API key for SDK / CLI use.
curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/gpu/pods/types
Generate keys at /manage-api-keys.
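The same call can be made from any HTTP client. A minimal sketch in Python using only the standard library (the key value is a placeholder):

```python
import json
import urllib.request

BASE = "https://hypereal.cloud"

def auth_headers(api_key: str) -> dict:
    # Every SDK/CLI request carries the key as a Bearer token.
    return {"Authorization": f"Bearer {api_key}"}

def list_pod_types(api_key: str) -> dict:
    # GET /v1/gpu/pods/types with the Bearer header, as in the curl example.
    req = urllib.request.Request(
        f"{BASE}/v1/gpu/pods/types", headers=auth_headers(api_key)
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Nothing here is SDK-specific beyond the Authorization header.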
Pricing
Custom deployments (per-second)
Charged by GPU-second while a worker is actively executing your handler. Idle workers (kept warm to reduce cold starts) are not billed beyond your idleTimeoutSeconds setting.
| GPU tier | USD / sec | Credits / sec | USD / hr |
|---|---|---|---|
| A4000 / A5000 (budget) | $0.0001 | 0.01 | $0.36 |
| A6000 / 6000 Ada (mid) | $0.0003 | 0.03 | $1.08 |
| A100 80GB | $0.0008 | 0.08 | $2.88 |
| H100 80GB | $0.0014 | 0.14 | $5.04 |
100 credits = $1 USD. Prices are 25–50% below the leading public clouds at equivalent tiers. Custom commit pricing available above $5k/month — contact us.
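As a worked example of the table above, a job that executes for 1,840 ms on an H100 costs 0.0014 × 1.84 ≈ $0.0026, i.e. about 0.26 credits. A quick sketch of that arithmetic:

```python
# Per-second GPU rates from the table above.
USD_PER_SEC = {
    "A4000/A5000": 0.0001,
    "A6000/6000Ada": 0.0003,
    "A100_80GB": 0.0008,
    "H100_80GB": 0.0014,
}
CREDITS_PER_USD = 100  # 100 credits = $1

def job_cost_usd(tier: str, execution_ms: int) -> float:
    # Billed per GPU-second of active handler execution.
    return USD_PER_SEC[tier] * execution_ms / 1000

def job_cost_credits(tier: str, execution_ms: int) -> float:
    return job_cost_usd(tier, execution_ms) * CREDITS_PER_USD
```

An hour of continuous H100 execution (3,600,000 ms) works out to the $5.04/hr shown in the table.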
Dedicated GPU pods (hourly)
Live pricing — current rates are shown on /infra/pods/new. First hour is pre-charged; auto-bills hourly while running. Terminate to stop the meter.
Storage
- Egress within Hypereal: free (deployments → storage → jobs)
- Egress to public internet: $0.02 / GB above 10 GB free per month
- At rest: $0.015 / GB / month, no minimums
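A sketch of the storage billing rules above (internal egress is free, so only public egress and at-rest bytes are charged):

```python
FREE_EGRESS_GB = 10            # free public egress per month
EGRESS_USD_PER_GB = 0.02       # beyond the free tier
AT_REST_USD_PER_GB_MO = 0.015  # no minimums

def monthly_storage_usd(stored_gb: float, public_egress_gb: float) -> float:
    # Egress within Hypereal (deployments -> storage -> jobs) costs nothing.
    egress = max(0.0, public_egress_gb - FREE_EGRESS_GB) * EGRESS_USD_PER_GB
    at_rest = stored_gb * AT_REST_USD_PER_GB_MO
    return round(egress + at_rest, 2)
```

For example, 100 GB at rest plus 50 GB of public egress is 100 × 0.015 + 40 × 0.02 = $2.30 for the month.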
Handler spec
Your Docker image must expose an HTTP handler on the port declared in ports (default 8000). Requests arrive as:
POST /run HTTP/1.1
Content-Type: application/json

{ "input": { ...your payload... } }

Respond synchronously with JSON:

{ "output": { ...your result... } }

Minimal echo handler (Python)
A minimal worker that echoes back whatever you send. Useful for verifying your deployment connects end-to-end before swapping in a real model.
# handler.py
import runpod

def handler(event):
    return {"echo": event.get("input", {})}

runpod.serverless.start({"handler": handler})

# Dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir runpod
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
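Because the handler is a plain function, you can sanity-check the echo logic in a local interpreter before building the image (re-declared here without the runpod import so it runs anywhere):

```python
# Same handler body as above, minus the runpod wiring, so it can be
# exercised directly without the serverless runtime.
def handler(event):
    return {"echo": event.get("input", {})}

print(handler({"input": {"prompt": "a cat astronaut"}}))
# -> {'echo': {'prompt': 'a cat astronaut'}}
```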
Build, push to any container registry (GHCR / Docker Hub / GCR / ECR), then point a new deployment at the image:
docker build -t ghcr.io/you/hypereal-echo:v1 .
docker push ghcr.io/you/hypereal-echo:v1
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"slug": "echo",
"name": "Echo worker",
"dockerImage": "ghcr.io/you/hypereal-echo:v1",
"gpuTypes": "AMPERE_16"
}'

Anything more elaborate (vLLM, ComfyUI, Flux LoRA) follows the same shape — replace the handler body with your model code. The runpod Python package handles the HTTP server, queue plumbing, and graceful shutdown for you.
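One pattern that carries over to real models: load weights once at module import (i.e. at cold start) and reuse them across invocations. In the sketch below, load_model and its return value are hypothetical stand-ins for your framework's calls; the runpod wiring is the same as in the echo example:

```python
# handler.py (sketch) -- load once per worker, serve many requests
def load_model():
    # Hypothetical stand-in for real weight loading
    # (a vLLM engine, a diffusers pipeline, a LoRA merge, ...).
    return lambda prompt: f"generated({prompt})"

MODEL = load_model()  # runs once per worker cold start, not per request

def handler(event):
    prompt = event.get("input", {}).get("prompt", "")
    return {"output": MODEL(prompt)}

# Wire it up exactly as before:
# import runpod
# runpod.serverless.start({"handler": handler})
```

Per-request work stays in handler; anything expensive and reusable belongs at module scope.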
Quick start
1. Provision a deployment
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"slug": "my-flux-lora",
"name": "My Flux LoRA",
"dockerImage": "ghcr.io/me/flux-lora-handler:latest",
"gpuTypes": "ADA_48_PRO,AMPERE_80",
"workersMin": 0,
"workersMax": 3,
"idleTimeoutSeconds": 5
}'
# gpuTypes accepts one or more comma-separated GPU pool IDs:
# AMPERE_16, AMPERE_24, ADA_24, AMPERE_48, ADA_48_PRO,
# AMPERE_80, ADA_80_PRO, HOPPER_141, ADA_32_PRO,
# BLACKWELL_96, BLACKWELL_180

2. Run a job (async, with webhook)
curl -X POST https://hypereal.cloud/v1/gpu/run/my-flux-lora \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"input": {"prompt": "a cat astronaut"}}'
# => { "job_id": "abc...", "status": "queued" }

3. Poll, or wait for the webhook
curl -H "Authorization: Bearer ck_..." \
https://hypereal.cloud/v1/gpu/jobs/abc...
# => { "status": "succeeded", "output": {...}, "executionMs": 1840 }

4. Rent a dedicated GPU pod
curl https://hypereal.cloud/v1/gpu/pods/types \
-H "Authorization: Bearer ck_..."
# pick an "id" from the response, then:
curl -X POST https://hypereal.cloud/v1/gpu/pods \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "training-rig-01",
"gpuTypeId": "NVIDIA H100 80GB HBM3",
"dockerImage": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
"containerDiskGb": 80
}'5. Upload model weights
# Step A: get a signed PUT URL
curl -X POST https://hypereal.cloud/v1/storage/upload \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"filename": "lora.safetensors", "contentType": "application/octet-stream", "kind": "lora"}'
# => { "id": "...", "uploadUrl": "https://...", "publicUrl": "..." }
# Step B: PUT directly to the signed URL
curl -X PUT "<uploadUrl>" \
-H "Content-Type: application/octet-stream" \
--data-binary @lora.safetensors
# Step C: confirm
curl -X POST https://hypereal.cloud/v1/storage/commit \
-H "Authorization: Bearer ck_..." \
-d '{"id": "..."}'

Webhooks
When you submit a job in async mode, Hypereal calls your webhook_url (if provided in the request body) once the job reaches a terminal status. The webhook delivers:
POST <your_webhook_url>?secret=<hypereal_signed_secret>
Content-Type: application/json
{
"job_id": "abc...",
"status": "succeeded",
"output": { ... },
"executionMs": 1840
}
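A minimal receiver for this payload, sketched with Python's standard library. The port and the expected secret value are illustrative placeholders; compare the secret query parameter against whatever value you expect for your deployment:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

EXPECTED_SECRET = "replace-me"  # placeholder for the value you expect

def extract_secret(path: str) -> str:
    # Pull the ?secret= query parameter appended to the webhook URL.
    return parse_qs(urlparse(path).query).get("secret", [""])[0]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if extract_secret(self.path) != EXPECTED_SECRET:
            self.send_response(403)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Terminal status plus output, as described above.
        print(event.get("job_id"), event.get("status"), event.get("executionMs"))
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Respond 200 promptly; any heavy post-processing of the output is better deferred to a queue or background task.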