Hypereal Infrastructure
Deploy any model on real GPUs. Rent dedicated H100/A100 by the hour. Store weights and datasets. One API key, one bill, no DevOps.
Core surfaces
- Deployments — provide a Docker image, get a Hypereal-managed auto-scaling GPU endpoint. Pay per second of compute. (POST /v1/deployments)
- GPU Pods — dedicated hourly instances (H100, A100, L40S, A6000). SSH access, public IP, persistent disk. (POST /v1/gpu/pods)
- Storage — S3-compatible object store for model weights, LoRAs, datasets, and generated outputs. Signed direct PUT, no body-size limits. (POST /v1/storage/upload)
- Jobs — every invocation is logged with latency, output, and credit cost. Async webhook delivery or sync runs.
Authentication
All Infrastructure endpoints accept either a session cookie (dashboard) or a Bearer API key for SDK / CLI use.
curl -H "Authorization: Bearer ck_..." \
  https://hypereal.cloud/v1/gpu/pods/types
Generate keys at /manage-api-keys.
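The same call can be made from any HTTP client. A minimal sketch in Python using only the standard library (the key value is a placeholder):

```python
import json
import urllib.request

BASE = "https://hypereal.cloud"

def auth_headers(api_key: str) -> dict:
    # Every SDK/CLI request carries the key as a Bearer token.
    return {"Authorization": f"Bearer {api_key}"}

def list_pod_types(api_key: str) -> dict:
    # GET /v1/gpu/pods/types with the Bearer header, as in the curl example.
    req = urllib.request.Request(
        f"{BASE}/v1/gpu/pods/types", headers=auth_headers(api_key)
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Nothing here is SDK-specific beyond the Authorization header.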
Pricing
Custom deployments (per-second)
Charged by GPU-second while a worker is actively executing your handler. Idle workers (kept warm to reduce cold starts) are not billed beyond your idleTimeoutSeconds setting.
| GPU tier | USD / sec | Credits / sec | USD / hr |
|---|---|---|---|
| A4000 / A5000 (budget) | $0.0001 | 0.01 | $0.36 |
| A6000 / 6000 Ada (mid) | $0.0003 | 0.03 | $1.08 |
| A100 80GB | $0.0008 | 0.08 | $2.88 |
| H100 80GB | $0.0014 | 0.14 | $5.04 |
100 credits = $1 USD. Prices are 25–50% below the leading public clouds at equivalent tiers. Custom commit pricing available above $5k/month — contact us.
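As a worked example of the table above, a job that executes for 1,840 ms on an H100 costs 0.0014 × 1.84 ≈ $0.0026, i.e. about 0.26 credits. A quick sketch of that arithmetic:

```python
# Per-second GPU rates from the table above.
USD_PER_SEC = {
    "A4000/A5000": 0.0001,
    "A6000/6000Ada": 0.0003,
    "A100_80GB": 0.0008,
    "H100_80GB": 0.0014,
}
CREDITS_PER_USD = 100  # 100 credits = $1

def job_cost_usd(tier: str, execution_ms: int) -> float:
    # Billed per GPU-second of active handler execution.
    return USD_PER_SEC[tier] * execution_ms / 1000

def job_cost_credits(tier: str, execution_ms: int) -> float:
    return job_cost_usd(tier, execution_ms) * CREDITS_PER_USD
```

An hour of continuous H100 execution (3,600,000 ms) works out to the $5.04/hr shown in the table.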
Dedicated GPU pods (hourly)
Live pricing — current rates are shown on /infra/pods/new. First hour is pre-charged; auto-bills hourly while running. Terminate to stop the meter.
Storage
- Egress within Hypereal: free (deployments → storage → jobs)
- Egress to public internet: $0.02 / GB above 10 GB free per month
- At rest: $0.015 / GB / month, no minimums
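A sketch of the storage billing rules above (internal egress is free, so only public egress and at-rest bytes are charged):

```python
FREE_EGRESS_GB = 10            # free public egress per month
EGRESS_USD_PER_GB = 0.02       # beyond the free tier
AT_REST_USD_PER_GB_MO = 0.015  # no minimums

def monthly_storage_usd(stored_gb: float, public_egress_gb: float) -> float:
    # Egress within Hypereal (deployments -> storage -> jobs) costs nothing.
    egress = max(0.0, public_egress_gb - FREE_EGRESS_GB) * EGRESS_USD_PER_GB
    at_rest = stored_gb * AT_REST_USD_PER_GB_MO
    return round(egress + at_rest, 2)
```

For example, 100 GB at rest plus 50 GB of public egress is 100 × 0.015 + 40 × 0.02 = $2.30 for the month.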
Handler spec
Your Docker image must expose an HTTP handler on the port declared in ports (default 8000). Requests arrive as:
POST /run HTTP/1.1
Content-Type: application/json

{ "input": { ...your payload... } }

Respond synchronously with JSON:

{ "output": { ...your result... } }

Minimal echo handler (Python)
A minimal worker that echoes back whatever you send. Useful for verifying your deployment connects end-to-end before swapping in a real model.
# handler.py
import runpod

def handler(event):
    return {"echo": event.get("input", {})}

runpod.serverless.start({"handler": handler})

# Dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir runpod
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
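Because the handler is a plain function, you can sanity-check the echo logic in a local interpreter before building the image (re-declared here without the runpod import so it runs anywhere):

```python
# Same handler body as above, minus the runpod wiring, so it can be
# exercised directly without the serverless runtime.
def handler(event):
    return {"echo": event.get("input", {})}

print(handler({"input": {"prompt": "a cat astronaut"}}))
# -> {'echo': {'prompt': 'a cat astronaut'}}
```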
Build, push to any container registry (GHCR / Docker Hub / GCR / ECR), then point a new deployment at the image:
docker build -t ghcr.io/you/hypereal-echo:v1 .
docker push ghcr.io/you/hypereal-echo:v1
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"slug": "echo",
"name": "Echo worker",
"dockerImage": "ghcr.io/you/hypereal-echo:v1",
"gpuTypes": "AMPERE_16"
}'

Anything more elaborate (vLLM, ComfyUI, Flux LoRA) follows the same shape — replace the handler body with your model code. The runpod Python package handles the HTTP server, queue plumbing, and graceful shutdown for you.
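One pattern that carries over to real models: load weights once at module import (i.e. at cold start) and reuse them across invocations. In the sketch below, load_model and its return value are hypothetical stand-ins for your framework's calls; the runpod wiring is the same as in the echo example:

```python
# handler.py (sketch) -- load once per worker, serve many requests
def load_model():
    # Hypothetical stand-in for real weight loading
    # (a vLLM engine, a diffusers pipeline, a LoRA merge, ...).
    return lambda prompt: f"generated({prompt})"

MODEL = load_model()  # runs once per worker cold start, not per request

def handler(event):
    prompt = event.get("input", {}).get("prompt", "")
    return {"output": MODEL(prompt)}

# Wire it up exactly as before:
# import runpod
# runpod.serverless.start({"handler": handler})
```

Per-request work stays in handler; anything expensive and reusable belongs at module scope.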
Quick start
1. Provision a deployment
curl -X POST https://hypereal.cloud/v1/deployments \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"slug": "my-flux-lora",
"name": "My Flux LoRA",
"dockerImage": "ghcr.io/me/flux-lora-handler:latest",
"gpuTypes": "ADA_48_PRO,AMPERE_80",
"workersMin": 0,
"workersMax": 3,
"idleTimeoutSeconds": 5
}'
# gpuTypes accepts one or more comma-separated GPU pool IDs:
# AMPERE_16, AMPERE_24, ADA_24, AMPERE_48, ADA_48_PRO,
# AMPERE_80, ADA_80_PRO, HOPPER_141, ADA_32_PRO,
# BLACKWELL_96, BLACKWELL_180

2. Run a job (async, with webhook)
curl -X POST https://hypereal.cloud/v1/gpu/run/my-flux-lora \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"input": {"prompt": "a cat astronaut"}}'
# => { "job_id": "abc...", "status": "queued" }

3. Poll, or wait for the webhook
curl -H "Authorization: Bearer ck_..." \
https://hypereal.cloud/v1/gpu/jobs/abc...
# => { "status": "succeeded", "output": {...}, "executionMs": 1840 }

4. Rent a dedicated GPU pod
curl https://hypereal.cloud/v1/gpu/pods/types \
-H "Authorization: Bearer ck_..."
# pick an "id" from the response, then:
curl -X POST https://hypereal.cloud/v1/gpu/pods \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{
"name": "training-rig-01",
"gpuTypeId": "NVIDIA H100 80GB HBM3",
"dockerImage": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
"containerDiskGb": 80
}'5. Upload model weights
# Step A: get a signed PUT URL
curl -X POST https://hypereal.cloud/v1/storage/upload \
-H "Authorization: Bearer ck_..." \
-H "Content-Type: application/json" \
-d '{"filename": "lora.safetensors", "contentType": "application/octet-stream", "kind": "lora"}'
# => { "id": "...", "uploadUrl": "https://...", "publicUrl": "..." }
# Step B: PUT directly to the signed URL
curl -X PUT "<uploadUrl>" \
-H "Content-Type: application/octet-stream" \
--data-binary @lora.safetensors
# Step C: confirm
curl -X POST https://hypereal.cloud/v1/storage/commit \
-H "Authorization: Bearer ck_..." \
-d '{"id": "..."}'

Webhooks
When you submit a job in async mode, Hypereal calls your webhook_url (if provided in the request body) once the job reaches a terminal status. The webhook delivers:
POST <your_webhook_url>?secret=<hypereal_signed_secret>
Content-Type: application/json
{
"job_id": "abc...",
"status": "succeeded",
"output": { ... },
"executionMs": 1840
}
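A minimal receiver for this payload, sketched with Python's standard library. The port and the expected secret value are illustrative placeholders; compare the secret query parameter against whatever value you expect for your deployment:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

EXPECTED_SECRET = "replace-me"  # placeholder for the value you expect

def extract_secret(path: str) -> str:
    # Pull the ?secret= query parameter appended to the webhook URL.
    return parse_qs(urlparse(path).query).get("secret", [""])[0]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if extract_secret(self.path) != EXPECTED_SECRET:
            self.send_response(403)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Terminal status plus output, as described above.
        print(event.get("job_id"), event.get("status"), event.get("executionMs"))
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Respond 200 promptly; any heavy post-processing of the output is better deferred to a queue or background task.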