Deploy any model on real GPUs
Any Docker image: vLLM, Hugging Face inference servers, ComfyUI workers, your own handler. Autoscaling endpoints from a $0.36/hour equivalent.
Pricing by GPU tier
You pay only for GPU-seconds while your handler is executing. Idle workers cost nothing; they scale to zero after the idle timeout you set on the deployment. The hourly equivalent is shown for reference at 100% utilisation.
| Tier | GPUs | $ / sec | $ / hr | Notes |
|---|---|---|---|---|
| Budget | A4000 · A5000 | $0.0001 | $0.36 | 16 GB · light SDXL, embeddings, classification |
| Mid | A6000 · 6000 Ada | $0.0003 | $1.08 | 48 GB · Flux, SDXL pipelines, mid-size LLMs |
| High | A100 80 GB | $0.0008 | $2.88 | 80 GB · vLLM 70B fp8, fast video, training |
| Top | H100 80 GB | $0.0014 | $5.04 | HBM3 · production LLM serving, long-context vLLM |
H200 / B200 / Hopper-141 / Blackwell-96 available on request via enterprise.
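To make the per-second billing concrete, here is a short Python sketch; the rates are taken from the table above, while the request volume and duration are illustrative assumptions:

```python
# Rough monthly cost under per-second GPU billing. Rates come from the
# pricing table above; the workload numbers are made-up examples.
RATE_PER_SEC = {"budget": 0.0001, "mid": 0.0003, "high": 0.0008, "top": 0.0014}

def monthly_cost(tier: str, requests_per_day: int, seconds_per_request: float) -> float:
    """Billed GPU-seconds times the tier rate; idle time bills nothing."""
    gpu_seconds = requests_per_day * seconds_per_request * 30
    return gpu_seconds * RATE_PER_SEC[tier]

# 10,000 daily requests at 2 s each on the Budget tier:
# 10,000 * 2 * 30 * $0.0001 = $60/month, versus roughly $260/month for an
# always-on GPU at the $0.36/hr equivalent (0.36 * 24 * 30).
print(f"${monthly_cost('budget', 10_000, 2.0):.2f}")
```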
Five lines of Python. Real GPUs.
```python
# handler.py
import runpod

def run(job):
    payload = job["input"]
    # Your inference code here.
    return {"output": f"hello {payload.get('name', 'world')}"}

runpod.serverless.start({"handler": run})
```

And the matching Dockerfile:

```dockerfile
FROM python:3.11-slim
RUN pip install runpod
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
```
Build and push the image, then create the deployment:

```bash
curl https://hypereal.cloud/v1/deployments \
  -H "Authorization: Bearer $HYPEREAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "hello-world",
    "name": "Hello World",
    "dockerImage": "ghcr.io/your-org/hello-world:latest",
    "gpuPoolTier": "AMPERE_16",
    "workersMin": 0,
    "workersMax": 3,
    "idleTimeoutSeconds": 30
  }'
```

Once live, every request to `POST /v1/gpu/run/hello-world` is routed to a warm worker, billed per second, and delivered synchronously or via webhook.
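For completeness, a minimal client sketch for calling the deployment. It assumes the sync run endpoint accepts a JSON body with an `input` key (matching the `job["input"]` the handler reads) and returns the handler's dict as JSON; check the actual response schema against your deployment:

```python
import os
import requests  # third-party: pip install requests

# Assumed request/response shape: the endpoint takes {"input": ...} and,
# in sync mode, returns the handler's return value as JSON.
resp = requests.post(
    "https://hypereal.cloud/v1/gpu/run/hello-world",
    headers={"Authorization": f"Bearer {os.environ['HYPEREAL_API_KEY']}"},
    json={"input": {"name": "Ada"}},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"output": "hello Ada"}
```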
What you can deploy
Any Hugging Face model
vLLM, TGI, Triton Inference Server, llama.cpp — pull weights at boot or bake them into the image. We give you a stable URL and let your handler do the rest.
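As one concrete pattern, a hedged sketch of a handler that pulls Hugging Face weights at boot (the task and model name below are illustrative, not prescribed). Loading at module scope means the download happens once per cold start rather than once per request:

```python
# handler.py -- illustrative sketch: pull HF weights at boot, reuse per request.
import runpod
from transformers import pipeline  # pip install transformers torch

# Loaded once at container start (the "pull weights at boot" path); baking
# the weights into the image instead would skip this download on cold start.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model
)

def run(job):
    text = job["input"]["text"]
    return {"output": classifier(text)}

runpod.serverless.start({"handler": run})
```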
Your own Docker handler
A `handler.py` plus a four-line Dockerfile is the whole API. No SDK to learn beyond `runpod.serverless.start`. Bring custom CUDA kernels, custom tokenizers, multi-stage pipelines (sketched below).
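A hedged sketch of the multi-stage case; both stage functions are hypothetical placeholders, not part of any SDK:

```python
# handler.py -- multi-stage pipeline sketch. transcribe() and summarize()
# are hypothetical stand-ins for whatever models your image actually ships.
import runpod

def transcribe(audio_url: str) -> str:
    # Placeholder: swap in your real ASR stage (e.g. a Whisper model).
    return f"transcript of {audio_url}"

def summarize(text: str) -> str:
    # Placeholder: swap in your real summarization stage.
    return text[:200]

def run(job):
    # Stages run in sequence inside one billed invocation.
    transcript = transcribe(job["input"]["audio_url"])
    return {"output": summarize(transcript)}

runpod.serverless.start({"handler": run})
```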
Open-source workers
ComfyUI worker, Whisper-large, Bark, Mochi, Wan 2.2, OmniGen — anything packaged as an OCI image with a serverless handler runs unmodified on Hypereal.

