Deploy any model on real GPUs
Any Docker image: vLLM, Hugging Face inference servers, ComfyUI workers, your own handler. Autoscaling endpoints from a $0.36/hour equivalent.
Pricing by GPU tier
You pay only for GPU-seconds while your handler is executing. Idle workers cost nothing; they scale to zero after the idle timeout you set on the deployment. The hourly equivalent is shown for reference at 100% utilisation.
| Tier | GPUs | $ / sec | $ / hr | Notes |
|---|---|---|---|---|
| Budget | A4000 · A5000 | $0.0001 | $0.36 | 16 GB · light SDXL, embeddings, classification |
| Mid | A6000 · 6000 Ada | $0.0003 | $1.08 | 48 GB · Flux, SDXL pipelines, mid-size LLMs |
| High | A100 80 GB | $0.0008 | $2.88 | 80 GB · vLLM 70B fp8, fast video, training |
| Top | H100 80 GB | $0.0014 | $5.04 | HBM3 · production LLM serving, long-context vLLM |
H200 / B200 / Hopper-141 / Blackwell-96 available on request via enterprise.
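To make the per-second billing concrete, here is a short Python sketch; the rates are taken from the table above, while the request volume and duration are illustrative assumptions:

```python
# Rough monthly cost under per-second GPU billing. Rates come from the
# pricing table above; the workload numbers are made-up examples.
RATE_PER_SEC = {"budget": 0.0001, "mid": 0.0003, "high": 0.0008, "top": 0.0014}

def monthly_cost(tier: str, requests_per_day: int, seconds_per_request: float) -> float:
    """Billed GPU-seconds times the tier rate; idle time bills nothing."""
    gpu_seconds = requests_per_day * seconds_per_request * 30
    return gpu_seconds * RATE_PER_SEC[tier]

# 10,000 daily requests at 2 s each on the Budget tier:
# 10,000 * 2 * 30 * $0.0001 = $60/month, versus roughly $260/month for an
# always-on GPU at the $0.36/hr equivalent (0.36 * 24 * 30).
print(f"${monthly_cost('budget', 10_000, 2.0):.2f}")
```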
Five lines of Python. Real GPUs.
```python
# handler.py
import runpod

def run(job):
    payload = job["input"]
    # Your inference code here.
    return {"output": f"hello {payload.get('name', 'world')}"}

runpod.serverless.start({"handler": run})
```

And the matching Dockerfile:

```dockerfile
FROM python:3.11-slim
RUN pip install runpod
COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
```
Build and push the image, then create the deployment:

```bash
curl https://hypereal.cloud/v1/deployments \
  -H "Authorization: Bearer $HYPEREAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "hello-world",
    "name": "Hello World",
    "dockerImage": "ghcr.io/your-org/hello-world:latest",
    "gpuPoolTier": "AMPERE_16",
    "workersMin": 0,
    "workersMax": 3,
    "idleTimeoutSeconds": 30
  }'
```

Once live, every request to `POST /v1/gpu/run/hello-world` is routed to a warm worker, billed per second, and delivered synchronously or via webhook.
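For completeness, a minimal client sketch for calling the deployment. It assumes the sync run endpoint accepts a JSON body with an `input` key (matching the `job["input"]` the handler reads) and returns the handler's dict as JSON; check the actual response schema against your deployment:

```python
import os
import requests  # third-party: pip install requests

# Assumed request/response shape: the endpoint takes {"input": ...} and,
# in sync mode, returns the handler's return value as JSON.
resp = requests.post(
    "https://hypereal.cloud/v1/gpu/run/hello-world",
    headers={"Authorization": f"Bearer {os.environ['HYPEREAL_API_KEY']}"},
    json={"input": {"name": "Ada"}},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"output": "hello Ada"}
```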
What you can deploy
Any Hugging Face model
vLLM, TGI, Triton Inference Server, llama.cpp — pull weights at boot or bake them into the image. We give you a stable URL and let your handler do the rest.
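As one concrete pattern, a hedged sketch of a handler that pulls Hugging Face weights at boot (the task and model name below are illustrative, not prescribed). Loading at module scope means the download happens once per cold start rather than once per request:

```python
# handler.py -- illustrative sketch: pull HF weights at boot, reuse per request.
import runpod
from transformers import pipeline  # pip install transformers torch

# Loaded once at container start (the "pull weights at boot" path); baking
# the weights into the image instead would skip this download on cold start.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model
)

def run(job):
    text = job["input"]["text"]
    return {"output": classifier(text)}

runpod.serverless.start({"handler": run})
```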
Your own Docker handler
A `handler.py` plus a four-line Dockerfile is the whole API. No SDK to learn beyond `runpod.serverless.start`. Bring custom CUDA kernels, custom tokenizers, multi-stage pipelines (sketched below).
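A hedged sketch of the multi-stage case; both stage functions are hypothetical placeholders, not part of any SDK:

```python
# handler.py -- multi-stage pipeline sketch. transcribe() and summarize()
# are hypothetical stand-ins for whatever models your image actually ships.
import runpod

def transcribe(audio_url: str) -> str:
    # Placeholder: swap in your real ASR stage (e.g. a Whisper model).
    return f"transcript of {audio_url}"

def summarize(text: str) -> str:
    # Placeholder: swap in your real summarization stage.
    return text[:200]

def run(job):
    # Stages run in sequence inside one billed invocation.
    transcript = transcribe(job["input"]["audio_url"])
    return {"output": summarize(transcript)}

runpod.serverless.start({"handler": run})
```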
Open-source workers
ComfyUI worker, Whisper-large, Bark, Mochi, Wan 2.2, OmniGen — anything packaged as an OCI image with a serverless handler runs unmodified on Hypereal.

