Content Moderation API: Detect NSFW & Unsafe Content in 2026
Keep your generative AI pipeline safe without slowing it down

Generative AI pipelines ship fast. Safety layers often don't. If your app lets users submit free-form text or generates images on demand, you need a reliable content moderation API sitting in the hot path — one that catches NSFW material, hate speech, and policy violations before they reach storage, other users, or a compliance audit. This guide covers the concepts, the options, and the practical code to get it wired up.
What is a content moderation API
A content moderation API is an endpoint you call with a piece of content — text, image URL, or base64 payload — and receive back a structured judgment: safe or not, and why. The response typically includes category labels (sexual, violent, self-harm, hate speech, spam) and confidence scores per category, so you can tune your own threshold rather than accepting a hard binary.
In a generative pipeline there are two places to apply it:
- Ingress (user input): Check the prompt before you ever forward it to a model. This blocks policy-violating requests before they cost you a generation call. Moderation classifiers target harmful-content categories, so treat prompt-injection defense as a separate control.
- Egress (model output): Check the generated image or text before you persist it or return it to the end user. Catches the cases where a compliant-looking prompt still produces unsafe output.
Both gates together give you defense-in-depth. Either gate alone leaves a hole.
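The two gates can be sketched as a small wrapper. This is a minimal illustration, not a prescribed design: `guarded_generate`, `ModerationRejected`, and the three callables are hypothetical names, and the checks are injected so any moderation backend can sit behind them.

```python
from typing import Callable


class ModerationRejected(Exception):
    """Raised when either gate refuses the content."""


def guarded_generate(
    prompt: str,
    check_text: Callable[[str], bool],
    generate: Callable[[str], str],
    check_output: Callable[[str], bool],
) -> str:
    # Ingress gate: refuse the prompt before spending a generation call.
    if not check_text(prompt):
        raise ModerationRejected("ingress: prompt rejected")
    output = generate(prompt)
    # Egress gate: refuse unsafe output before it is stored or returned.
    if not check_output(output):
        raise ModerationRejected("egress: output rejected")
    return output
```

Because the generator only runs after the ingress check passes, a rejected prompt never reaches the model, while the egress check still covers clean-looking prompts that produce unsafe output.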
Best content moderation API 2026
There are a handful of serious options in 2026:
| Option | Modality | Notes |
|---|---|---|
| OpenAI Moderation (omni-moderation-latest) | Text + image | Free with an OpenAI key; solid coverage across 11+ categories |
| AWS Rekognition | Image + video | Strong for visual nudity/violence; no native text |
| Google Cloud Vision SafeSearch | Image | Five categories (adult, spoof, medical, violence, racy), each rated on a likelihood scale; fast and cheap at volume |
| Azure AI Content Safety | Text + image | Fine-grained category scores; enterprise SLA |
| Open-source (NudeNet, Detoxify) | Image (NudeNet), text (Detoxify) | Self-hosted; no external API round trip; you carry the hosting and maintenance burden |
For teams already running on a unified AI gateway, the easiest path is to call the OpenAI-compatible moderation endpoint through Hypereal, keep the same auth header and base URL as the rest of your pipeline, and pay a fraction of the official rate. No separate account, no second set of credentials.
Hypereal's API base URL is https://api.hypereal.cloud/v1 — the same endpoint you use for image generation and LLM calls. Pricing for moderation calls is a fraction of official provider rates; check hypereal.cloud for live numbers.
NSFW detection with a content moderation API
NSFW detection is the most common use case — especially for apps that let users upload avatars, generate product images, or feed content into a social feed.
Most moderation APIs return a score per category. A typical response for an image check looks like:
{
  "id": "modr-abc123",
  "results": [
    {
      "flagged": false,
      "categories": {
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "hate": false,
        "self-harm": false
      },
      "category_scores": {
        "sexual": 0.04,
        "violence": 0.01,
        "hate": 0.00
      }
    }
  ]
}
A category_scores.sexual value above 0.7 is a reasonable soft-block threshold for most consumer apps, regardless of what the top-level flagged field says. You can tune this: stricter for under-18 audiences, more lenient for adult platforms that require age verification.
Common pitfall: using flagged as a hard gate without checking the raw scores. The default flagged threshold is conservative. If you're rejecting content at too high a rate, read the raw scores and set your own threshold.
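Reading the raw scores instead of flagged might look like the sketch below. It assumes a result object shaped like the response shown earlier; `triggered_categories` is a hypothetical helper and the sample scores are made up for illustration.

```python
from typing import Any


def triggered_categories(result: dict[str, Any], threshold: float = 0.7) -> list[str]:
    """Return the categories whose raw score is at or above `threshold`."""
    scores = result.get("category_scores", {})
    return sorted(name for name, score in scores.items() if score >= threshold)


# Sample result: the API flagged it, but no score reaches our own 0.7 cutoff.
sample = {
    "flagged": True,
    "category_scores": {"sexual": 0.42, "violence": 0.01, "hate": 0.00},
}

print(triggered_categories(sample))       # []
print(triggered_categories(sample, 0.4))  # ['sexual']
```

Returning the list of triggering categories, rather than a bare boolean, makes it easier to log why something was blocked and to adjust the threshold later.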
How to add a content moderation API to your pipeline
Here is a complete example. It calls the Hypereal-proxied moderation endpoint to check a user's text prompt, then only fires the image generation if the prompt is clean.
cURL (quick test):
curl -X POST https://api.hypereal.cloud/v1/moderations \
  -H "Authorization: Bearer $HYPEREAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "omni-moderation-latest",
    "input": "A sunny beach with kids playing volleyball"
  }'
Python (production pattern):
import os

import httpx

HYPEREAL_BASE = "https://api.hypereal.cloud/v1"
HEADERS = {
    "Authorization": f"Bearer {os.environ['HYPEREAL_API_KEY']}",
    "Content-Type": "application/json",
}


def is_safe(text: str, threshold: float = 0.7) -> bool:
    """Return True when no category score reaches the threshold."""
    resp = httpx.post(
        f"{HYPEREAL_BASE}/moderations",
        headers=HEADERS,
        json={"model": "omni-moderation-latest", "input": text},
        timeout=5,
    )
    resp.raise_for_status()
    result = resp.json()["results"][0]
    scores = result["category_scores"]
    # Reject if any category score meets or exceeds the threshold
    return not any(v >= threshold for v in scores.values())


def generate_image(prompt: str) -> dict:
    """Generate an image only after the prompt passes moderation."""
    if not is_safe(prompt):
        raise ValueError("Prompt flagged by content moderation; request rejected.")
    resp = httpx.post(
        f"{HYPEREAL_BASE}/images/generate",
        headers=HEADERS,
        json={"model": "gpt-image-2", "prompt": prompt, "size": "1024x1024"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


# Usage
image_data = generate_image("An oil painting of a mountain lake at sunrise")
This pattern adds roughly 150–300 ms of latency per request — fast enough for interactive products and cheap enough to run on every request.
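Latency depends on region, payload size, and network path, so it is worth measuring in your own environment. A minimal sketch, assuming any check function that takes text and returns a boolean; `timed_check` is a hypothetical helper name.

```python
import time
from typing import Callable


def timed_check(check: Callable[[str], bool], text: str) -> tuple[bool, float]:
    """Run a moderation check and return (verdict, elapsed milliseconds)."""
    start = time.perf_counter()
    verdict = check(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return verdict, elapsed_ms
```

Calling it as `ok, ms = timed_check(is_safe, prompt)` and logging `ms` over a day of real traffic gives a distribution to compare against your latency budget.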
Get set up in three steps:
- Sign up at hypereal.cloud
- Dashboard → API Keys → Create Key
- export HYPEREAL_API_KEY=sk-... and drop the code above into your pipeline
FAQ
Is a content moderation API the same as a classifier?
Functionally, yes: it's a classifier tuned for policy categories. The difference is that moderation APIs are trained on policy-relevant labels (NSFW, hate, self-harm) rather than arbitrary classes, and they return normalized scores between 0 and 1 rather than raw logits.
Should I moderate prompts, outputs, or both?
Both, for any app that stores or surfaces generated content. Prompt moderation is cheaper (text is smaller than images); output moderation catches jailbreaks and unexpected model behavior. Skip either gate only if you have a clear reason.
Can I use Hypereal's moderation endpoint for image inputs?
Yes. The omni-moderation-latest model accepts both text and image URLs in the input field. Pass an array with {type: "image_url", image_url: {url: "..."}} items alongside your text.
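A request body for mixed input could be assembled as follows. This is a sketch: `build_moderation_payload` is a hypothetical helper, the image item follows the shape quoted above, and the text item shape follows OpenAI's multimodal moderation format, which is assumed here to pass through the proxy unchanged.

```python
from typing import Any, Optional


def build_moderation_payload(
    text: Optional[str] = None,
    image_url: Optional[str] = None,
    model: str = "omni-moderation-latest",
) -> dict[str, Any]:
    """Build a request body that mixes text and image items in `input`."""
    items: list[dict[str, Any]] = []
    if text:
        # Assumed text item shape (OpenAI multimodal moderation format).
        items.append({"type": "text", "text": text})
    if image_url:
        items.append({"type": "image_url", "image_url": {"url": image_url}})
    if not items:
        raise ValueError("Provide text, image_url, or both.")
    return {"model": model, "input": items}
```

The returned dictionary can be passed as the `json=` argument of the same POST to /moderations used for text-only checks.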
What threshold should I use?
Start at 0.7 for general consumer apps. Move to 0.5 for stricter environments (schools, under-18 apps). For adult platforms where some content is permitted, inspect per-category scores and only block sexual/minors and self-harm unconditionally.
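One way to express these policies is a single function with an optional set of enforced categories. This is a sketch: `should_block` is a hypothetical helper, the scores are made up, and reusing 0.7 as the adult-platform threshold is an assumption, since the answer above gives no number for that case.

```python
from typing import Optional


def should_block(
    scores: dict[str, float],
    threshold: float,
    enforced: Optional[set[str]] = None,
) -> bool:
    """Block when any enforced category scores at or above `threshold`.

    `enforced=None` means every category in `scores` is enforced.
    """
    return any(
        score >= threshold
        for name, score in scores.items()
        if enforced is None or name in enforced
    )


scores = {"sexual": 0.82, "sexual/minors": 0.01, "self-harm": 0.02}  # made-up values

# General consumer app: every category is enforced at 0.7.
print(should_block(scores, 0.7))  # True
# Adult platform: only the two always-blocked categories are enforced.
print(should_block(scores, 0.7, enforced={"sexual/minors", "self-harm"}))  # False
```

Keeping the enforced set as data rather than branching logic means a stricter deployment only needs a different threshold and set, not different code.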
How does Hypereal price moderation calls?
Moderation is billed in credits like every other call (100 credits = $1 USD). New accounts receive free trial credits, enough to test the full moderation + generation loop before spending anything. See hypereal.cloud for the current rate card.
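For budgeting, the credit conversion is simple arithmetic. The sketch below only converts credits to dollars using the 100 credits = $1 rate stated above; the per-call credit cost is not given here, so it is left out. `credits_to_usd` is a hypothetical helper name.

```python
from decimal import Decimal

CREDITS_PER_USD = Decimal(100)  # stated rate: 100 credits = $1 USD


def credits_to_usd(credits: int) -> Decimal:
    """Convert a credit balance or spend into US dollars."""
    return Decimal(credits) / CREDITS_PER_USD


print(credits_to_usd(250))  # 2.5
```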