10 Free OpenRouter LLM Models You Can Use Right Now (2026)

OpenRouter aggregates 200+ LLMs behind one OpenAI-compatible API. Most cost money, but a steady roster of frontier-ish open-weight models is exposed at $0/token because providers (DeepSeek, Meta, Alibaba, Z.ai, NousResearch) subsidize them for promotional or research reasons.

This list is the 10 free models on OpenRouter as of May 2026 that are actually worth using — not the 100+ that exist but are slow, broken, or quota-zero. For each: strengths, where it breaks, and the model ID.

The free tier on OpenRouter is rate-limited (around 20 requests/minute, 200 requests/day per account at the time of writing). For heavier use, the section at the end shows how to swap to a paid OpenAI-compatible aggregator by changing only the base URL, API key, and model ID.
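To stay under those limits, it can help to throttle on the client side rather than wait for the API to reject requests. The following is a minimal sketch, assuming the approximate figures quoted above (20 per minute, 200 per day); the class name `FreeTierLimiter` and its methods are hypothetical and not part of any SDK.

```python
import time
from collections import deque


class FreeTierLimiter:
    """Client-side guard for a per-minute and a per-day request budget.

    The defaults mirror the approximate free-tier limits quoted in this
    article; they are assumptions, not values read from the API.
    """

    def __init__(self, per_minute=20, per_day=200, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.sent = deque()  # timestamps of requests sent in the last 24 hours

    def wait_time(self):
        """Return the seconds to wait before the next request fits both budgets."""
        now = self.clock()
        # Forget requests that are more than one day old.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        # Daily budget: wait until the oldest counted request ages out.
        if len(self.sent) >= self.per_day:
            return self.sent[-self.per_day] + 86400 - now
        # Minute budget: look only at requests from the last 60 seconds.
        recent = [t for t in self.sent if now - t < 60]
        if len(recent) >= self.per_minute:
            return recent[-self.per_minute] + 60 - now
        return 0.0

    def record(self):
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

In use, you would call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` right after sending it.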

1. `meta-llama/llama-4-maverick:free`

Meta's largest released Llama 4 variant: roughly 400B total parameters (17B active per token), MoE-routed. Best general-purpose free model. Good at code, multilingual reasoning, instruction following.

Best for: drop-in replacement for GPT-4-class quality on cost-sensitive workloads.
Breaks on: very long contexts (>128K tokens), heavy tool use.

2. `deepseek/deepseek-r2:free`

DeepSeek's reasoning model (released March 2026). Beats GPT-5-mini on math, competitive with Claude Sonnet 4.6 on code. Reasoning chains visible in the response.

Best for: math, code, multi-step reasoning where you want to see the thought trace.
Breaks on: short, conversational replies (over-thinks). Latency is high — multi-second TTFT.

3. `deepseek/deepseek-v3.2:free`

DeepSeek's non-reasoning generalist. Faster than R2, smaller context. Excellent value for chat and structured output.

Best for: high-volume chat, JSON output, function calling.
Breaks on: complex reasoning — escalate to R2.

4. `qwen/qwen-3-235b:free`

Alibaba's Qwen 3, 235B MoE. Strong multilingual (especially Chinese, Korean, Japanese). Surprisingly good at code.

Best for: anything non-English, multilingual fine-tuning data, Chinese tech use cases.
Breaks on: occasional Chinese-character bleed in English output. Re-roll.
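The re-roll can be automated. Here is a minimal sketch, assuming that "bleed" means any CJK ideograph in a reply that should be English only; `has_cjk_bleed`, `reroll_until_clean`, and the `generate` callable are hypothetical names, and `generate` stands in for whatever function sends your prompt and returns the reply text.

```python
import re

# CJK Unified Ideographs (basic block plus Extension A). This is an assumed
# definition of "bleed"; widen the ranges if you also want kana or hangul.
CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def has_cjk_bleed(text):
    """Return True if the text contains at least one CJK ideograph."""
    return CJK_PATTERN.search(text) is not None


def reroll_until_clean(generate, max_attempts=3):
    """Call generate() until a reply has no CJK characters.

    Returns the first clean reply, or the last reply if every attempt bled.
    """
    reply = ""
    for _ in range(max_attempts):
        reply = generate()
        if not has_cjk_bleed(reply):
            break
    return reply
```

Each re-roll counts against the free-tier request budget, so a small `max_attempts` is the safer default.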

5. `qwen/qwen-3-coder:free`

Code-specialized Qwen 3 fork. Punches above its weight on code completion and refactor. Good with tool use.

Best for: agentic coding loops on a budget.
Breaks on: prose, creative writing.

6. `z-ai/glm-4.7:free`

Zhipu's GLM-4.7. The cheapest viable Claude-Sonnet-class model in 2026. Surprisingly tight prompt adherence.

Best for: structured output, agent workflows where you want Claude-style behavior cheap.
Breaks on: very long English-language creative tasks.

7. `google/gemma-3-27b:free`

Google's open Gemma 3, 27B. Punches well above its parameter count — Google's distillation pipeline is genuinely state of the art.

Best for: edge deployment alternative, fast inference, RAG QA.
Breaks on: complex reasoning, code longer than ~200 lines.

8. `nousresearch/hermes-4-405b:free`

NousResearch's fine-tune of Llama 3.1 405B. The go-to fine-tune for character writing, roleplay, and creative tasks where the stock Llama instruct models are too dry.

Best for: creative writing, character voice, roleplay, narrative generation.
Breaks on: code, math, structured output.

9. `microsoft/phi-4-mini:free`

Phi-4-mini, 3.8B. Microsoft's small-model line. Best free model in its size class for reasoning.

Best for: high-throughput, low-latency reasoning. Great for cheap embeddings-of-thought workflows.
Breaks on: long-context recall, anything requiring world knowledge.

10. `mistralai/mistral-large-3:free`

Mistral's Large 3 (free promotional tier on OpenRouter). Strong European-language performance, tight code completions.

Best for: European languages, function calling, coding.
Breaks on: free tier has the strictest rate limits — get throttled fast.

How to call them

OpenRouter exposes an OpenAI-compatible endpoint, so the standard OpenAI SDK works: point `base_url` at OpenRouter and pass the provider-prefixed model ID:

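from openai import OpenAI

# Point the standard OpenAI client at OpenRouter's endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)

# Model IDs are provider-prefixed, with the ":free" suffix as listed above.
response = client.chat.completions.create(
    model="deepseek/deepseek-r2:free",
    messages=[{"role": "user", "content": "Explain MoE routing in one paragraph."}],
)

# The reply text is on the first choice.
print(response.choices[0].message.content)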
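Because free-tier calls get throttled, one option is to wrap the call in a small retry-and-fallback helper. This is a sketch under stated assumptions, not OpenRouter's own mechanism: `call_with_fallback` is a hypothetical helper, `send` is any function that takes a model ID and performs the request (for example a wrapper around `client.chat.completions.create`), and `retry_on` is whichever exception your client raises when throttled (with the OpenAI Python SDK that would typically be `openai.RateLimitError`).

```python
import time


def call_with_fallback(send, models, retry_on, attempts_per_model=2,
                       base_delay=1.0, sleep=time.sleep):
    """Try each model in order, backing off exponentially when throttled.

    send:     callable taking a model ID and returning the response
    models:   model IDs in order of preference
    retry_on: exception class (or tuple of classes) that signals throttling
    """
    last_error = None
    for model in models:
        for attempt in range(attempts_per_model):
            try:
                return send(model)
            except retry_on as err:
                last_error = err
                # Wait 1x, 2x, 4x ... the base delay before the next try.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all models were throttled") from last_error
```

A call might look like `call_with_fallback(send, ["deepseek/deepseek-r2:free", "deepseek/deepseek-v3.2:free"], openai.RateLimitError)`; the order of models here is only an illustration.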

When the free tier isn't enough

OpenRouter's free tier caps at ~20 RPM and ~200 requests/day. Real production work blows past that in an hour. When that happens you have two choices:

Pay for OpenRouter — same models, no rate cap, retail prices.
Move to a different OpenAI-compatible aggregator — same API shape, often substantially cheaper.

Hypereal sits in the second bucket. The exact model IDs differ, but the API shape is identical and we host most of the same open-weight models alongside premium ones (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, NanoBanana 2, Seedance 2.0, GPT Image 2):

from openai import OpenAI

# Same client, different endpoint and key.
client = OpenAI(
    base_url="https://api.hypereal.cloud/v1",
    api_key="ck_...",
)

For most production workloads, moving from OpenRouter free → Hypereal works out cheaper than OpenRouter paid for the same throughput, with no daily cap.
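If you expect to switch providers, it may be worth keeping the endpoint and key out of the code entirely. The sketch below reads them from the environment; the two base URLs are the ones quoted in this article, while the variable names (`LLM_PROVIDER`, `OPENROUTER_API_KEY`, `HYPEREAL_API_KEY`) and the `client_settings` helper are hypothetical.

```python
import os

# Base URLs are the ones quoted in this article. The environment variable
# names are hypothetical; pick whatever fits your deployment.
PROVIDERS = {
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "key_var": "OPENROUTER_API_KEY",
    },
    "hypereal": {
        "base_url": "https://api.hypereal.cloud/v1",
        "key_var": "HYPEREAL_API_KEY",
    },
}


def client_settings(env=os.environ):
    """Return the base_url and api_key for the provider named in LLM_PROVIDER."""
    name = env.get("LLM_PROVIDER", "openrouter")
    provider = PROVIDERS[name]
    return {"base_url": provider["base_url"], "api_key": env[provider["key_var"]]}
```

With this in place the client becomes `OpenAI(**client_settings())`. Since model IDs differ between aggregators, the model ID would need to come from configuration in the same way.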

FAQ

Are OpenRouter free models really free? Yes — providers cover the cost. The trade is: rate limits, occasional queue waits, and your prompts may be retained for model improvement (check each model's privacy line on OpenRouter).

Why are reasoning models like DeepSeek R2 free? Promotional. Providers want adoption signal and training data. Expect the policy to shift over time.

Can I use these commercially? Each model has its own license: Llama 4 (Llama 4 Community License), Qwen 3 (Apache 2.0), GLM (commercial-ok), Gemma (Gemma Terms of Use). Check the model card.

Which one should I start with? Llama 4 Maverick for general work, DeepSeek R2 for hard reasoning, Hermes 4 for creative writing, Qwen 3 for multilingual.
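Those starting points can be written down as a small lookup so the choice lives in one place. This is only a sketch: the task labels and the `pick_model` helper are hypothetical, and the model IDs are the ones listed in this article.

```python
# Starter picks from this article, keyed by a hypothetical task label.
STARTER_MODELS = {
    "general": "meta-llama/llama-4-maverick:free",
    "reasoning": "deepseek/deepseek-r2:free",
    "creative": "nousresearch/hermes-4-405b:free",
    "multilingual": "qwen/qwen-3-235b:free",
}


def pick_model(task):
    """Return the starter model ID for a task, falling back to the generalist."""
    return STARTER_MODELS.get(task, STARTER_MODELS["general"])
```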

Get started

OpenRouter's free tier is the fastest way to try ten frontier-ish models for $0. When you outgrow it, Hypereal is the cheapest paid path with the broadest model catalog — including the premium models OpenRouter charges full price for.