The 429 error — “too many requests” — is the most common production issue with the Claude API. It’s also the one that surprises people most, because you can hit it with plenty of unused credit and a perfectly valid API key.
This post explains how Anthropic’s rate limits actually work, why you hit them, and the practical patterns that avoid them in production Make.com scenarios.
What a 429 Means
When you make an API call to Claude, Anthropic’s servers check three things:
- Requests per minute (RPM) — how many API calls you’ve made in the last 60 seconds
- Input tokens per minute (ITPM) — how much text you’ve sent in the last 60 seconds
- Output tokens per minute (OTPM) — how much text Claude has returned in the last 60 seconds
Exceed any of the three, and you get a 429 response:
{
"type": "error",
"error": {
"type": "rate_limit_error",
"message": "This request would exceed your organization's rate limit..."
}
}
The API call fails. Nothing was processed. You weren’t charged. But your scenario stopped.
Anthropic’s Tier System
Your rate limits depend on your tier — not your plan, not your credit balance, but your account’s usage history. Higher tiers = higher limits.
As of April 2026, the tiers for Claude Sonnet 4.6 (approximate):
| Tier | RPM | ITPM | OTPM | How to reach |
|---|---|---|---|---|
| Tier 1 | ~50 | ~40k | ~8k | New account with $5 credit |
| Tier 2 | ~1,000 | ~80k | ~16k | Spend $40+ and account is 7+ days old |
| Tier 3 | ~2,000 | ~160k | ~32k | Spend $200+ and account is 7+ days old |
| Tier 4 | ~4,000 | ~400k | ~80k | Spend $400+ and account is 14+ days old |
(Exact limits vary by model and change — check docs.anthropic.com/en/api/rate-limits for current numbers.)
Important: limits are per-model. Haiku has its own limits, Sonnet has its own, Opus has its own. Using multiple models doesn’t share the limit pool.
Tier progression is automatic. You don’t apply. You just use the API, spend money, and your limits rise.
Why You Hit Limits in Make.com Specifically
Make.com scenarios have a specific failure mode: batch processing.
Full Implementation Blueprint — $29
The Blueprint course walks through production-ready Make.com + Claude + Gemini + Perplexity scenarios end-to-end. Real templates, real error handling, real costs.
If your scenario triggers on 100 new emails at once (e.g. after you’ve been offline and they queued up), Make will process them sequentially. By default it might send all 100 to Claude in under a minute. That’s 100 RPM — Tier 1 cuts you off at 50.
Same problem with Iterator modules processing large arrays, or with scenarios that fire on webhooks with high throughput.
Ninety percent of the 429 errors I see are variants of this pattern.
Five Fixes (In Order of Effort)
1. Set Max Tokens on Every Module
This doesn’t prevent RPM errors, but it prevents OTPM errors. Without Max Tokens, Claude can generate 4,000+ output tokens when you needed 200. That’s 20x the output token cost and a much faster path to OTPM limits.
Every Claude module in Make.com has a Max Tokens field. Set it. I use:
- 256 for classification / extraction
- 512 for short replies
- 1024 for medium responses
- 2048+ only when I explicitly need long output
2. Use a Lighter Model for Bulk Work
Haiku has higher throughput limits than Sonnet or Opus on Tier 1 — more RPM at the same tier. If you’re processing a large batch, routing through Haiku first (for classification/filtering) before hitting Sonnet only on the cases that matter saves both rate limit and cost.
3. Add Delay Between Iterations
In Make.com, if you’re processing a list with an Iterator, insert a Sleep module after the Claude call inside the loop. A 2-second sleep caps you at 30 RPM — comfortably under Tier 1’s 50.
Find Sleep: add module → Tools → Sleep → 2 seconds.
It feels slow, but running reliably at 30 RPM beats failing at 80 RPM.
4. Add Error Handling with Retries
For transient 429s, automatic retry fixes them:
- Right-click the Claude module → Add error handler
- Pick Break
- Configure: Number of attempts: 3, Interval: 30 seconds
When Claude returns 429, Make waits 30 seconds, tries again. Usually works on the second attempt because the 1-minute window has rolled.
Exponential backoff is fancier but Break with fixed interval works fine for most cases.
5. Upgrade Your Anthropic Tier
The real fix for sustained high-volume workflows is just having higher limits. Tier progression happens automatically as you spend, but if you need it faster, Anthropic’s enterprise pricing page has a contact form for custom tier requests.
For most small-to-medium businesses, Tier 2 (reached after spending ~$40) is enough headroom that 429s become rare.
Prompt Caching — The Sneaky Solution
Anthropic offers prompt caching: if you’re using a long system prompt across multiple calls, you can cache it and subsequent calls use the cached version at 10% of the input cost.
This also helps with rate limits: cached input tokens don’t count toward your ITPM limit.
For a customer service bot with a 2,000-token system prompt handling 100 messages/minute:
- Without caching: 200,000 ITPM (way over Tier 1’s 40k limit)
- With caching: 0 cached + ~200 per user message = 20,000 ITPM (fine on Tier 1)
Enable it by adding "cache_control": {"type": "ephemeral"} to your system prompt in the API call. Make.com’s Claude module exposes this in advanced options as of the 2026 update.
Reading the 429 Response Header
When Claude returns a 429, it includes a retry-after header telling you exactly how many seconds to wait. Well-built retry logic reads this header and waits that long before retrying.
Make.com’s Break error handler doesn’t read headers automatically, but you can build custom retry logic with HTTP modules + Sleep + a Router if you’re hitting lots of 429s. Usually overkill — the fixed 30-second wait is fine.
Monitoring Rate Limit Usage
In the Anthropic Console → Usage tab, there’s a “Rate limits” section showing your current usage vs limits for each model. Check it weekly. If you’re running close to the ceiling regularly, that’s a signal to upgrade your tier or re-architect (route to lighter models, add caching, add batch delays).
What NOT to Do
Don’t spam retry immediately. Some people, seeing a 429, add a loop that retries 20 times with no delay. This makes it worse — you’re now flooding the API with the same requests you couldn’t afford in the first place. Anthropic may temporarily suspend your organisation for abusive patterns.
Don’t rotate API keys. Creating multiple keys under the same Anthropic organisation does not bypass rate limits. Limits are org-wide, not key-wide.
Don’t assume the error is your code. 429s are a signal to architect, not to debug. If you’re hitting them, it means your workflow is running hot — that’s a scaling problem, not a bug.
TL;DR
- Set Max Tokens on every Claude module
- Use Haiku for bulk work, Sonnet only when quality demands
- Add Sleep modules inside loops
- Add Break error handlers with 3 retries at 30 seconds
- Use prompt caching for long system prompts
- Let your tier progress automatically as you spend
Ninety percent of rate limit issues are solved by the first four.
Next Steps
If you want the full error-handling playbook — rate limits, API errors, Claude giving bad output, network timeouts, all of it — the Implementation Blueprint course ($29) covers production error handling in depth, including copy-paste Break configurations for common scenarios.
Last updated: 20 April 2026. Rate limits and tier thresholds change as Anthropic evolves their infrastructure — verify current numbers on Anthropic’s docs before architecting.
Full Implementation Blueprint — $29
The Blueprint course walks through production-ready Make.com + Claude + Gemini + Perplexity scenarios end-to-end. Real templates, real error handling, real costs.