Rate Limits
Rate limits and quota are not the same thing.
Rate limits
Rate limits protect the system by controlling request intensity over time.
As of 2026-06-04, the current implementation applies a default limit of 30 requests per minute per credential bucket. For API-key requests, the bucket is the API key. For OAuth Remote MCP requests, the bucket is derived from the OAuth audit identity, such as client and token metadata. When this protection is triggered, the public error code is RATE_LIMITED.
Remote MCP uses the same rate-limit mechanism, whether the deployment uses API-key mode, OAuth mode, or dual mode.
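To make the idea of a per-bucket limit concrete, here is a minimal sketch of a fixed-window counter keyed by credential bucket. The fixed-window algorithm, the class name FixedWindowLimiter, and the bucket strings are assumptions chosen for illustration; this page does not say which algorithm the service actually uses. Only the default of 30 requests per minute comes from the text above.

```python
import time


class FixedWindowLimiter:
    """Illustrative per-bucket limiter; not the service's actual algorithm."""

    def __init__(self, limit=30, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit                    # documented default: 30 per minute
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable so the sketch is testable
        self._windows = {}                    # bucket -> (window_start, count)

    def allow(self, bucket):
        """Return True if a request for this bucket fits in the current window."""
        now = self.clock()
        start, count = self._windows.get(bucket, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            self._windows[bucket] = (start, count)
            return False                      # caller would surface RATE_LIMITED
        self._windows[bucket] = (start, count + 1)
        return True
```

Each bucket is counted independently, so one API key exhausting its window does not affect requests made with a different key or OAuth identity.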
Quota
Quota defines how much an account can use during a billing or entitlement cycle. It may include both:
- Skill quota
- Underlying service quota
Important takeaway
The two checks are independent: you can have quota remaining and still hit request-frequency protection, or stay below the rate limit and still be blocked by quota.
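The independence of the two checks can be sketched as a small function. The function name check_request, its parameters, the order of the checks, and the label QUOTA_BLOCKED are hypothetical and used only for illustration; RATE_LIMITED is the only error code this page documents.

```python
def check_request(requests_this_minute, quota_remaining, rate_limit=30):
    """Return why a request would be blocked, or None if it would pass.

    Both inputs are assumed to be tracked elsewhere; this only shows that
    the two conditions are evaluated separately.
    """
    if requests_this_minute >= rate_limit:
        return "RATE_LIMITED"    # documented public error code
    if quota_remaining <= 0:
        return "QUOTA_BLOCKED"   # hypothetical label, not a documented code
    return None
```

For example, check_request(30, 1000) is blocked by the rate limit despite ample quota, while check_request(1, 0) is blocked by quota despite a low request rate.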
Practical recovery
- Wait about one minute before retrying with the same API key
- Reduce burst retries from your Agent or workflow
- Treat Cloudflare or proxy connectivity failures as network issues, not as rate-limit events
- For automation, check CLI exit codes to detect failures and avoid immediate tight retry loops
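The recovery steps above can be sketched as a small retry wrapper. This is a minimal illustration under stated assumptions: RateLimitedError is a hypothetical exception standing in for however your client reports the RATE_LIMITED code, Python's built-in ConnectionError stands in for Cloudflare or proxy connectivity failures, and the wait times are illustrative defaults. A CLI-based workflow would branch on exit codes instead of exceptions.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical: raised when a response carries the RATE_LIMITED code."""


def call_with_recovery(call, max_attempts=3, rate_limit_wait=60.0,
                       network_wait=2.0, sleep=time.sleep):
    """Run call(), retrying a bounded number of times with deliberate pauses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitedError:
            if attempt == max_attempts:
                raise
            # Rate-limit event: wait about one minute before reusing the key.
            sleep(rate_limit_wait)
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Network issue, not a rate-limit event: shorter, growing pause.
            sleep(network_wait * attempt)
```

Keeping the two failure types on separate branches avoids spending a full minute on a transient proxy error, and the attempt cap prevents a tight retry loop from adding more burst traffic to the same bucket.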




