Rate Limits

Rate limits and quota are not the same thing.

Rate limits

Rate limits protect the system by controlling request intensity over time.

As of 2026-06-04, the current implementation applies a default limit of 30 requests per minute per credential bucket. For API-key requests, the bucket is the API key. For OAuth Remote MCP requests, the bucket is derived from the OAuth audit identity, such as client and token metadata. When a bucket exceeds its limit, the request fails with the public error code RATE_LIMITED.
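To stay under the default limit, a client can pace its own requests per credential. The sketch below is a client-side helper only, not the server's implementation: this page does not say whether the server uses a fixed window, a sliding window, or a token bucket, so the sliding window here is an assumption. The class name SlidingWindowLimiter and its methods are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side pacing: allow at most `limit` requests per `window` seconds.

    Hypothetical helper. The defaults mirror the documented default of
    30 requests per minute; the sliding-window algorithm is an assumption.
    """

    def __init__(self, limit=30, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Return seconds to wait before the next request (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # The oldest request must leave the window before another is allowed.
        return self.window - (now - self.sent[0])

    def record(self):
        """Record that a request was just sent."""
        self.sent.append(self.clock())
```

One limiter instance per credential (for example, one per API key) matches the per-bucket behavior described above: call wait_time() before each request, sleep for the returned duration, then call record() after sending.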

Remote MCP uses the same rate-limit mechanism, whether the deployment uses API-key mode, OAuth mode, or dual mode.

Quota

Quota defines how much an account can use during a billing or entitlement cycle. It may include both:

Skill quota
Underlying service quota

Important takeaway

You can have quota remaining and still be rate limited, or stay under the rate limit and still be blocked by quota.

Practical recovery

Wait about one minute before retrying with the same API key
Reduce burst retries from your Agent or workflow
Treat Cloudflare or proxy connectivity failures as network issues, not as rate-limit events
For automation, branch on CLI exit codes and avoid tight, immediate retry loops
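The recovery steps above can be sketched as a retry helper that backs off on RATE_LIMITED and leaves other failures alone. This is a minimal illustration under stated assumptions: the send callable, the dict-shaped result, and its "error_code" key are hypothetical and not part of any documented client. Only the RATE_LIMITED code itself comes from this page.

```python
import time

RATE_LIMITED = "RATE_LIMITED"  # public error code named on this page


def call_with_backoff(send, max_attempts=4, base_delay=15.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call `send()` until it stops reporting RATE_LIMITED or attempts run out.

    `send` is a hypothetical callable returning a dict; the "error_code"
    key is an assumed response shape. Exceptions raised by `send`, such as
    proxy or connectivity failures, propagate unchanged, so they are never
    treated as rate-limit events.
    """
    result = None
    for attempt in range(max_attempts):
        result = send()
        if result.get("error_code") != RATE_LIMITED:
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff capped at one minute, the length of the
            # documented rate-limit window.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    return result
```

With the defaults, the waits between attempts are 15, 30, and 60 seconds, so the last retry happens after a full rate-limit window has passed. Adding random jitter to each delay would help when several workers share one credential.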