Rate limits

Per-token sliding-window budgets, headers Niyra returns, and how to back off cleanly.

Niyra rate-limits per token using a sliding-window counter. Each token (OAuth access token or PAT) has its own budget — multiple tokens for the same user don't share a window.

Default limits

Endpoint family                    Limit          Window
niyra_ask                          60 requests    1 minute
niyra_execute                      20 requests    1 minute
niyra_memories / niyra_remember    120 requests   1 minute
niyra_get_task polling             600 requests   1 minute

Alpha-plan users get 5× these limits. Pro users get 3×. Standard users get 1×.
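
As a quick worked example of the multipliers, the sketch below computes the effective per-minute limit for a plan. The family keys (`ask`, `execute`, `memory`, `get_task`) and the lowercase plan names are illustrative labels chosen here, not identifiers from the Niyra API.

```python
# Base per-minute limits from the table above.
# Keys are illustrative labels for each endpoint family.
BASE_LIMITS = {
    "ask": 60,        # niyra_ask
    "execute": 20,    # niyra_execute
    "memory": 120,    # niyra_memories / niyra_remember
    "get_task": 600,  # niyra_get_task polling
}

# Plan multipliers as stated above; the plan keys are hypothetical names.
PLAN_MULTIPLIER = {"standard": 1, "pro": 3, "alpha": 5}


def effective_limit(family: str, plan: str) -> int:
    """Return requests per minute for an endpoint family on a given plan."""
    return BASE_LIMITS[family] * PLAN_MULTIPLIER[plan]
```

For instance, `effective_limit("ask", "pro")` gives 180 requests per minute, and `effective_limit("execute", "alpha")` gives 100.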

Response headers

On every response, Niyra returns:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1717100123

X-RateLimit-Reset is a Unix timestamp in seconds: the time at which the current window rolls over.
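
One way a client might use these headers is to check the remaining budget and work out how long is left until the reset. This is a minimal sketch: the helper names `seconds_until_reset` and `should_pause` and the `floor` threshold are assumptions made for illustration. A plain `dict` is case-sensitive, whereas the headers object in `requests` is not, so the lookups below assume the exact casing shown above.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Seconds until X-RateLimit-Reset, clamped so it is never negative."""
    now = time.time() if now is None else now
    reset = int(headers["X-RateLimit-Reset"])
    return max(0.0, reset - now)


def should_pause(headers: Mapping[str, str], floor: int = 1) -> bool:
    """True when the remaining budget is at or below `floor` requests."""
    return int(headers["X-RateLimit-Remaining"]) <= floor
```

A caller could sleep for `seconds_until_reset(r.headers)` whenever `should_pause(r.headers)` is true, rather than waiting to receive a 429.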

On a 429:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "error_description": "you've hit the per-token rate limit"
}

Retry-After is in seconds. Honor it: Niyra treats repeated immediate retries as an abuse signal.

Backoff pattern

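import random
import time

import requests

def call_with_backoff(url, headers, json, max_tries=5):
    """POST to url, retrying on 429 and honoring Retry-After plus jitter."""
    for attempt in range(max_tries):
        # The 30 s timeout is an arbitrary safeguard against hanging forever.
        r = requests.post(url, headers=headers, json=json, timeout=30)
        if r.status_code != 429:
            return r
        if attempt == max_tries - 1:
            break  # don't sleep after the final attempt
        try:
            wait = int(r.headers.get("Retry-After", "5"))
        except ValueError:
            wait = 5  # fall back if the header isn't an integer
        # Add jitter so a fleet of workers doesn't synchronize.
        time.sleep(wait + random.uniform(0, 1))
    raise RuntimeError("rate limit retries exhausted")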

Polling etiquette

For niyra_get_task:

  • Minimum interval: 3 seconds. Polling faster risks a 429 and does not make the result arrive any sooner.
  • Backoff: if the task has been running for 60+ seconds, drop to 10s polls. Most long-running tasks take 1–5 minutes.
  • Cap: poll for at most 10 minutes. Beyond that, surface the task ID to the user so they can check the dashboard.
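
The three rules above can be sketched as a small polling loop. This is an illustration under stated assumptions, not an official client: `fetch_status` is a hypothetical callable that wraps a niyra_get_task call and returns `None` while the task is still running. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional


def poll_interval(elapsed: float) -> float:
    """3 s polls for the first 60 s of a task, 10 s polls after that."""
    return 3.0 if elapsed < 60 else 10.0


def poll_task(
    fetch_status: Callable[[], Optional[dict]],
    max_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll until fetch_status returns a result or the 10-minute cap is hit.

    fetch_status is a hypothetical wrapper around niyra_get_task that
    returns None while the task is still running.
    """
    start = clock()
    while True:
        result = fetch_status()
        if result is not None:
            return result
        elapsed = clock() - start
        if elapsed >= max_seconds:
            return None  # caller should surface the task ID to the user
        sleep(poll_interval(elapsed))
```

A `None` return means the cap was reached, which is the point to stop polling and show the task ID instead.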

Burst behavior

The sliding window is not a token bucket — there's no burst credit. Sending 60 requests in the first second of a minute will exhaust your budget for the rest of the window. Spread requests across the window.
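
To make the no-burst-credit behavior concrete, here is a client-side model of a sliding window that tracks request timestamps and reports how long to wait before the next request would fit. It is a simple sliding-window log written for illustration. Niyra's server-side counter may be implemented differently, and the class name `SlidingWindowBudget` is hypothetical.

```python
from collections import deque


class SlidingWindowBudget:
    """Client-side model of a sliding-window limit with no burst credit."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.sent = deque()  # timestamps of requests still inside the window

    def _expire(self, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request would fit in the window."""
        self._expire(now)
        if len(self.sent) < self.limit:
            return 0.0
        # The oldest request must leave the window before another fits.
        return self.sent[0] + self.window - now

    def record(self, now: float) -> None:
        """Note that a request was sent at time `now`."""
        self.sent.append(now)
```

With a limit of 60, recording 60 requests at time 0 makes `wait_time(1.0)` return 59.0, the cost of the burst described above. Spacing requests roughly `window / limit` seconds apart (one per second here) keeps `wait_time` at zero.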

FAQ

Are limits per-user or per-token?
Per-token. Each access token (OAuth) and each PAT has its own sliding window. A user can hold multiple tokens and spread load across them.
Are limits shared with the dashboard chat?
No. The dashboard chat path has its own per-user budget. API traffic competes only with other API traffic on the same token.