Rate limits

Per-token sliding-window budgets, headers Niyra returns, and how to back off cleanly.

Niyra rate-limits per token using a sliding-window counter. Each token (OAuth access token or PAT) has its own budget. Multiple tokens for the same user don't share a window.

Default limits

Endpoint family	Limit	Window
`niyra_ask`	60 requests	1 minute
`niyra_execute`	20 requests	1 minute
`niyra_memories` / `niyra_remember`	120 requests	1 minute
`niyra_get_task` polling	600 requests	1 minute

Alpha-plan users get 5× these limits. Pro users get 3×. Standard users get 1×.
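As a quick illustration, the effective per-minute limit is the default limit multiplied by the plan multiplier. The sketch below hard-codes the values from the table above; the dictionary names, the lowercase plan keys, and the `effective_limit` helper are illustrative assumptions, not part of any Niyra SDK.

```python
# Default per-minute limits from the table above, keyed by endpoint family.
# niyra_memories and niyra_remember are listed together in the table; whether
# they share a single counter is not stated, so each is listed separately here.
DEFAULT_LIMITS = {
    "niyra_ask": 60,
    "niyra_execute": 20,
    "niyra_memories": 120,
    "niyra_remember": 120,
    "niyra_get_task": 600,
}

# Plan multipliers as stated above. The lowercase keys are an assumption.
PLAN_MULTIPLIERS = {"standard": 1, "pro": 3, "alpha": 5}


def effective_limit(endpoint: str, plan: str) -> int:
    """Return requests per minute for an endpoint family on a given plan."""
    return DEFAULT_LIMITS[endpoint] * PLAN_MULTIPLIERS[plan]
```

For example, `effective_limit("niyra_execute", "pro")` works out to 60 requests per minute.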

Response headers

On every response, Niyra returns:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1717100123

X-RateLimit-Reset is a Unix timestamp (in seconds) marking when the current window rolls over.
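A client can read these headers to slow down before it ever receives a 429. The following is a minimal sketch, assuming the headers arrive as a plain string mapping with the exact casing shown above; the helper names `seconds_until_reset` and `should_pause` are hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Seconds until the window in X-RateLimit-Reset rolls over (never negative)."""
    now = time.time() if now is None else now
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0.0, float(reset_at - now))


def should_pause(headers: Mapping[str, str]) -> bool:
    """True when the response reports no remaining requests in this window."""
    return int(headers["X-RateLimit-Remaining"]) <= 0
```

When `should_pause` returns true, sleeping for `seconds_until_reset(...)` before the next call avoids spending a request on a guaranteed 429.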

On a 429:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "error_description": "you've hit the per-token rate limit"
}

Retry-After is in seconds. Honor it. Niyra tracks repeated immediate retries as an abuse signal.

Backoff pattern

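import random
import time

import requests

def call_with_backoff(url, headers, json, max_tries=5):
    """POST to url, retrying on 429 and honoring Retry-After plus jitter."""
    for attempt in range(max_tries):
        # The 30-second timeout is an illustrative choice; tune it for your calls.
        r = requests.post(url, headers=headers, json=json, timeout=30)
        if r.status_code != 429:
            return r
        if attempt == max_tries - 1:
            break  # out of retries; don't sleep before raising
        # Retry-After is in seconds; fall back to 5 if the header is missing.
        wait = int(r.headers.get("Retry-After", "5"))
        # Add jitter so a fleet of workers doesn't synchronize.
        time.sleep(wait + random.uniform(0, 1))
    raise RuntimeError("rate limit retries exhausted")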

Polling etiquette

For niyra_get_task:

Minimum interval: 3 seconds. Polling faster risks 429s without returning the result any sooner.
Backoff: if the task has been running for 60+ seconds, drop to 10s polls. Most long-running tasks take 1–5 minutes.
Cap: poll for at most 10 minutes. Beyond that, surface the task ID to the user so they can check the dashboard.
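The three rules above can be sketched as a small polling loop. This is one possible shape, not a prescribed implementation: `fetch_status` is a hypothetical callable wrapping niyra_get_task that returns None while the task is still running, and the `sleep` and `clock` parameters exist only so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional

MIN_INTERVAL_S = 3      # minimum polling interval
SLOW_INTERVAL_S = 10    # interval once the task has run for 60+ seconds
SLOW_AFTER_S = 60
GIVE_UP_AFTER_S = 600   # 10-minute cap


def poll_interval(elapsed_s: float) -> Optional[int]:
    """Return the next sleep in seconds, or None once the cap is reached."""
    if elapsed_s >= GIVE_UP_AFTER_S:
        return None
    return SLOW_INTERVAL_S if elapsed_s >= SLOW_AFTER_S else MIN_INTERVAL_S


def poll_task(fetch_status: Callable[[], Optional[dict]],
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> Optional[dict]:
    """Poll until fetch_status returns a result or the 10-minute cap is hit."""
    start = clock()
    while True:
        result = fetch_status()
        if result is not None:
            return result
        wait = poll_interval(clock() - start)
        if wait is None:
            return None  # caller should surface the task ID to the user
        sleep(wait)
```

A None return means the cap was reached, which is the point at which to hand the task ID to the user.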

Burst behavior

The sliding window is not a token bucket, so there's no burst credit. Sending 60 requests in the first second of a minute (the full default niyra_ask budget) will exhaust your budget for the rest of the window. Spread requests across the window.
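One simple way to spread requests is to space them at least window / limit seconds apart on the client. The `Pacer` class below is a hypothetical helper sketched under that assumption; it is not thread-safe, and you would keep one instance per token and endpoint family.

```python
import time
from typing import Callable


class Pacer:
    """Space calls at least window_s / limit seconds apart."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = window_s / limit
        self._clock = clock
        self._sleep = sleep
        self._next_at = clock()  # the first call may go out immediately

    def wait(self) -> None:
        """Block until the next slot, then reserve the one after it."""
        now = self._clock()
        if now < self._next_at:
            self._sleep(self._next_at - now)
            now = max(self._clock(), self._next_at)
        self._next_at = now + self.interval
```

With `Pacer(limit=60)`, calling `wait()` before each request yields at most one request per second, which stays inside a 60-per-minute budget without front-loading the window.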

FAQ

Are limits per-user or per-token?

Per-token. Each access token (OAuth) and each PAT has its own sliding window. A user can hold multiple tokens and spread load across them.

Are limits shared with the dashboard chat?

No. The dashboard chat path has its own per-user budget. API traffic competes only with other API traffic on the same token.
