Rate limits
Niyra rate-limits per token using a sliding-window counter. Each token (OAuth access token or PAT) has its own budget — multiple tokens for the same user don't share a window.
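To make the per-token behavior concrete, here is a minimal client-side model of a sliding window. It is a sketch only: the class name `SlidingWindowLimiter` is hypothetical, and it keeps a log of timestamps for simplicity rather than the counter variant, so it should not be read as Niyra's server-side implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Toy model: allow at most `limit` requests per token in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        # One timestamp queue per token, so tokens never share a budget.
        self._hits = defaultdict(deque)

    def allow(self, token: str, now: float) -> bool:
        hits = self._hits[token]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2)
print(limiter.allow("token-a", now=0.0))  # True
print(limiter.allow("token-a", now=1.0))  # True
print(limiter.allow("token-a", now=2.0))  # False: token-a is out of budget
print(limiter.allow("token-b", now=2.0))  # True: token-b has its own window
```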
Default limits
| Endpoint family | Limit | Window |
|---|---|---|
| niyra_ask | 60 requests | 1 minute |
| niyra_execute | 20 requests | 1 minute |
| niyra_memories / niyra_remember | 120 requests | 1 minute |
| niyra_get_task polling | 600 requests | 1 minute |
Alpha-plan users get 5× these limits. Pro users get 3×. Standard users get 1×.
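The effective limit for a token is the base limit times the plan multiplier. A small sketch of that arithmetic follows; the dictionary keys (`"standard"`, `"pro"`, `"alpha"`) and the function name `effective_limit` are illustrative labels, not identifiers from the Niyra API.

```python
# Base per-minute limits, copied from the table above.
BASE_LIMITS = {
    "niyra_ask": 60,
    "niyra_execute": 20,
    "niyra_memories / niyra_remember": 120,
    "niyra_get_task polling": 600,
}

# Plan multipliers as stated in this section (key names are hypothetical).
PLAN_MULTIPLIER = {"standard": 1, "pro": 3, "alpha": 5}


def effective_limit(family: str, plan: str) -> int:
    """Requests allowed per minute for one token on the given plan."""
    return BASE_LIMITS[family] * PLAN_MULTIPLIER[plan]


print(effective_limit("niyra_execute", "pro"))  # 60
print(effective_limit("niyra_ask", "alpha"))    # 300
```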
Response headers
On every response, Niyra returns:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1717100123
X-RateLimit-Reset is a Unix timestamp (in seconds) marking when the current window rolls over.
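A client can read these headers to decide whether to slow down before it ever sees a 429. The helper below is a sketch; `rate_limit_status` is a hypothetical name, and it assumes the three headers are always present with integer values as in the example above.

```python
import time


def rate_limit_status(headers, now=None):
    """Summarize the rate-limit headers of one response.

    `headers` is any mapping of header name to string value. A plain dict
    is case-sensitive, so the keys must match the spelling shown here.
    """
    now = time.time() if now is None else now
    reset_at = int(headers["X-RateLimit-Reset"])
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        # Clamp at zero in case the reset time has already passed.
        "seconds_until_reset": max(0.0, float(reset_at - now)),
    }


status = rate_limit_status(
    {
        "X-RateLimit-Limit": "60",
        "X-RateLimit-Remaining": "47",
        "X-RateLimit-Reset": "1717100123",
    },
    now=1717100100,
)
print(status["remaining"], status["seconds_until_reset"])  # 47 23.0
```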
On a 429:
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "error_description": "you've hit the per-token rate limit"
}
Retry-After is in seconds. Honor it: Niyra treats repeated immediate retries as an abuse signal.
Backoff pattern
import random
import time

import requests


def call_with_backoff(url, headers, json, max_tries=5):
    """POST to url, retrying on 429 up to max_tries times."""
    for attempt in range(max_tries):
        r = requests.post(url, headers=headers, json=json)
        if r.status_code != 429:
            return r
        # Retry-After is in seconds; fall back to 5s if the header is missing.
        wait = int(r.headers.get("Retry-After", "5"))
        # Add jitter so a fleet of workers doesn't synchronize.
        time.sleep(wait + random.uniform(0, 1))
    raise RuntimeError("rate limit retries exhausted")
Polling etiquette
For niyra_get_task:
- Minimum interval: 3 seconds. Polling faster will get you rate-limited (429) before it gets you the result any sooner.
- Backoff: if the task has been running for 60+ seconds, drop to 10s polls. Most long-running tasks take 1–5 minutes.
- Cap: poll for at most 10 minutes. Beyond that, surface the task ID to the user so they can check the dashboard.
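The three rules above can be sketched as a single polling loop. This is an illustration under assumptions: `fetch_status` is a hypothetical callable you supply that calls niyra_get_task and returns `None` while the task is still running, and the `clock` and `sleep` parameters exist only so the loop can be tested without waiting.

```python
import time


def poll_interval(elapsed: float) -> float:
    """3s polls for the first minute, then 10s polls."""
    return 3.0 if elapsed < 60.0 else 10.0


def poll_task(fetch_status, task_id, max_wait=600.0,
              clock=time.monotonic, sleep=time.sleep):
    """Poll until fetch_status returns a result or max_wait seconds pass.

    Returns the result, or None if the cap was reached; in that case the
    caller should surface task_id to the user.
    """
    start = clock()
    while True:
        status = fetch_status(task_id)
        if status is not None:
            return status
        elapsed = clock() - start
        wait = poll_interval(elapsed)
        # Stop if the next poll would land beyond the 10-minute cap.
        if elapsed + wait > max_wait:
            return None
        sleep(wait)
```

A caller might wrap the `None` case in a message such as "still running, check the dashboard for task <id>" rather than raising an error, since the task itself has not failed.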
Burst behavior
The sliding window is not a token bucket — there's no burst credit. Sending 60 requests in the first second of a minute will exhaust your budget for the rest of the window. Spread requests across the window.
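One simple way to spread requests is to enforce a minimum gap of window / limit between calls, for example 60 s / 20 = 3 s for niyra_execute on the Standard plan. The `Pacer` class below is a hypothetical client-side helper, not part of any Niyra SDK, and it assumes a single worker per token.

```python
import time


def min_spacing(limit: int, window: float = 60.0) -> float:
    """Smallest gap between requests that never exceeds `limit` per `window`."""
    return window / limit


class Pacer:
    """Blocks just long enough to keep calls evenly spaced."""

    def __init__(self, limit, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.spacing = min_spacing(limit, window)
        self._clock = clock
        self._sleep = sleep
        self._next_at = None  # earliest time the next request may go out

    def wait(self):
        now = self._clock()
        if self._next_at is not None and now < self._next_at:
            self._sleep(self._next_at - now)
            # Assume the sleep returned on time.
            now = self._next_at
        self._next_at = now + self.spacing


print(min_spacing(60))  # 1.0
print(min_spacing(20))  # 3.0
```

Call `pacer.wait()` immediately before each request. Pacing reduces the chance of a 429 but does not replace handling one, since other processes using the same token draw from the same budget.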