🤖 AI & LLMs · OpenAI · production

OpenAI API 429 rate limit despite low request count — TPM limit reached, not RPM

Mar 14, 2026 · 22 days ago · 0 views

Confidence Score: 72%

Problem

The OpenAI API returns 429 errors even when the request count is well below the documented requests-per-minute (RPM) limit. The limit actually being hit is tokens-per-minute (TPM). A single GPT-4o request can consume 4,000+ tokens, so 50 concurrent requests easily exceed the TPM budget even though only 50 of the RPM allowance are used. Developers often blame the RPM limit, but the error message below names TPM as the real constraint.

Error Output

429 Too Many Requests: Rate limit reached for gpt-4o in organization org-xxx on tokens per min (TPM): Limit 30000, Used 28943, Requested 4096.
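The arithmetic behind the problem fits in a few lines. A minimal sketch: the 30,000 TPM limit mirrors the error above, while the 4,000-token average request size is an assumption, not a measurement.

```typescript
// Rough TPM budgeting: how many requests of a given average size
// fit into a tokens-per-minute limit before a 429 is likely.
function maxRequestsPerMinute(tpmLimit: number, avgTokensPerRequest: number): number {
  return Math.floor(tpmLimit / avgTokensPerRequest)
}

// With a 30k TPM limit and ~4k tokens per request, only 7 requests
// fit in a minute — far below a typical RPM cap.
console.log(maxRequestsPerMinute(30_000, 4_000))
```

This is why 50 concurrent requests can fail even when RPM usage looks trivial: the token budget, not the request budget, is exhausted first.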


1 Fix

Canonical Fix · Moderate Confidence
70% confidence · 73% success rate · 7 verifications · Last verified Mar 14, 2026

Implement exponential backoff and track token usage per minute

Low Risk

TPM (tokens per minute) limits are hit before RPM (requests per minute) limits for GPT-4o. Each request can use 2,000–8,000 tokens. Tracking token usage and backing off when approaching the TPM limit prevents 429s.

  1. Implement retry with exponential backoff

     Honor the Retry-After header from the 429 response, and fall back to exponential delays when the header is absent:

     ```typescript
     async function callWithRetry(fn: () => Promise<any>, maxRetries = 5) {
       for (let attempt = 0; attempt < maxRetries; attempt++) {
         try {
           return await fn()
         } catch (err: any) {
           // Only retry 429s; rethrow other errors and the final failure
           if (err?.status !== 429 || attempt === maxRetries - 1) throw err
           // Prefer the server's Retry-After; otherwise back off exponentially
           const retryAfter = parseInt(err.headers?.['retry-after'] ?? '', 10)
           const baseMs = Number.isNaN(retryAfter) ? 2 ** attempt * 1000 : retryAfter * 1000
           // Jitter prevents synchronized retries from concurrent callers
           const jitter = Math.random() * 1000
           await new Promise(r => setTimeout(r, baseMs + jitter))
         }
       }
     }
     ```
  2. Use a rate-limiter queue

     Queue requests to stay within the TPM budget:

     ```typescript
     import PQueue from 'p-queue'

     // 30k TPM limit, ~3k tokens/request → at most 10 requests per minute.
     // intervalCap enforces a per-interval cap; a plain `concurrency`
     // setting only limits in-flight requests, not requests per minute.
     const queue = new PQueue({ intervalCap: 10, interval: 60_000 })

     export const openAICall = <T>(fn: () => Promise<T>) => queue.add(fn)
     ```
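The fix headline also mentions tracking token usage per minute, which neither step does directly. One way to sketch it, assuming the 30k TPM limit from the error above and recording each response's reported `usage.total_tokens` (the class and its method names are illustrative, not from any library):

```typescript
// Sliding-window token tracker: sums tokens consumed in the last 60s
// so callers can check headroom before sending the next request.
class TokenBudget {
  private events: { at: number; tokens: number }[] = []

  constructor(private tpmLimit: number) {}

  // Call after each response, e.g. with usage.total_tokens
  record(tokens: number, now = Date.now()): void {
    this.events.push({ at: now, tokens })
  }

  usedInLastMinute(now = Date.now()): number {
    // Drop events older than the 60-second window, then sum the rest
    this.events = this.events.filter(e => now - e.at < 60_000)
    return this.events.reduce((sum, e) => sum + e.tokens, 0)
  }

  hasRoomFor(tokens: number, now = Date.now()): boolean {
    return this.usedInLastMinute(now) + tokens <= this.tpmLimit
  }
}
```

A caller would estimate the next request's token cost (prompt tokens plus `max_tokens`), and wait or queue when `hasRoomFor` returns false instead of letting the API reject the request.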

Validation

Run 100 concurrent requests. No unhandled 429 errors. All requests complete via retry.
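The validation above needs live API quota, but the retry logic itself can be checked offline with a stub that fails twice with a 429 before succeeding. A sketch: `makeFlaky` is a hypothetical test helper, and the retry function restates Step 1 so the check is self-contained.

```typescript
// Retry helper as in Step 1 (restated here to keep this check standalone).
async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn()
    } catch (err: any) {
      if (err?.status !== 429 || attempt === maxRetries - 1) throw err
      const retryAfter = parseInt(err.headers?.['retry-after'] ?? '1', 10)
      await new Promise(r => setTimeout(r, retryAfter * 1000 + Math.random() * 1000))
    }
  }
  throw new Error('retries exhausted')
}

// Stub that throws a 429 (with Retry-After: 0) twice, then succeeds.
function makeFlaky() {
  let calls = 0
  return {
    fn: async () => {
      calls++
      if (calls < 3) {
        throw Object.assign(new Error('rate limited'), {
          status: 429,
          headers: { 'retry-after': '0' },
        })
      }
      return 'ok'
    },
    attempts: () => calls,
  }
}

// Usage: const flaky = makeFlaky()
//        await callWithRetry(flaky.fn)  // resolves on the 3rd attempt
```

This exercises the 429 detection, Retry-After parsing, and give-up path without consuming any real rate limit.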

Verification Summary

Worked: 7
Partial: 1
Failed: 3
Last verified Mar 14, 2026
