🤖 AI & LLMs · OpenAI · production

OpenAI API 429 rate limit despite low request count — TPM limit reached, not RPM

Mar 14, 2026 · 22 days ago · 0 views

Confidence Score: 72%

Problem

The OpenAI API returns 429 errors even when the request count is well below the documented requests-per-minute (RPM) limit. The limit actually being hit is tokens-per-minute (TPM). A single GPT-4o request can consume 4,000+ tokens, so 50 concurrent requests easily exceed the TPM budget even though only 50 of the RPM allowance are used. Developers often blame the RPM limit, but the error message below names TPM as the real constraint.

Error Output

429 Too Many Requests: Rate limit reached for gpt-4o in organization org-xxx on tokens per min (TPM): Limit 30000, Used 28943, Requested 4096.
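The arithmetic behind the problem fits in a few lines. A minimal sketch: the 30,000 TPM limit mirrors the error above, while the 4,000-token average request size is an assumption, not a measurement.

```typescript
// Rough TPM budgeting: how many requests of a given average size
// fit into a tokens-per-minute limit before a 429 is likely.
function maxRequestsPerMinute(tpmLimit: number, avgTokensPerRequest: number): number {
  return Math.floor(tpmLimit / avgTokensPerRequest)
}

// With a 30k TPM limit and ~4k tokens per request, only 7 requests
// fit in a minute — far below a typical RPM cap.
console.log(maxRequestsPerMinute(30_000, 4_000))
```

This is why 50 concurrent requests can fail even when RPM usage looks trivial: the token budget, not the request budget, is exhausted first.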


1 Fix

Canonical Fix · Moderate Confidence
70% confidence · 73% success rate · 7 verifications · Last verified Mar 14, 2026

Implement exponential backoff and track token usage per minute

Low Risk

TPM (tokens per minute) limits are hit before RPM (requests per minute) limits for GPT-4o. Each request can use 2,000–8,000 tokens. Tracking token usage and backing off when approaching the TPM limit prevents 429s.

  1. Implement retry with exponential backoff

     Honor the Retry-After header from the 429 response, and fall back to exponential delays when the header is absent:

     ```typescript
     async function callWithRetry(fn: () => Promise<any>, maxRetries = 5) {
       for (let attempt = 0; attempt < maxRetries; attempt++) {
         try {
           return await fn()
         } catch (err: any) {
           // Only retry 429s; rethrow other errors and the final failure
           if (err?.status !== 429 || attempt === maxRetries - 1) throw err
           // Prefer the server's Retry-After; otherwise back off exponentially
           const retryAfter = parseInt(err.headers?.['retry-after'] ?? '', 10)
           const baseMs = Number.isNaN(retryAfter) ? 2 ** attempt * 1000 : retryAfter * 1000
           // Jitter prevents synchronized retries from concurrent callers
           const jitter = Math.random() * 1000
           await new Promise(r => setTimeout(r, baseMs + jitter))
         }
       }
     }
     ```
  2. Use a rate-limiter queue

     Queue requests to stay within the TPM budget:

     ```typescript
     import PQueue from 'p-queue'

     // 30k TPM limit, ~3k tokens/request → at most 10 requests per minute.
     // intervalCap enforces a per-interval cap; a plain `concurrency`
     // setting only limits in-flight requests, not requests per minute.
     const queue = new PQueue({ intervalCap: 10, interval: 60_000 })

     export const openAICall = <T>(fn: () => Promise<T>) => queue.add(fn)
     ```
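The fix headline also mentions tracking token usage per minute, which neither step does directly. One way to sketch it, assuming the 30k TPM limit from the error above and recording each response's reported `usage.total_tokens` (the class and its method names are illustrative, not from any library):

```typescript
// Sliding-window token tracker: sums tokens consumed in the last 60s
// so callers can check headroom before sending the next request.
class TokenBudget {
  private events: { at: number; tokens: number }[] = []

  constructor(private tpmLimit: number) {}

  // Call after each response, e.g. with usage.total_tokens
  record(tokens: number, now = Date.now()): void {
    this.events.push({ at: now, tokens })
  }

  usedInLastMinute(now = Date.now()): number {
    // Drop events older than the 60-second window, then sum the rest
    this.events = this.events.filter(e => now - e.at < 60_000)
    return this.events.reduce((sum, e) => sum + e.tokens, 0)
  }

  hasRoomFor(tokens: number, now = Date.now()): boolean {
    return this.usedInLastMinute(now) + tokens <= this.tpmLimit
  }
}
```

A caller would estimate the next request's token cost (prompt tokens plus `max_tokens`), and wait or queue when `hasRoomFor` returns false instead of letting the API reject the request.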

Validation

Run 100 concurrent requests. No unhandled 429 errors. All requests complete via retry.
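The validation above needs live API quota, but the retry logic itself can be checked offline with a stub that fails twice with a 429 before succeeding. A sketch: `makeFlaky` is a hypothetical test helper, and the retry function restates Step 1 so the check is self-contained.

```typescript
// Retry helper as in Step 1 (restated here to keep this check standalone).
async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn()
    } catch (err: any) {
      if (err?.status !== 429 || attempt === maxRetries - 1) throw err
      const retryAfter = parseInt(err.headers?.['retry-after'] ?? '1', 10)
      await new Promise(r => setTimeout(r, retryAfter * 1000 + Math.random() * 1000))
    }
  }
  throw new Error('retries exhausted')
}

// Stub that throws a 429 (with Retry-After: 0) twice, then succeeds.
function makeFlaky() {
  let calls = 0
  return {
    fn: async () => {
      calls++
      if (calls < 3) {
        throw Object.assign(new Error('rate limited'), {
          status: 429,
          headers: { 'retry-after': '0' },
        })
      }
      return 'ok'
    },
    attempts: () => calls,
  }
}

// Usage: const flaky = makeFlaky()
//        await callWithRetry(flaky.fn)  // resolves on the 3rd attempt
```

This exercises the 429 detection, Retry-After parsing, and give-up path without consuming any real rate limit.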

Verification Summary

Worked: 7
Partial: 1
Failed: 3
Last verified Mar 14, 2026
