FG
💻 Software🤖 AI & LLMsOpenAI

Pass word_level_timestamps option into whisper API call

Fresh3 days ago
Mar 14, 20260 views
Confidence Score49%
49%

Problem

Describe the feature or improvement you're requesting When I use the Whisper model directly via the openai/whisper python package (https://github.com/openai/whisper) I am able to pass the `word_timestamps` into the transcription call. See the below snippet: [code block] However, when I make an API call, I'm not able to do this. I find this a very useful feature, as by default, the timestamps in the response are rounded to the nearest second, whereas when passing in `word_timestamps=True` they're much more accurate. Could the `word_timestamps` parameter be added to the API? Additional context _No response_

Unverified for your environment

Select your OS to check compatibility.

1 Fix

Canonical Fix
Unverified Fix
New Fix – Awaiting Verification

Add word_timestamps Parameter to Whisper API Call

Medium Risk

The current Whisper API implementation does not expose the 'word_timestamps' parameter, which is available in the openai/whisper Python package. This results in less accurate timestamps in the transcription response, as they are rounded to the nearest second instead of providing precise word-level timestamps.

Awaiting Verification

Be the first to verify this fix

  1. 1

    Modify API Specification

    Update the API specification to include the 'word_timestamps' parameter in the request body. This will allow users to specify whether they want word-level timestamps in their transcription results.

    json
    ```json
    {
      "model": "whisper",
      "audio": "<audio_file>",
      "word_timestamps": true
    }
    ```
  2. 2

    Implement Backend Logic

    In the backend service that handles the Whisper API requests, add logic to check for the 'word_timestamps' parameter in the incoming request. If it is set to true, pass this parameter to the Whisper model during transcription.

    python
    ```python
    if request.json.get('word_timestamps'):
        response = whisper.transcribe(audio, word_timestamps=True)
    else:
        response = whisper.transcribe(audio)
    ```
  3. 3

    Update API Documentation

    Revise the API documentation to include the new 'word_timestamps' parameter, detailing its usage and the expected output format when it is enabled.

    markdown
    ```markdown
    ### word_timestamps
    - Type: boolean
    - Default: false
    - Description: If true, returns word-level timestamps in the transcription response.
    ```
  4. 4

    Test the Implementation

    Create unit tests to verify that the API correctly processes the 'word_timestamps' parameter and that the output includes accurate timestamps when enabled.

    python
    ```python
    def test_word_timestamps():
        response = api_call_with_word_timestamps()
        assert 'timestamps' in response
        assert len(response['timestamps']) > 0
    ```
  5. 5

    Deploy Changes

    Once the implementation is tested and verified, deploy the changes to the production environment to make the 'word_timestamps' feature available to users.

    bash
    ```bash
    # Deploy command
    ./deploy.sh
    ```

Validation

To confirm the fix worked, make an API call with the 'word_timestamps' parameter set to true and verify that the response includes detailed word-level timestamps instead of rounded timestamps.

Sign in to verify this fix

Environment

Submitted by

AC

Alex Chen

2450 rep

Tags

openaigptllmapiopenai-api