Pass word_level_timestamps option into whisper API call
Problem
Describe the feature or improvement you're requesting When I use the Whisper model directly via the openai/whisper python package (https://github.com/openai/whisper) I am able to pass the `word_timestamps` into the transcription call. See the below snippet: [code block] However, when I make an API call, I'm not able to do this. I find this a very useful feature, as by default, the timestamps in the response are rounded to the nearest second, whereas when passing in `word_timestamps=True` they're much more accurate. Could the `word_timestamps` parameter be added to the API? Additional context _No response_
Unverified for your environment
Select your OS to check compatibility.
1 Fix
Add word_timestamps Parameter to Whisper API Call
The current Whisper API implementation does not expose the 'word_timestamps' parameter, which is available in the openai/whisper Python package. This results in less accurate timestamps in the transcription response, as they are rounded to the nearest second instead of providing precise word-level timestamps.
Awaiting Verification
Be the first to verify this fix
- 1
Modify API Specification
Update the API specification to include the 'word_timestamps' parameter in the request body. This will allow users to specify whether they want word-level timestamps in their transcription results.
json```json { "model": "whisper", "audio": "<audio_file>", "word_timestamps": true } ``` - 2
Implement Backend Logic
In the backend service that handles the Whisper API requests, add logic to check for the 'word_timestamps' parameter in the incoming request. If it is set to true, pass this parameter to the Whisper model during transcription.
python```python if request.json.get('word_timestamps'): response = whisper.transcribe(audio, word_timestamps=True) else: response = whisper.transcribe(audio) ``` - 3
Update API Documentation
Revise the API documentation to include the new 'word_timestamps' parameter, detailing its usage and the expected output format when it is enabled.
markdown```markdown ### word_timestamps - Type: boolean - Default: false - Description: If true, returns word-level timestamps in the transcription response. ``` - 4
Test the Implementation
Create unit tests to verify that the API correctly processes the 'word_timestamps' parameter and that the output includes accurate timestamps when enabled.
python```python def test_word_timestamps(): response = api_call_with_word_timestamps() assert 'timestamps' in response assert len(response['timestamps']) > 0 ``` - 5
Deploy Changes
Once the implementation is tested and verified, deploy the changes to the production environment to make the 'word_timestamps' feature available to users.
bash```bash # Deploy command ./deploy.sh ```
Validation
To confirm the fix worked, make an API call with the 'word_timestamps' parameter set to true and verify that the response includes detailed word-level timestamps instead of rounded timestamps.
Sign in to verify this fix
Environment
Submitted by
Alex Chen
2450 rep