BZPOPMIN hangs forever when a short disconnection happens and the port is different from 6379
Problem
This one was pretty weird but quite serious. If you issue a BZPOPMIN command and there is a short disconnection while the command is blocking, it stays blocked forever, effectively hanging the app that uses the command.

One of the insane things about this issue is that, to reproduce it, you must use a port other than the standard 6379 and also use docker network disconnect; just stopping the Docker container running Redis is not enough.

Try this code:

bug.mjs: [code block]

docker-compose.yml: [code block]

Start the docker container with: [code block]

Run the test with: [code block]

Results in: [code block]

Now, in a different terminal, and depending on where you run Docker (I had my docker compose in the directory ioredis-econnreset): [code block]

The Redis network was disconnected and connected again after 3 seconds. Open a Redis CLI: [code block]

The terminal with the node app stays the same, with no error or anything. If you restart the app and, without disconnecting and reconnecting, try again to ZADD to the key, you get instead: [code block]

Let me know if you need more information.
1 Fix
Implement Timeout for BZPOPMIN Command to Prevent Hanging
The BZPOPMIN command in Redis is designed to block until an element is available or its timeout argument expires. If the connection is interrupted while the command is blocking, the reply may never arrive and the call never completes; in the report above this only reproduced on a non-default port. This is likely due to the way blocking commands interact with network interruptions, which can leave the pending command waiting indefinitely.
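One possible reading of the hang, offered as an assumption rather than something the report confirms: a network-level disconnect leaves the TCP connection open from the client's point of view, so the socket reports neither an error nor a close, and the pending command simply never gets a reply. The sketch below imitates that situation with a local TCP server that accepts a connection and never answers (a stand-in, not real Redis). `observeSilentPeer` and its parameter are hypothetical names, and only Node's built-in `net` module is used.

```javascript
const net = require('node:net');

// Hypothetical helper: connect to a peer that never answers and report
// which socket event fires first.
function observeSilentPeer(idleMs) {
  return new Promise((resolve, reject) => {
    let serverConn;
    let done = false;

    // Stand-in for an unreachable Redis: accepts the connection, ignores all data.
    const server = net.createServer((conn) => {
      serverConn = conn;
      conn.on('data', () => {});
      conn.on('error', () => {});
    });
    server.on('error', reject);

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const socket = net.connect(port, '127.0.0.1');

      const finish = (outcome) => {
        if (done) return;
        done = true;
        socket.destroy();
        if (serverConn) serverConn.destroy();
        server.close(() => resolve(outcome));
      };

      socket.once('connect', () => {
        // RESP encoding of: BZPOPMIN mykey 0
        socket.write('*3\r\n$8\r\nBZPOPMIN\r\n$5\r\nmykey\r\n$1\r\n0\r\n');
      });

      // Without this opt-in idle timer, none of the handlers below would ever run.
      socket.setTimeout(idleMs);
      socket.once('timeout', () => finish('idle-timeout'));
      socket.once('data', () => finish('reply'));
      socket.once('error', () => finish('error'));
      socket.once('close', () => finish('close'));
    });
  });
}

observeSilentPeer(200).then((outcome) => console.log('first event:', outcome));
```

In this simulation the only event that fires is the idle timeout the caller opted into, which is the motivation for adding a client-side deadline in the steps that follow.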
- 1
Modify the Redis Command Implementation
Wrap the BZPOPMIN command in a timeout mechanism to ensure it does not hang indefinitely. This can be done by using a promise with a timeout that rejects if the command does not complete within a specified duration.
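```javascript
// Race the blocking command against a local timer so a dead connection cannot hang the caller.
async function safeBZPOPMIN(redisClient, key, timeoutDuration) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('BZPOPMIN timed out')), timeoutDuration);
  });
  try {
    // BZPOPMIN requires a timeout argument; 0 asks the server to block indefinitely.
    return await Promise.race([redisClient.bzpopmin(key, 0), timeout]);
  } finally {
    clearTimeout(timer); // do not leave a stray timer once either promise settles
  }
}
```
- 2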
Update Application Logic to Use Safe Command
Replace instances of the BZPOPMIN command in your application with the newly created safeBZPOPMIN function. Ensure that the timeout duration is set appropriately based on your application's requirements.
```javascript
const result = await safeBZPOPMIN(redisClient, 'mySortedSet', 5000); // 5-second timeout
```
- 3
Test the Implementation
Run your application and simulate a network disconnection while the BZPOPMIN command is executing. Verify that the application does not hang and that the timeout error is handled gracefully.
```bash
docker network disconnect <network_name> <container_name>
```
- 4
Monitor Application Behavior
After implementing the timeout, monitor the application for any instances of the timeout error. Ensure that the application can recover from the error and continue functioning as expected.
```javascript
console.log('Handling timeout error gracefully');
```
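Recovering and continuing, as described in this step, could look roughly like the loop below. It is a minimal sketch under several assumptions: `consume`, `keepRunning` and `onTimeout` are hypothetical names, `pop` is any function that returns a promise for the next item (for example a call to the timeout wrapper from step 1), and a timeout is recognised by the text "timed out" in the error message, which matches the error used in this fix but is not a general convention.

```javascript
// Hypothetical worker loop: keeps consuming after a timeout instead of hanging or crashing.
async function consume(pop, handle, { keepRunning = () => true, onTimeout = async () => {} } = {}) {
  let timeouts = 0;
  while (keepRunning()) {
    try {
      const item = await pop();
      if (item !== null) await handle(item); // null means "nothing popped"
    } catch (err) {
      // Assumption: timeouts are identified by their message; anything else is rethrown.
      if (!(err instanceof Error) || !err.message.includes('timed out')) throw err;
      timeouts += 1;
      console.error(`pop timed out (${timeouts} so far), recovering`);
      await onTimeout(err); // hook for replacing or reconnecting the client
    }
  }
  return timeouts;
}

// Usage with a scripted fake source: two timeouts interleaved with two items.
const script = [new Error('BZPOPMIN timed out'), 'A', new Error('BZPOPMIN timed out'), 'B'];
let cursor = 0;
const handled = [];

const run = consume(
  async () => {
    const next = script[cursor++];
    if (next instanceof Error) throw next;
    return next;
  },
  async (item) => { handled.push(item); },
  { keepRunning: () => cursor < script.length }
);

run.then((timeouts) => console.log({ timeouts, handled }));
```

Note that racing against a timer does not cancel the command that is still pending on the old connection, so the `onTimeout` hook is a natural place to discard that connection before the next attempt.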
Validation
Confirm that the application no longer hangs when a disconnection occurs during the BZPOPMIN command. Check logs for timeout errors and ensure that the application can continue processing after handling the error.
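Part of this validation can be approximated without Docker. The sketch below repeats the wrapper from step 1 so that it runs on its own and exercises it against two hypothetical stub clients: `hungClient`, whose `bzpopmin` never settles (imitating a command stuck after a silent disconnect), and `healthyClient`, which replies immediately. The stubs, the `validate` function and the sample values are illustrative assumptions; this does not replace the real disconnect test.

```javascript
// Local copy of the step 1 wrapper so this sketch is self-contained.
async function safeBZPOPMIN(redisClient, key, timeoutDuration) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('BZPOPMIN timed out')), timeoutDuration);
  });
  try {
    return await Promise.race([redisClient.bzpopmin(key, 0), deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical stubs: one never replies, one replies at once with [key, member, score].
const hungClient = { bzpopmin: () => new Promise(() => {}) };
const healthyClient = { bzpopmin: async (key) => [key, 'job-1', '1'] };

async function validate() {
  const started = Date.now();
  let hungOutcome;
  try {
    await safeBZPOPMIN(hungClient, 'mySortedSet', 50);
    hungOutcome = 'resolved';
  } catch (err) {
    hungOutcome = err.message;
  }
  const elapsedMs = Date.now() - started;

  // The wrapper should be transparent when the command completes normally.
  const healthyResult = await safeBZPOPMIN(healthyClient, 'mySortedSet', 50);
  return { hungOutcome, elapsedMs, healthyResult };
}

validate().then((report) => console.log(report));
```

If the hung case reports the timeout error after roughly the configured duration and the healthy case returns its item unchanged, the wrapper behaves as intended at the unit level.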
Submitted by
Alex Chen
2450 rep