Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache
Problem
Error description
Hello, we are running ioredis 4.14.0 against an AWS ElastiCache replication group with cluster mode enabled (12 nodes, 3 shards), and we connect to the cluster through its configuration endpoint:

```javascript
this._redisClient = new Redis.Cluster([`//${host}:${port}`], {
  maxRedirections: process.env.MAX_REDIRECTIONS,
  redisOptions: {
    reconnectOnError: function (err) {
      const targetError = 'READONLY';
      if (err.message.slice(0, targetError.length) === targetError) {
        // Only reconnect when the error starts with "READONLY"
        return 2;
      }
      return false;
    },
  },
});
```

The error in our logs reads:

```
@message 2019-10-30T00:00:06.045Z 2970a6b4-2f41-468a-96f7-96649d690fbc [ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.
    at tryNode (/opt/nodejs/node_modules/ioredis/built/cluster/index.js:359:31)
    at /opt/nodejs/node_modules/ioredis/built/cluster/index.js:376:21
    at Timeout.duplicatedConnection.cluster.utils_2.timeout (/opt/nodejs/node_modules/ioredis/built/cluster/index.js:623:24)
    at Timeout.run (/opt/nodejs/node_modules/ioredis/built/utils/index.js:156:22)
    at Timeout.<anonymous> (/opt/nodejs/node_modules/async-listener/glue.js:188:31)
    at ontimeout (timers.js:486:15)
    at tryOnTimeout (timers.js:317:5)
    at Timer.listOnTimeout (timers.js:277:5)
```

Error scenario
We have an AWS Lambda function that connects to this cluster and is instantiated frequently. Sometimes the request is attende
Error Output
ClusterAllFailedError: Failed to refresh slots cache.
1 Fix
Implement Error Handling and Connection Retry Logic for Redis Cluster
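The error 'ClusterAllFailedError: Failed to refresh slots cache' is raised when the ioredis client fails to fetch the cluster's slot map from every node it knows about, so it has no way to route commands. This can happen because of network issues, node failures, or misconfiguration. The stack trace in the report passes through the timeout that wraps the slots query, which suggests the attempts timed out rather than being refused outright. Because the Lambda function is instantiated frequently, each new instance opens fresh connections and repeats the slots refresh, so cold starts under load can exceed the cluster's capacity or the client's timeout settings.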
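One common way to cut down the number of connection attempts from Lambda is to create the cluster client once per execution environment (at module scope) and reuse it across warm invocations, instead of constructing it inside the handler. The sketch below shows only the caching pattern, assuming a hypothetical `createClient` factory that is passed in so the code runs without ioredis or a live cluster; in a real function the factory would call `new Redis.Cluster(...)`.

```javascript
// Memoizes one client per Lambda execution environment. Module-scope state
// survives between warm invocations, so the factory runs only on a cold start.
// `createClient` is a hypothetical factory; in a real function it would
// return new Redis.Cluster([...], options).
function makeClientCache(createClient) {
  let client = null;
  return function getClient() {
    if (client === null) {
      client = createClient();
    }
    return client;
  };
}

// Stand-in factory so the sketch runs without ioredis or a cluster.
let created = 0;
const getClient = makeClientCache(() => ({ id: ++created }));

// Handler sketch: every invocation reuses the cached client.
async function handler(event) {
  const redis = getClient();
  return { clientId: redis.id, received: event };
}
```

A production version would also need to replace the cached client once it has given up for good (for example when its `status` is `end`); that check is left out of this sketch.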
- 1
Increase Connection Timeout
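Give the client more time to establish connections and fetch the cluster's slot map, especially under high load or during cold starts in AWS Lambda. In ioredis, `connectTimeout` is a per-node option, so on a cluster client it belongs inside `redisOptions` (its default is already 10000 ms), while the cluster option `slotsRefreshTimeout` (default 1000 ms) limits how long each slots query may take. The stack trace in the report appears to fail in that slots-query timeout, so it is the more likely setting to raise.
```javascript
const redisClient = new Redis.Cluster([`//${host}:${port}`], {
  slotsRefreshTimeout: 10000, // slots-query timeout (default 1000 ms)
  redisOptions: { connectTimeout: 10000 }, // per-node connect timeout
});
```
- 2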
Implement Enhanced Error Handling
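Add a listener for the 'error' event on the Redis client. ioredis prints "[ioredis] Unhandled error event" (the message in the report) when it emits an error that has no listener attached; with a listener in place, you can log the error and take appropriate action instead. This does not fix the underlying connectivity problem, but it keeps failures visible and under your control.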
```javascript
redisClient.on('error', (err) => {
  console.error('Redis Client Error:', err);
});
```
- 3
Retry Logic for Connection Failures
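Implement a retry mechanism that attempts to connect to the Redis cluster a limited number of times before giving up. This can help mitigate transient network issues. Calling `connect()` manually only makes sense if the client was created with `lazyConnect: true`; otherwise ioredis starts connecting as soon as the client is constructed, and an extra `connect()` call may be rejected because the client is already connecting. ioredis also retries on its own through the `clusterRetryStrategy` option.
```javascript
// Assumes redisClient was created with { lazyConnect: true }.
let retryCount = 0;
const maxRetries = 5;
const connectWithRetry = () => {
  redisClient.connect().catch((err) => {
    if (retryCount < maxRetries) {
      retryCount++;
      console.log(`Retrying connection to Redis (${retryCount}/${maxRetries})...`);
      setTimeout(connectWithRetry, 2000);
    } else {
      console.error('Max retries reached. Could not connect to Redis:', err);
    }
  });
};
connectWithRetry();
```
- 4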
Monitor and Adjust Redis Cluster Configuration
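Review the Redis cluster configuration in AWS ElastiCache to ensure that it is set up for high availability (for example, with replicas in each shard and automatic failover enabled) and can handle the expected connection load from the Lambda function. Consider scaling the cluster, with larger node types or more shards, if necessary.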
AWS Management Console > ElastiCache > Redis > Cluster Configuration
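The client-side settings from steps 1 to 3 can be gathered in one place. The sketch below is a minimal, hypothetical module that builds the options object and the 'error' listener as plain values, so it runs without ioredis or a cluster. It assumes the ioredis 4.x option names `slotsRefreshTimeout`, `clusterRetryStrategy`, `maxRedirections` and `redisOptions.connectTimeout`; the environment variable names `SLOTS_REFRESH_TIMEOUT_MS` and `CONNECT_TIMEOUT_MS` and all fallback values are illustrative assumptions, not tuned recommendations. Environment variables are always strings, so the sketch parses them explicitly.

```javascript
// Parses a positive integer from an env-style object (values are strings,
// e.g. process.env.MAX_REDIRECTIONS) and falls back to a default otherwise.
function intFromEnv(env, name, fallback) {
  const value = Number.parseInt(env[name], 10);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// Capped exponential backoff for the clusterRetryStrategy option: ioredis calls
// it with the attempt number (starting at 1) and expects a delay in ms, or a
// non-number such as null to stop reconnecting.
function makeClusterRetryStrategy(maxAttempts = 5, baseMs = 200, capMs = 2000) {
  return function clusterRetryStrategy(times) {
    if (times > maxAttempts) {
      return null; // stop reconnecting
    }
    return Math.min(baseMs * 2 ** (times - 1), capMs);
  };
}

// Builds the second argument for new Redis.Cluster(...). The env var names
// SLOTS_REFRESH_TIMEOUT_MS and CONNECT_TIMEOUT_MS are hypothetical.
function buildClusterOptions(env) {
  return {
    maxRedirections: intFromEnv(env, 'MAX_REDIRECTIONS', 16),
    slotsRefreshTimeout: intFromEnv(env, 'SLOTS_REFRESH_TIMEOUT_MS', 5000),
    clusterRetryStrategy: makeClusterRetryStrategy(),
    redisOptions: {
      connectTimeout: intFromEnv(env, 'CONNECT_TIMEOUT_MS', 10000),
    },
  };
}

// Listener for redisClient.on('error', ...). ClusterAllFailedError exposes the
// last per-node failure as lastNodeError, which can help explain the cause.
function onRedisError(err, log = console.error) {
  if (err && err.name === 'ClusterAllFailedError' && err.lastNodeError) {
    log(`Redis cluster unreachable: ${err.message} (last node error: ${err.lastNodeError.message})`);
  } else {
    log(`Redis client error: ${err && err.message}`);
  }
}

// Usage sketch (needs ioredis and a reachable cluster, so it is not run here):
// const redisClient = new Redis.Cluster([`//${host}:${port}`], buildClusterOptions(process.env));
// redisClient.on('error', onRedisError);
```

Returning `null` from the retry strategy after a few attempts makes the client stop and fail pending work instead of retrying indefinitely inside a short-lived Lambda invocation; whether that trade-off fits depends on the function's timeout.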
Validation
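To confirm the fix worked, monitor the application logs for occurrences of 'ClusterAllFailedError' after deploying the changes. Also check connection stability and response times from the Redis cluster during peak load; the ElastiCache CloudWatch metrics `CurrConnections` and `NewConnections` show whether frequent Lambda instantiation is causing connection churn.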
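As a minimal sketch of the log check, the function below counts 'ClusterAllFailedError' lines per minute in exported log text, so counts before and after the change (and at peak load) can be compared. It assumes each relevant line contains an ISO-8601 UTC timestamp, as in the log excerpt in the problem description; the sample input reuses that excerpt's format, and the second and third lines are illustrative only.

```javascript
// Counts ClusterAllFailedError occurrences per minute in raw log text.
// Assumes each relevant line contains an ISO-8601 timestamp such as
// 2019-10-30T00:00:06.045Z; lines without one are grouped under 'unknown'.
function countClusterFailuresPerMinute(logText) {
  const counts = new Map();
  const minutePattern = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}/;
  for (const line of logText.split('\n')) {
    if (!line.includes('ClusterAllFailedError')) continue;
    const match = line.match(minutePattern);
    const minute = match ? match[0] : 'unknown';
    counts.set(minute, (counts.get(minute) || 0) + 1);
  }
  return counts;
}

// Illustrative input in the format of the report's log excerpt.
const sample = [
  '2019-10-30T00:00:06.045Z 2970a6b4-2f41-468a-96f7-96649d690fbc [ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.',
  '2019-10-30T00:00:41.900Z 2970a6b4-2f41-468a-96f7-96649d690fbc [ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.',
  '2019-10-30T00:01:02.000Z 2970a6b4-2f41-468a-96f7-96649d690fbc request completed',
].join('\n');

console.log(countClusterFailuresPerMinute(sample));
```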
Submitted by
Alex Chen