Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache

Mar 14, 2026
Confidence Score: 53%

Problem

Error description

Hello, we are running ioredis 4.14.0 against an AWS ElastiCache replication group with cluster mode on (12 nodes, 3 shards), using the cluster configuration address to connect to the cluster.

javascript
this._redisClient = new Redis.Cluster([`//${host}:${port}`], {
  maxRedirections: process.env.MAX_REDIRECTIONS,
  redisOptions: {
    reconnectOnError: function (err) {
      const targetError = 'READONLY';
      if (err.message.slice(0, targetError.length) === targetError) {
        // Only reconnect when the error starts with "READONLY"
        return 2;
      }
      return false;
    },
  },
});

The error log we are getting reads:

@message 2019-10-30T00:00:06.045Z 2970a6b4-2f41-468a-96f7-96649d690fbc [ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.
    at tryNode (/opt/nodejs/node_modules/ioredis/built/cluster/index.js:359:31)
    at /opt/nodejs/node_modules/ioredis/built/cluster/index.js:376:21
    at Timeout.duplicatedConnection.cluster.utils_2.timeout (/opt/nodejs/node_modules/ioredis/built/cluster/index.js:623:24)
    at Timeout.run (/opt/nodejs/node_modules/ioredis/built/utils/index.js:156:22)
    at Timeout.<anonymous> (/opt/nodejs/node_modules/async-listener/glue.js:188:31)
    at ontimeout (timers.js:486:15)
    at tryOnTimeout (timers.js:317:5)
    at Timer.listOnTimeout (timers.js:277:5)

Error scenario

We have an AWS Lambda function that connects to said cluster and is instantiated frequently. Sometimes the request is attende

Error Output

ClusterAllFailedError: Failed to refresh slots cache.

1 Fix

Implement Error Handling and Connection Retry Logic for Redis Cluster

Medium Risk

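The error 'ClusterAllFailedError: Failed to refresh slots cache' is raised when the ioredis client fails to fetch the cluster slot map from every node it tries, so it has no way to route commands. In the reported stack trace the failure originates in a timeout callback, which suggests the slot refresh timed out rather than being refused outright. Possible causes include network issues, node failures, or misconfiguration. Because the Lambda function is instantiated frequently, cold starts may also open many connections at once or run into the client's timeout settings.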
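
Since the report involves a Lambda function that is instantiated frequently, one common mitigation is to create the cluster client once per execution environment and reuse it across warm invocations. The following is a minimal sketch of that idea, not the reporter's code: `getClient`, `resetClient`, and the `createClient` factory are hypothetical names, and the factory is assumed to wrap `new Redis.Cluster(...)`.

```javascript
// Cached at module scope so warm Lambda invocations reuse the same client.
let cachedClient = null;

// createClient is a hypothetical factory supplied by the caller, e.g.
//   () => new Redis.Cluster([`//${host}:${port}`], options)
function getClient(createClient) {
  if (cachedClient === null) {
    cachedClient = createClient();
  }
  return cachedClient;
}

// Drop the cached client (for example after an unrecoverable error)
// so that the next call to getClient builds a fresh one.
function resetClient() {
  cachedClient = null;
}
```

Inside the handler, calling `getClient(factory)` on every invocation would then only open new connections on a cold start or after `resetClient()`.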

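  1. Increase Connection Timeout

    Adjust the timeout settings to allow more time for establishing connections and fetching the slot map, especially under high load or during cold starts in AWS Lambda. In ioredis, `connectTimeout` is a per-node option and belongs under `redisOptions`; the time allowed for the slot refresh itself is controlled by the cluster-level `slotsRefreshTimeout` option.

    javascript
    const redisClient = new Redis.Cluster([`//${host}:${port}`], {
      slotsRefreshTimeout: 10000, // ms allowed for fetching the slot map
      redisOptions: { connectTimeout: 10000 }, // ms allowed per node connection
    });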
  2. Implement Enhanced Error Handling

    Add a listener for the 'error' event on the Redis client to handle errors gracefully and prevent unhandled exceptions. This will allow you to log errors and take appropriate actions.

    javascript
    redisClient.on('error', (err) => { console.error('Redis Client Error:', err); });
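  3. Retry Logic for Connection Failures

    Implement a retry mechanism that attempts to reconnect to the Redis cluster a specified number of times before failing. This can help mitigate transient network issues. Note that `connect()` rejects if the client is already connecting or connected, so this approach assumes the client was created with `lazyConnect: true`. ioredis also retries on its own through the `clusterRetryStrategy` option, so a manual loop may be redundant.

    javascript
    // Assumes redisClient was created with { lazyConnect: true }.
    let retryCount = 0;
    const maxRetries = 5;
    const connectWithRetry = () => {
      redisClient.connect().catch((err) => {
        if (retryCount < maxRetries) {
          retryCount++;
          console.log('Retrying connection to Redis...');
          setTimeout(connectWithRetry, 2000);
        } else {
          console.error('Max retries reached. Could not connect to Redis:', err);
        }
      });
    };
    connectWithRetry();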
  4. Monitor and Adjust Redis Cluster Configuration

    Review the Redis cluster configuration in AWS ElastiCache to ensure that it is properly set up for high availability and can handle the expected load from the Lambda function. Consider scaling the cluster if necessary.

    AWS Management Console > ElastiCache > Redis > Cluster Configuration
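
The timeout and retry steps above can be gathered into a single options object. The sketch below is one possible arrangement under stated assumptions: `buildClusterOptions` is a hypothetical helper, the environment variable names other than `MAX_REDIRECTIONS` are invented for illustration, and the default values are placeholders rather than recommendations. The option names themselves (`maxRedirections`, `slotsRefreshTimeout`, `clusterRetryStrategy`, `redisOptions.connectTimeout`) are ioredis cluster options.

```javascript
// Builds an options object for `new Redis.Cluster(startupNodes, options)`.
// Default values are illustrative assumptions, not recommendations.
function buildClusterOptions(env = {}) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  // MAX_CLUSTER_RETRIES is a hypothetical variable name.
  const maxRetries = toInt(env.MAX_CLUSTER_RETRIES, 5);

  return {
    // Environment variables are strings, so convert them to numbers first.
    maxRedirections: toInt(env.MAX_REDIRECTIONS, 16),
    // Time allowed for fetching the slot map from a node, in milliseconds.
    // SLOTS_REFRESH_TIMEOUT_MS is a hypothetical variable name.
    slotsRefreshTimeout: toInt(env.SLOTS_REFRESH_TIMEOUT_MS, 5000),
    // Called when no startup node is reachable. Returns the delay in
    // milliseconds before the next attempt, or null to stop retrying.
    clusterRetryStrategy(times) {
      if (times > maxRetries) return null;
      return Math.min(100 * 2 ** (times - 1), 2000);
    },
    // Per-node connection settings belong under redisOptions.
    // CONNECT_TIMEOUT_MS is a hypothetical variable name.
    redisOptions: {
      connectTimeout: toInt(env.CONNECT_TIMEOUT_MS, 10000),
    },
  };
}
```

A call such as `new Redis.Cluster([`//${host}:${port}`], buildClusterOptions(process.env))` would then pick up these settings. The capped exponential backoff is a design choice for this sketch; ioredis ships its own default retry strategy.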

Validation

To confirm the fix worked, monitor the application logs for any occurrences of 'ClusterAllFailedError' after implementing the changes. Additionally, check the connection stability and response times from the Redis cluster during peak loads.
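
To make the log check above easier, errors can be counted by name as they arrive. This is a rough sketch: `createErrorCounter` is a hypothetical helper, and reading `lastNodeError` from the error object is an assumption about the ioredis version in use, so it is accessed defensively.

```javascript
// Returns an 'error' listener plus a way to read per-name error counts.
function createErrorCounter(log = console.error) {
  const counts = new Map();

  function onError(err) {
    const name = (err && err.name) || 'UnknownError';
    counts.set(name, (counts.get(name) || 0) + 1);
    // lastNodeError is assumed to exist on ClusterAllFailedError in the
    // installed ioredis version; fall back to nothing when it is absent.
    const cause = err && err.lastNodeError ? err.lastNodeError.message : undefined;
    log('[redis]', name, err && err.message, cause ? `(last node error: ${cause})` : '');
  }

  return { onError, count: (name) => counts.get(name) || 0 };
}
```

The listener would be attached with `redisClient.on('error', counter.onError)`, and `counter.count('ClusterAllFailedError')` can be compared before and after the change.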

Submitted by Alex Chen (2450 rep)

Tags

redis, ioredis, cache