
ci-kubernetes-e2e-gci-gce-examples: broken test run

Mar 14, 2026

Problem

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-examples/3599/

Multiple broken tests:

Failed: [k8s.io] [Feature:Example] [k8s.io] Hazelcast should create and scale hazelcast {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #27850 #30672 #33271

Failed: [k8s.io] [Feature:Example] [k8s.io] CassandraStatefulSet should create statefulset {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #36323 #36469 #38222

Failed: Test {e2e.go}
[code block]
Issues about this test specifically: #33361 #38663 #39788 #39877 #40371 #40469 #40478 #40483 #40668 #41048 #43025

Failed: [k8s.io] [Feature:Example] [k8s.io] Cassandra should create and scale cassandra {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #27978 #28817 #39574

Previous issues for this suite: #36939 #39382 #39874 #42107 #43019

Error Output

error:
    <*errors.errorString | 0xc420fc4950>: {


Fix Flaky E2E Tests for Hazelcast and Cassandra in Kubernetes

Medium Risk

The failures in the E2E tests for Hazelcast and Cassandra are likely due to resource constraints and timing issues in the Kubernetes environment. These tests may be sensitive to the state of the cluster and the availability of resources, leading to intermittent failures. Additionally, the tests may not be properly cleaning up resources after execution, causing conflicts in subsequent runs.


  1. Increase Resource Limits for Test Pods

    Modify the resource limits for the test pods to ensure they have sufficient CPU and memory. This can help mitigate issues related to resource contention during test execution.

    yaml
    resources:
      limits:
        cpu: '1000m'
        memory: '1Gi'
      requests:
        cpu: '500m'
        memory: '512Mi'
  2. Implement Retry Logic in Tests

    Add retry logic to the tests to handle transient failures. This can help reduce the impact of flaky tests by allowing them to rerun upon failure.

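    go
    // Retry up to retryCount times, backing off between failed attempts.
    // Requires the "time" import; runTest is the test step being retried.
    const retryCount = 3
    var err error
    for i := 0; i < retryCount; i++ {
      if err = runTest(); err == nil {
        break
      }
      if i < retryCount-1 {
        // Back off 1s, then 2s; skip the sleep after the final attempt.
        time.Sleep(time.Duration(i+1) * time.Second)
      }
    }
    // err is non-nil here only if every attempt failed.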
  3. Ensure Proper Cleanup of Resources

    Review and update the test teardown procedures to ensure all resources are cleaned up after tests run. This will prevent conflicts in subsequent test executions.

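    go
    // cleanupResources deletes the resources created by the test.
    func cleanupResources() {
      // Code to delete created resources
    }

    func runTest() error {
      defer cleanupResources() // runs even if a later step fails
      // ... create resources and run the test steps ...
      return nil
    }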
  4. Update Test Dependencies

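    Check and update the dependencies for the Hazelcast and Cassandra tests to ensure compatibility with the latest Kubernetes version and to include any bug fixes related to E2E tests. Note that k8s.io/kubernetes is not designed to be consumed as a Go module dependency, so the command below can fail on its unversioned staging modules unless go.mod adds matching replace directives.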

    bash
    go get k8s.io/kubernetes@latest
  5. Run Tests in a Dedicated Namespace

    Run the E2E tests in a dedicated Kubernetes namespace to isolate them from other workloads. This can help reduce interference and improve test reliability.

    bash
    kubectl create namespace e2e-tests
    kubectl apply -f test-deployment.yaml -n e2e-tests

Validation

After implementing the fixes, rerun the E2E suite several times and check each result. A single passing run does not rule out a flake; only consistent passes of the Hazelcast, Cassandra, and CassandraStatefulSet tests across multiple consecutive runs indicate that the issues have been resolved.


Submitted by Alex Chen

Tags

kubernetes, k8s, containers, priority/critical-urgent, kind/flake, needs-sig