ci-kubernetes-e2e-kops-aws-serial: broken test run

Mar 14, 2026

Confidence score: 55%

Problem

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-kops-aws-serial/1/

Multiple broken tests:

Failed: [k8s.io] Network Partition [Disruptive] [Slow] [k8s.io] [Job] should create new pods when node is partitioned {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #36950

Failed: [k8s.io] Daemon set [Serial] Should update pod when spec was updated and update strategy is RollingUpdate {Kubernetes e2e suite}
[code block]

Failed: [k8s.io] Network Partition [Disruptive] [Slow] [k8s.io] [ReplicationController] should eagerly create replacement pod during network partition when termination grace is non-zero {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #37479

Failed: [k8s.io] Namespaces [Serial] should ensure that all pods are removed when a namespace is deleted. {Kubernetes e2e suite}
[code block]

Failed: [k8s.io] Kubelet [Serial] [Slow] [k8s.io] regular resource usage tracking resource tracking for 0 pods per node {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #26784 #28384 #31935 #33023 #39880

Failed: [k8s.io] Kubelet [Serial] [Slow] [k8s.io] regular resource usage tracking resource tracking for 100 pods per node {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #26982 #32214 #33994 #34035 #35399 #38209

Failed: [k8s.io] Daemon set [Serial] Should not update pod when spec was updated and update strategy is on delete {Kubernetes e2e suite}
[co

Error Output

error:
    <*errors.errorString | 0xc4203d8250>: {

Fix Kubernetes E2E Test Failures in Kops AWS Serial

Medium Risk

Most of the failing E2E tests fall into three areas: network partition handling, DaemonSet update strategies, and Kubelet resource tracking. Likely causes include race conditions, incorrect handling of state during network disruptions, and possible bugs in the DaemonSet update strategy implementation. Several of the failing tests already have dedicated issues (listed under each failure above), and those issues need to be resolved to stabilize the job.

  1. Review and Update Network Partition Test Logic

    Investigate how the 'Network Partition' tests handle pod creation while a node is partitioned, and make sure the replacement pods they expect are actually created. Add logging and error handling so that a failure records enough detail to diagnose it.

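    ```go
    // Placeholder only: the upstream e2e tests are Ginkgo specs, not plain `go test` functions.
    func TestNetworkPartition(t *testing.T) { /* updated partition-handling logic and extra logging here */ }
    ```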
  2. Fix Daemon Set Update Strategy Tests

    Check that the DaemonSet update strategy tests reflect the expected behavior of each strategy. With 'RollingUpdate', pods created from the old template are replaced after the spec changes. With 'OnDelete', they keep running until they are deleted manually. Adjust the tests so they set up the right conditions for both strategies.

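    ```go
    // DaemonSet strategy constants from k8s.io/api/apps/v1 (imported as appsv1)
    daemonSet.Spec.UpdateStrategy.Type = appsv1.RollingUpdateDaemonSetStrategyType // or appsv1.OnDeleteDaemonSetStrategyType
    ```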
  3. Enhance Resource Tracking Tests

    Enhance the Kubelet resource tracking tests so they handle the edge cases exercised by the failing runs: 0 pods per node and 100 pods per node. This may involve adding assertions and changing the test setup to simulate different resource usage scenarios.

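    ```go
    // Usage varies between runs, so compare against an upper bound (testify's assert.LessOrEqual); usageLimit is hypothetical
    assert.LessOrEqual(t, actualUsage, usageLimit)
    ```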
  4. Run Tests in Isolated Environment

    Execute the modified tests in an isolated environment to ensure that changes do not introduce new issues. Use a dedicated test cluster on AWS to replicate the conditions of the original failures.

    ```bash
    # apply the test environment manifests to the dedicated test cluster
    kubectl apply -f test-environment.yaml
    ```
  5. Monitor and Collect Logs

    After running the tests, monitor the logs for any errors or unexpected behavior. Collect logs from both the test framework and the Kubernetes components to identify any remaining issues.

    ```bash
    # app=test-app is an example label selector; adjust it to match the test pods
    kubectl logs -l app=test-app
    ```
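For step 2, it can help to write down the behavior each strategy should produce before changing the tests. The model below is a deliberately simplified, hypothetical sketch. The `Pod`, `Strategy`, and `podsToReplace` names are illustrative and are not the DaemonSet controller's API. With RollingUpdate, outdated pods are replaced at most `maxUnavailable` at a time. With OnDelete, nothing is replaced until a pod is deleted by hand.

```go
package main

import "fmt"

// Strategy mirrors the two DaemonSet update strategy names.
type Strategy string

const (
	RollingUpdate Strategy = "RollingUpdate"
	OnDelete      Strategy = "OnDelete"
)

// Pod is a simplified stand-in: the node it runs on and the template revision that created it.
type Pod struct {
	Node     string
	Revision string
}

// podsToReplace returns the pods a controller would be expected to replace in
// one sync, given the current template revision (simplified model):
//   - RollingUpdate: outdated pods, at most maxUnavailable of them.
//   - OnDelete: none; outdated pods stay until a user deletes them.
func podsToReplace(s Strategy, pods []Pod, currentRev string, maxUnavailable int) []Pod {
	if s == OnDelete {
		return nil
	}
	var out []Pod
	for _, p := range pods {
		if p.Revision != currentRev && len(out) < maxUnavailable {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{{"node-a", "rev1"}, {"node-b", "rev1"}, {"node-c", "rev2"}}
	fmt.Println(podsToReplace(RollingUpdate, pods, "rev2", 1)) // [{node-a rev1}]
	fmt.Println(podsToReplace(OnDelete, pods, "rev2", 1))      // []
}
```

A test for "Should not update pod ... on delete" then reduces to asserting that the set of replaced pods stays empty after the spec change.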
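For step 3, one way to cover the 0-pod and 100-pod cases is to check each container's measured usage against an upper bound and to treat missing measurements as a failure. This is a sketch under assumptions: `checkLimits`, the container names, and the limit values are hypothetical and are not the thresholds used upstream.

```go
package main

import (
	"fmt"
	"sort"
)

// checkLimits compares measured usage per container against an upper bound
// and returns human-readable violations. A container that has a limit but no
// measurement is also reported, so missing data cannot pass silently.
func checkLimits(usage, limits map[string]float64) []string {
	names := make([]string, 0, len(limits))
	for name := range limits {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic report order
	var violations []string
	for _, name := range names {
		got, ok := usage[name]
		switch {
		case !ok:
			violations = append(violations, fmt.Sprintf("%s: no usage data", name))
		case got > limits[name]:
			violations = append(violations, fmt.Sprintf("%s: %.3f exceeds limit %.3f", name, got, limits[name]))
		}
	}
	return violations
}

func main() {
	// Hypothetical CPU-core limits for system containers; not the upstream values.
	limits := map[string]float64{"kubelet": 0.10, "runtime": 0.20}
	usage := map[string]float64{"kubelet": 0.12}
	for _, v := range checkLimits(usage, limits) {
		fmt.Println(v)
	}
}
```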

Validation

Confirm that all modified tests pass successfully without errors. Additionally, review logs for any warnings or errors that may indicate underlying issues. A successful run should not show any failures related to network partition, daemon set updates, or resource tracking.
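To check these criteria mechanically, scan the JUnit XML reports that the e2e runs produce for failed test cases. The sketch assumes a report whose root element is a single `<testsuite>` containing `<testcase>` elements, where a nested `<failure>` element marks a failure. `failedTests` and the sample report in `main` are hypothetical. Adjust the types if your reports use a `<testsuites>` wrapper.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Minimal JUnit shapes; only the fields needed to list failures are decoded.
type testSuite struct {
	XMLName   xml.Name   `xml:"testsuite"`
	TestCases []testCase `xml:"testcase"`
}

type testCase struct {
	Name    string    `xml:"name,attr"`
	Failure *struct{} `xml:"failure"` // non-nil only when a <failure> element is present
}

// failedTests returns the names of test cases that carry a <failure> element.
func failedTests(data []byte) ([]string, error) {
	var suite testSuite
	if err := xml.Unmarshal(data, &suite); err != nil {
		return nil, err
	}
	var failed []string
	for _, tc := range suite.TestCases {
		if tc.Failure != nil {
			failed = append(failed, tc.Name)
		}
	}
	return failed, nil
}

func main() {
	// Hypothetical report excerpt in the single-<testsuite> layout.
	report := []byte(`<testsuite tests="2" failures="1">
  <testcase name="[k8s.io] Namespaces [Serial] should ensure that all pods are removed when a namespace is deleted."></testcase>
  <testcase name="[k8s.io] Daemon set [Serial] Should not update pod when spec was updated and update strategy is on delete">
    <failure type="Failure">timed out</failure>
  </testcase>
</testsuite>`)
	failed, err := failedTests(report)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%d failed: %v\n", len(failed), failed)
}
```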

Submitted by Alex Chen

Tags: kubernetes, k8s, containers, kind/flake