ci-kubernetes-e2e-kops-aws-serial: broken test run
Problem
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-kops-aws-serial/1/
Multiple broken tests:
Failed: [k8s.io] Network Partition [Disruptive] [Slow] [k8s.io] [Job] should create new pods when node is partitioned {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #36950
Failed: [k8s.io] Daemon set [Serial] Should update pod when spec was updated and update strategy is RollingUpdate {Kubernetes e2e suite}
[code block]
Failed: [k8s.io] Network Partition [Disruptive] [Slow] [k8s.io] [ReplicationController] should eagerly create replacement pod during network partition when termination grace is non-zero {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #37479
Failed: [k8s.io] Namespaces [Serial] should ensure that all pods are removed when a namespace is deleted. {Kubernetes e2e suite}
[code block]
Failed: [k8s.io] Kubelet [Serial] [Slow] [k8s.io] regular resource usage tracking resource tracking for 0 pods per node {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #26784 #28384 #31935 #33023 #39880
Failed: [k8s.io] Kubelet [Serial] [Slow] [k8s.io] regular resource usage tracking resource tracking for 100 pods per node {Kubernetes e2e suite}
[code block]
Issues about this test specifically: #26982 #32214 #33994 #34035 #35399 #38209
Failed: [k8s.io] Daemon set [Serial] Should not update pod when spec was updated and update strategy is on delete {Kubernetes e2e suite}
[co
Error Output
error:
<*errors.errorString | 0xc4203d8250>: {
1 Fix
Fix Kubernetes E2E Test Failures in Kops AWS Serial
The failing tests cluster in three areas: network partition handling, DaemonSet update strategies (RollingUpdate and OnDelete), and kubelet resource tracking. Plausible causes include race conditions, incorrect state handling during network disruptions, and bugs in the DaemonSet update strategy implementation; none of these is confirmed by the error output shown above. The steps below work through the affected test cases one area at a time.
- 1
Review and Update Network Partition Test Logic
Investigate the logic used in the 'Network Partition' tests to ensure that the handling of pod creation during network partitions is robust. Update the test to include additional logging and error handling to capture more details during failures.
func TestNetworkPartition(t *testing.T) {
	// Updated logic here
}
- 2
Fix Daemon Set Update Strategy Tests
Review the implementation of the daemon set update strategy tests to ensure they accurately reflect the expected behavior of 'RollingUpdate' and 'OnDelete'. Modify the tests to ensure they correctly simulate the conditions for both update strategies.
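The expected behavior of the two strategies can be written down as a small decision function, which may help when checking that the tests assert the right thing. This is a sketch under stated assumptions: `UpdateStrategy` and `expectPodReplaced` are local stand-ins defined for this example, not types from `k8s.io/api`. The rule it encodes is that RollingUpdate replaces old pods automatically after a template change, while OnDelete only creates a pod from the new template once the old pod has been deleted.

```go
package main

import "fmt"

// UpdateStrategy is a local stand-in for the DaemonSet update strategy type
// (an assumption for this sketch, not the k8s.io/api type).
type UpdateStrategy string

const (
	RollingUpdate UpdateStrategy = "RollingUpdate"
	OnDelete      UpdateStrategy = "OnDelete"
)

// expectPodReplaced reports whether a pod created from the old template is
// expected to be replaced by a pod created from the new template.
func expectPodReplaced(strategy UpdateStrategy, podDeletedManually bool) (bool, error) {
	switch strategy {
	case RollingUpdate:
		// Old pods are replaced automatically after the template changes.
		return true, nil
	case OnDelete:
		// A new-template pod appears only after the old pod is deleted.
		return podDeletedManually, nil
	default:
		return false, fmt.Errorf("unknown update strategy %q", strategy)
	}
}

func main() {
	cases := []struct {
		strategy UpdateStrategy
		deleted  bool
	}{
		{RollingUpdate, false},
		{OnDelete, false},
		{OnDelete, true},
	}
	for _, c := range cases {
		replaced, err := expectPodReplaced(c.strategy, c.deleted)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("strategy=%s deleted=%t -> expect new template: %t\n",
			c.strategy, c.deleted, replaced)
	}
}
```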
daemonSet.Spec.UpdateStrategy.Type = appsv1.RollingUpdateDaemonSetStrategyType
- 3
Enhance Resource Tracking Tests
Extend the kubelet resource tracking tests so they handle both ends of the range that failed here: 0 pods per node and 100 pods per node. This may involve adding assertions and changing the test setup to simulate different resource usage scenarios.
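One possible shape for such assertions is an upper-bound check, which tends to be less brittle than exact equality because measured usage varies from run to run. The sketch below is illustrative only: `usageSample`, `usageLimit`, `checkUsage`, the container names, and all numbers are assumptions made up for this example, not values from the failing job.

```go
package main

import "fmt"

// usageSample is a hypothetical measurement for one container.
type usageSample struct {
	container string
	cpuCores  float64
	memoryMB  float64
}

// usageLimit is a hypothetical upper bound for one container.
type usageLimit struct {
	cpuCores float64
	memoryMB float64
}

// checkUsage returns one message per violation: a sample with no configured
// limit, or a sample whose CPU or memory exceeds its limit. An empty sample
// list (for example, the zero-pod case) yields no violations.
func checkUsage(samples []usageSample, limits map[string]usageLimit) []string {
	var violations []string
	for _, s := range samples {
		lim, ok := limits[s.container]
		if !ok {
			violations = append(violations,
				fmt.Sprintf("%s: no limit configured", s.container))
			continue
		}
		if s.cpuCores > lim.cpuCores {
			violations = append(violations,
				fmt.Sprintf("%s: cpu %.2f cores exceeds limit %.2f", s.container, s.cpuCores, lim.cpuCores))
		}
		if s.memoryMB > lim.memoryMB {
			violations = append(violations,
				fmt.Sprintf("%s: memory %.0f MB exceeds limit %.0f MB", s.container, s.memoryMB, lim.memoryMB))
		}
	}
	return violations
}

func main() {
	// Illustrative limits and samples; the numbers are made up.
	limits := map[string]usageLimit{
		"kubelet": {cpuCores: 0.20, memoryMB: 100},
		"runtime": {cpuCores: 0.10, memoryMB: 150},
	}
	samples := []usageSample{
		{container: "kubelet", cpuCores: 0.25, memoryMB: 80},
		{container: "runtime", cpuCores: 0.05, memoryMB: 120},
	}
	for _, v := range checkUsage(samples, limits) {
		fmt.Println(v)
	}
}
```

Returning every violation, rather than stopping at the first, gives a single failed run enough detail to see which containers and which resources are over budget.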
assert.Equal(t, expectedUsage, actualUsage)
- 4
Run Tests in Isolated Environment
Execute the modified tests in an isolated environment to ensure that changes do not introduce new issues. Use a dedicated test cluster on AWS to replicate the conditions of the original failures.
kubectl apply -f test-environment.yaml
- 5
Monitor and Collect Logs
After running the tests, monitor the logs for any errors or unexpected behavior. Collect logs from both the test framework and the Kubernetes components to identify any remaining issues.
kubectl logs -l app=test-app
Validation
Confirm that all modified tests pass successfully without errors. Additionally, review logs for any warnings or errors that may indicate underlying issues. A successful run should not show any failures related to network partition, daemon set updates, or resource tracking.
Submitted by
Alex Chen
2450 rep