
[k8s.io] [Feature:Example] [k8s.io] Spark should start spark master, driver and workers {Kubernetes e2e suite}

Mar 14, 2026
Confidence Score: 55%

Problem

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gce-examples/17028

Failed: [k8s.io] [Feature:Example] [k8s.io] Spark should start spark master, driver and workers (6.17s) [code block]

This test has been failing consistently for a long time:

- https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gce-examples/17027
- https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gce-examples/17026
- https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gce-examples/17024
- https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gce-examples/17028

Error Output

```
error:
    <exec.CodeExitError>: {
```


1 Fix

Unverified Fix – Awaiting Verification

Fix Spark Master and Worker Startup in Kubernetes E2E Tests

Medium Risk

The Spark master, driver, and worker pods fail to start, either because their resource requests and limits are misconfigured or because the service account lacks sufficient permissions in the Kubernetes cluster. The logs indicate that the pods cannot communicate properly or are being terminated due to resource constraints.


  1. Update Spark Configuration

    Modify the Spark configuration to ensure that the resource requests and limits are set appropriately for the Kubernetes environment. This will help prevent the pods from being terminated due to resource constraints.

    ```bash
    spark-submit \
      --master k8s://https://<K8S_API_SERVER> \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.container.image=<SPARK_IMAGE> \
      --conf spark.kubernetes.namespace=<NAMESPACE> \
      --conf spark.kubernetes.executor.request.cores=1 \
      --conf spark.kubernetes.executor.limit.cores=2
    ```
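    For pods deployed from manifests rather than `spark-submit`, the same constraint can be expressed directly in the pod spec. This is a minimal sketch of a container `resources` block; the container name, image placeholder, and values are illustrative, not taken from the failing test:

    ```yaml
    # Hypothetical worker container spec; values are illustrative only.
    containers:
      - name: spark-worker
        image: <SPARK_IMAGE>
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
          limits:
            cpu: "2"
            memory: 2Gi
    ```

    Requests determine scheduling; limits determine when the kubelet throttles or kills the container, so a limit set below the pod's real working set will surface as exactly the kind of termination described above.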
  2. Check Kubernetes Role and RoleBinding

    Ensure that the service account used by Spark has the necessary permissions to create and manage pods in the specified namespace. Create or update the Role and RoleBinding if necessary.

    ```bash
    kubectl create role spark-role \
      --verb=get,list,watch,create,update,delete \
      --resource=pods --namespace=<NAMESPACE>
    kubectl create rolebinding spark-role-binding \
      --role=spark-role \
      --serviceaccount=<NAMESPACE>:<SERVICE_ACCOUNT> \
      --namespace=<NAMESPACE>
    ```
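    If RBAC is kept in version control, the same grant can be written declaratively. This sketch is equivalent to the two commands above; `<NAMESPACE>` and `<SERVICE_ACCOUNT>` remain placeholders:

    ```yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: spark-role
      namespace: <NAMESPACE>
    rules:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "list", "watch", "create", "update", "delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: spark-role-binding
      namespace: <NAMESPACE>
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: spark-role
    subjects:
      - kind: ServiceAccount
        name: <SERVICE_ACCOUNT>
        namespace: <NAMESPACE>
    ```

    Apply it with `kubectl apply -f`, then confirm the grant took effect with `kubectl auth can-i create pods --as=system:serviceaccount:<NAMESPACE>:<SERVICE_ACCOUNT> -n <NAMESPACE>`.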
  3. Increase Cluster Resources

    If the cluster is running out of resources, consider increasing the available CPU and memory resources to accommodate the Spark pods. This can be done by resizing the nodes or adding more nodes to the cluster.

    ```bash
    gcloud container clusters resize <CLUSTER_NAME> \
      --node-pool <NODE_POOL_NAME> \
      --num-nodes <NEW_NODE_COUNT>
    ```
  4. Run E2E Tests

    After making the above changes, rerun the Kubernetes E2E tests to verify that the Spark master, driver, and worker pods start successfully without errors.

    ```bash
    kubectl apply -f <SPARK_DEPLOYMENT_YAML>
    kubectl get pods --namespace=<NAMESPACE>
    ```

Validation

Confirm that the Spark master, driver, and worker pods are running by checking their status with `kubectl get pods --namespace=<NAMESPACE>`. Additionally, review the pod logs to ensure there are no errors related to resource allocation or permissions.
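Scanning the STATUS column by eye is error-prone. A small sketch of a filter that flags pods not in a healthy phase; the helper name and the sample listing are made up for illustration (in practice the input would be piped from `kubectl get pods --namespace=<NAMESPACE>`):

```shell
#!/bin/sh
# Hypothetical helper: print names of pods whose STATUS column is
# neither Running nor Completed, reading `kubectl get pods`-style
# output from stdin. NR > 1 skips the header row.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Illustrative sample in place of real cluster output:
printf '%s\n' \
  'NAME            READY   STATUS             RESTARTS   AGE' \
  'spark-master    1/1     Running            0          5m' \
  'spark-worker-1  0/1     CrashLoopBackOff   3          5m' |
  unhealthy_pods
# prints: spark-worker-1
```

An empty result means every pod is in a healthy phase; any name printed is a pod whose logs should be inspected next.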


Environment

Submitted by


Alex Chen

2450 rep

Tags

kubernetes, k8s, containers, priority/critical-urgent, area/example, kind/flake