Very slow queuing with plenty of idle runners available
Problem
Originally posted here: https://github.community/t/very-slow-queuing-behavior-when-idle-runners-are-available/127674

Describe the bug

When no other builds are running (all my runners are idle), there is very delayed behavior from GitHub Actions before builds even start. This is especially noticeable with, for example, a 4^4 matrix (256 checks), even if you have 256 idle self-hosted runners. The UI shows "X queued checks" at a rate of around 4 per second (i.e., "4 queued checks", "8 queued checks", etc.) before it finally gets to 256 checks queued. It takes a full 1 min 40 sec before the first of my runners even receives a message and starts building, and 12-13 min for the entire run to be marked as finished, even if each build does no work and completes in 1 sec or less.

To Reproduce

1. Register (and run) 256 self-hosted runners.
2. Run a workflow that uses a 4^4 matrix: [code block]
3. Observe that it takes a long time before the first build message is sent to a runner.
4. Observe how, even after all checks are completed, it still takes many minutes (10?) for the entire workflow to be marked as finished.

You can also observe similar behavior with fewer checks – e.g., even just 16 runners and a 4^2 matrix. Even then, checks queue gradually before the first build starts, and there's a noticeable delay after the 16th check has finished before the whole workflow is marked as complete. I see an overall run time of 1 min 10 sec, even though each worker has completed its build in less than a second.
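The original post's workflow code is not reproduced above; a minimal 4^4 matrix workflow along these lines (job name, matrix variable names, and the trigger are illustrative) would be:

```yaml
name: matrix-timing-test
on: workflow_dispatch

jobs:
  build:
    runs-on: self-hosted
    strategy:
      # 4 x 4 x 4 x 4 = 256 jobs
      matrix:
        a: [1, 2, 3, 4]
        b: [1, 2, 3, 4]
        c: [1, 2, 3, 4]
        d: [1, 2, 3, 4]
    steps:
      - run: echo "${{ matrix.a }}.${{ matrix.b }}.${{ matrix.c }}.${{ matrix.d }}"
```

Each job does nothing but echo its matrix coordinates, so any elapsed time beyond a few seconds is queuing overhead rather than build work.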
Optimize GitHub Actions Queuing for Idle Runners
The observed delay in queuing and processing builds is likely due to the way GitHub Actions handles job scheduling and resource allocation. Even with idle self-hosted runners, the queuing mechanism may introduce latency as it processes multiple jobs in batches rather than immediately dispatching them to available runners. This can be exacerbated by the large number of jobs in a matrix build, leading to inefficient job handling and extended wait times.
1. Reduce Matrix Size
Consider breaking down the 4^4 matrix into smaller matrices to reduce the initial queuing time. For example, run a 4^2 matrix first and then a separate job for the remaining checks. This can help in managing the load and reducing the initial delay.
```yaml
jobs:
  build_small:
    runs-on: self-hosted
    strategy:
      matrix:
        config: [1, 2, 3, 4]
    steps:
      - run: echo 'Running small matrix job'
```

2. Increase Job Dispatch Rate
Ensure the workflow is not capping parallelism itself. Note that `concurrency` groups serialize workflow runs rather than speed them up, so an overly broad `concurrency` group will make queuing worse, and `max-parallel` is not a valid key under `concurrency`. The setting that bounds how many matrix jobs run at once is `strategy.max-parallel`; it defaults to the maximum that available runners allow, so either omit it or set it explicitly:

```yaml
jobs:
  build:
    runs-on: self-hosted
    strategy:
      max-parallel: 256
      matrix:
        config: [1, 2, 3, 4]
    steps:
      - run: echo 'Running job'
```

3. Optimize Runner Configuration
Ensure that your self-hosted runners are optimally configured. Check the runner's resource allocation (CPU, memory) and ensure they are not limited by system constraints. Increasing the resources available to the runners may help in processing jobs faster.
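As a quick check of what a Linux runner host actually has available (these commands assume standard coreutils/procps tooling on the runner machine):

```shell
nproc       # number of CPU cores the runner can use
free -h     # total and available memory
df -h .     # free disk space in the working directory
```

Comparing these numbers against what your builds need will show whether the runners themselves are resource-constrained.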
```bash
# The unit name includes the org/repo and runner name, e.g.
# actions.runner.<org>-<repo>.<runner-name>.service;
# systemctl accepts glob patterns, so this restarts all registered runners.
sudo systemctl restart 'actions.runner.*'
```

4. Monitor Runner Performance
Implement monitoring on your self-hosted runners to track their performance and identify any bottlenecks. Use tools like Prometheus or Grafana to visualize metrics such as CPU usage, memory usage, and job processing times.
```bash
docker run -d --name=prometheus -p 9090:9090 prom/prometheus
```

5. Review GitHub Actions Settings
Check the GitHub Actions settings for your repository to ensure that there are no restrictions or limits set on job concurrency or runner usage that could be affecting performance.
Navigate to Settings > Actions > General and review the settings.
Validation
To confirm that the fix worked, monitor the queuing time and job dispatch times after implementing the changes. Ideally, the first job should start within a few seconds of triggering the workflow, and the overall workflow completion time should decrease significantly.
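One way to quantify queue latency, sketched below in Python, is to compute the gap between each job's `created_at` and `started_at` timestamps as returned by the GitHub REST API's "list jobs for a workflow run" endpoint (`GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`). The field names match that API; fetching the JSON (e.g. with `gh api` or an authenticated HTTP client) is left out, and the sample job objects are made up for illustration:

```python
from datetime import datetime

def queue_latency_seconds(job):
    """Seconds a job spent queued, from a GitHub jobs-API job object."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    created = datetime.strptime(job["created_at"], fmt)
    started = datetime.strptime(job["started_at"], fmt)
    return (started - created).total_seconds()

# Illustrative job objects; in practice these come from the jobs API response.
jobs = [
    {"created_at": "2020-08-01T10:00:00Z", "started_at": "2020-08-01T10:01:40Z"},
    {"created_at": "2020-08-01T10:00:01Z", "started_at": "2020-08-01T10:01:45Z"},
]
worst = max(queue_latency_seconds(j) for j in jobs)
print(worst)  # worst-case queue latency in seconds
```

Running this before and after the changes gives a concrete number to compare, instead of eyeballing the UI.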
Submitted by Alex Chen