Ephemeral (single use) runner registrations
Problem
Describe the bug
When starting a self hosted runner with `./run.cmd --once`, the runner sometimes accepts a second job before shutting down, which causes that second job to fail with the message: [code block]
This looks like the same issue recently fixed here: microsoft/azure-pipelines-agent#2728

To Reproduce
Steps to reproduce the behavior:
1. Create a repo, enable GitHub Actions, and add a new workflow
2. Configure a new runner on your machine
3. Run the runner with `./run.cmd --once`
4. Queue two runs of your workflow
5. The first job will run and the runner will go offline
6. (Optionally) configure and start a second runner
7. The second job will time out after several minutes with the message: [code block] (where `[runner-name]` is the name of the first runner)
8. Also: trying to remove the first runner with the command `./config.cmd remove --token [token]` will result in the following error until the second job times out: [code block]

Expected behavior
The second job should run on (and wait for) any new runner that comes online rather than try to run as a second job on the, now offline, original runner.

Runner Version and Platform
2.262.1 on Windows

Runner and Worker's Diagnostic Logs
_diag.zip
1 Fix
Solution: Ephemeral (single use) runner registrations
We solved this issue by adding a timeout to our ephemeral workers. The timeout will deregister and kill the worker. [code block]

Actions team: please support `--once` when you have a chance!

We currently have ~16x parallelization on our CI to keep builds under 20 minutes. When moving to self-hosted, we had the following options:
1) Have an autoscaling group that keeps at least 16 instances around and deploys more at peak times (this is a huge waste of money, it is hard to scale out, and we have no clue how to scale in without affecting existing jobs).
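
The original snippet for the timeout workaround was not preserved on this page, so here is a minimal sketch of the idea rather than the poster's actual script. It assumes the worker is started as a child process and that a separate command deregisters it. The function name `run_with_timeout` and its parameters are hypothetical, and the choice to kill before deregistering is an assumption too, since the post only says the timeout will "deregister and kill the worker".

```python
import subprocess
from typing import Sequence


def run_with_timeout(
    run_cmd: Sequence[str],
    remove_cmd: Sequence[str],
    timeout_seconds: float,
) -> bool:
    """Run a single-use worker and clean it up if it outlives the timeout.

    Returns True if the worker exited on its own, False if it was killed.
    """
    worker = subprocess.Popen(run_cmd)
    try:
        worker.wait(timeout=timeout_seconds)
        return True
    except subprocess.TimeoutExpired:
        # Assumption: kill first so the worker cannot pick up another job,
        # then run the deregistration command.
        # Note: kill() only stops the direct child. If the launcher script
        # spawns further processes, those may need to be stopped separately.
        worker.kill()
        worker.wait()
        # check=False: a failed deregistration should not raise here.
        subprocess.run(remove_cmd, check=False)
        return False
```

With the commands from the report, a call might look like `run_with_timeout(["./run.cmd", "--once"], ["./config.cmd", "remove", "--token", "[token]"], 30 * 60)`, where the 30-minute limit is an arbitrary placeholder and `[token]` stands for your own removal token.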
Validation
Resolved in actions/runner GitHub issue #510. Community reactions: 16 upvotes.
Submitted by
Alex Chen
2450 rep