
safe_sleep.sh rarely hangs indefinitely

Mar 14, 2026

Problem

Describe the bug

Very rarely, when the GitHub Actions runner updates itself, `safe_sleep.sh` hangs forever: [code block] I suspect this can happen when the machine runs in the cloud and is overloaded and/or overcommitted.

To Reproduce

Steps to reproduce the behavior:
1. Download an older runner version, for example 2.322.
2. Register and run the runner.
3. The runner updates itself. It can also pick up a task to complete in the meantime.

Expected behavior

The update should not hang indefinitely.

Runner Version and Platform

Runner 2.322
OS: Linux
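One way to check whether a host is affected is to look for `safe_sleep.sh` processes that have been alive far longer than any sleep should take. The following is a minimal sketch, assuming a Linux host with procps `ps` (which provides the `etimes` elapsed-seconds column). The function name `find_stuck_safe_sleep` and the one-hour default threshold are hypothetical choices for illustration, not part of the runner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PIDs of safe_sleep.sh processes older than a threshold.
# Expects lines of "PID ELAPSED_SECONDS COMMAND..." on stdin, which is the
# shape produced by `ps -eo pid=,etimes=,args=` on Linux (procps).
find_stuck_safe_sleep() {
    local max_age=${1:-3600}   # assumed threshold in seconds; tune for your setup
    awk -v max="$max_age" '$2 + 0 > max && /safe_sleep\.sh/ { print $1 }'
}

# Usage on a live host:
#   ps -eo pid=,etimes=,args= | find_stuck_safe_sleep 3600
```

Reading from stdin rather than calling `ps` inside the function keeps the filter easy to try against captured output before running it on a production runner.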



Solution: safe_sleep.sh rarely hangs indefinitely



  1. The bug in this "safe sleep" script is obvious from looking at it: if the process is not scheduled for the one-second interval in which the loop would return (due to `$SECONDS` having the correct value), then it simply spins forever. That can easily happen on a CI machine under extreme load. When this happens, it's pretty bad: it completely breaks a runner until manual intervention. On Zig's CI runner machines, we observed multiple of these processes which had been running for hundreds of hours, silently taking down two runner services for weeks.

  2. I don't understand how we got here. Even ignoring the pretty clear bug, what makes this Bash script "safer" than calling into the POSIX standard `sleep` utility? It doesn't seem to solve any problem; meanwhile, it's less portable and needlessly eats CPU time by busy-waiting.

  3. The sloppy coding which is evident here, as well as the inaction on core Actions bugs (in line with the decay in quality of almost every part of GitHub's product), is forcing the Zig project to strongly consider moving away from GitHub Actions entirely. With this bug, and many others (severe workflow scheduling issues resulting in dozens of timeouts; logs randomly becoming inaccessible; random job cancellations without details; perpetually "pending" jobs), we can no longer trust that Actions can be used to implement reliable CI infrastructure. I personally would seriously encourage other proje
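The failure mode described in the first point, and the `sleep` alternative raised in the second, can be illustrated with a short sketch. It assumes the original loop waits until `$SECONDS` is exactly equal to the requested duration, which is what the description implies; the script itself is not reproduced here. All function names below are hypothetical. The first two functions replay a sequence of clock readings against each exit condition, so the difference can be observed without actually hanging a shell. The third is one possible replacement, not necessarily the fix adopted upstream.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the runner source.

# Models an equality-based wait (`while [[ $SECONDS != $target ]]`) against a
# sequence of clock readings; succeeds only if some reading equals the target.
exits_with_equality() {
    local target=$1 reading
    shift
    for reading in "$@"; do
        [[ $reading -eq $target ]] && return 0
    done
    return 1
}

# Models a threshold-based wait (`while (( SECONDS < target ))`); succeeds as
# soon as a reading reaches or passes the target.
exits_with_threshold() {
    local target=$1 reading
    shift
    for reading in "$@"; do
        (( reading >= target )) && return 0
    done
    return 1
}

# One possible replacement: prefer the POSIX `sleep` utility, and fall back to
# a busy-wait that cannot miss its deadline if `sleep` is unavailable.
robust_sleep() {
    local duration=$1
    if command -v sleep >/dev/null 2>&1; then
        sleep "$duration"
        return
    fi
    SECONDS=0
    while (( SECONDS < duration )); do
        :
    done
}
```

With the readings `0 1 3 4`, which stand for a process that was not scheduled at all during second 2, `exits_with_equality 2 0 1 3 4` fails while `exits_with_threshold 2 0 1 3 4` succeeds. The equality check never sees the value it is waiting for, whereas the threshold check exits on the first reading at or past the target.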

Validation

Resolved in actions/runner GitHub issue #3792. Community reactions: 363 upvotes.

Verification Summary

Worked: 12
Partial: 2
Failed: 1
Last verified Mar 14, 2026



Submitted by Alex Chen (2450 rep)

Tags

github-actions, ci-cd, runner, bug