Spec: phpboyscout/cicd v0.10.5 - goreleaser auto-retries transient release failures
- Repository: `gitlab.com/phpboyscout/cicd`
- Released as: `v0.10.5` (patch: a new input with a behaviour-preserving default would be purely additive, but this default changes release-job behaviour on failure, so it ships as a fix).
- Driver: `go-tool-base`'s `v0.17.0` tag pipeline. The `goreleaser` job fired automatically on the tag, but failed during macOS notarization:

      sign & notarize macOS binaries
      release failed: unable to add timestamps (RFC3161):
      Post "http://timestamp.apple.com/ts01": dial tcp 17.32.213.161:80: i/o timeout

goreleaser fails the entire run on that error, so a single transient
network blip reaching Apple's timestamp server published zero release
assets for every platform. A manual re-run ~6.5h later (same commit, same
config) succeeded — confirming a pure transient. Self-updating consumers then
saw "unable to find asset for
Summary
Release jobs are long, expensive, and tag-triggered once per release; a
transient failure (a timestamp-server dial-timeout, a runner dropout, an image
pull blip) should not require a human to notice and click "retry". GitLab's
job-level retry is the right tool, but it is not currently wired into the
goreleaser component.
Add a retry_max input (default 2) and wire retry on the transient
failure classes:

    goreleaser:
      retry:
        max: $[[ inputs.retry_max ]]   # default 2 (GitLab caps at 2)
        when:
          - script_failure
          - runner_system_failure
          - stuck_or_timeout_failure
      ...

script_failure covers the notarization/timestamp timeout (goreleaser exits 1);
runner_system_failure and stuck_or_timeout_failure cover runner dropouts and
hung jobs. The retry is safe: the v0.17.0 failure occurred during signing,
before any release upload, and goreleaser's release.mode: keep-existing
(the documented mode for this component) makes a re-run idempotent — it attaches
to the existing Release and replaces artefacts rather than duplicating them.
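
The idempotency argument above depends on a setting in the consuming project's goreleaser configuration. A minimal sketch of that fragment follows, assuming every other section of `.goreleaser.yaml` (builds, archives, signing) is defined elsewhere and omitted here:

```yaml
# .goreleaser.yaml (fragment only; all other sections omitted in this sketch)
release:
  # The release mode this spec relies on for safe re-runs.
  mode: keep-existing
```

How already-uploaded artefacts are treated on a re-run can also depend on other `release` settings and on the goreleaser version in use, so it is worth confirming against the goreleaser release documentation.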
Design
New input
| Input | Type | Default | Description |
|---|---|---|---|
| `retry_max` | number | `2` | Automatic retries on a transient release failure (network / runner / timeout). GitLab caps this at 2; set 0 to disable. |
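
A minimal sketch of how this input might be declared and consumed in the component template. The `spec.inputs` header and `$[[ inputs.* ]]` interpolation are standard GitLab CI/CD component syntax; the description wording and the omitted job keys are assumptions, not the component's actual source:

```yaml
spec:
  inputs:
    retry_max:
      type: number
      default: 2
      # Description text is illustrative, not copied from the component.
      description: "Automatic retries on a transient release failure (0 disables)."
---
goreleaser:
  # image, script, rules and the other existing job keys are omitted here
  retry:
    max: $[[ inputs.retry_max ]]
    when:
      - script_failure
      - runner_system_failure
      - stuck_or_timeout_failure
```

Declaring the input as `type: number` lets the interpolated value land in `retry:max` as a number rather than a string.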
Behaviour
- Default (`retry_max: 2`): a transient release failure auto-retries up to twice before the job is marked failed. A genuine, deterministic failure (a bad `.goreleaser.yaml`, a real build error) will still ultimately fail; it just takes the extra attempts to surface. The cost (a few extra minutes on a real failure) is far smaller than a missed release.
- `retry_max: 0`: restores pre-v0.10.5 behaviour (no retry). The failure-path self-test sets this so that the deliberately failing job does not run three times.
`retry.when` is fixed rather than exposed as an input: the three transient classes above are the
only ones worth auto-retrying. Retrying on every failure type (`always`) would mask deterministic
failures, and restricting retries to these classes is the point.
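
From the consumer side, opting out of the new default could look like the sketch below. The component path and version come from this spec; any other inputs the component may need are not shown, and a real project's `include` is likely to carry more than this:

```yaml
include:
  - component: gitlab.com/phpboyscout/cicd/goreleaser@v0.10.5
    inputs:
      # 0 disables automatic retry; leave this input out to get the default of 2.
      retry_max: 0
```

Only `retry_max` is adjustable here; the failure classes under `retry.when` stay as the component defines them.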
Tests
tests/goreleaser/ deliberately runs goreleaser with no .goreleaser.yaml so
the job exits non-zero (tolerated via allow_failure.exit_codes). With the new
default that failure would retry twice (three runs) before being tolerated, so
the fixture passes retry_max: 0 to keep the self-test single-shot. This also
exercises the new input plumbing.
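
A sketch of what such a fixture might contain. The file layout, the use of the predefined `$CI_SERVER_FQDN`, `$CI_PROJECT_PATH` and `$CI_COMMIT_SHA` variables to include the component at the commit under test, and the tolerated exit code of 1 are all assumptions made for illustration, not the repository's actual test file:

```yaml
# Hypothetical self-test fixture under tests/goreleaser/
include:
  # Assumed pattern: include the component from the commit being tested.
  - component: $CI_SERVER_FQDN/$CI_PROJECT_PATH/goreleaser@$CI_COMMIT_SHA
    inputs:
      retry_max: 0   # keep the deliberately failing job single-shot

goreleaser:
  allow_failure:
    # Assumed exit code; the real fixture may tolerate a different value.
    exit_codes: 1
```

With `retry_max: 0` the job runs once, exits non-zero, and is tolerated by `allow_failure.exit_codes` without consuming two further attempts.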
Consumer follow-up
`go-tool-base` carries an interim in-repo retry override on its `goreleaser`
job (the immediate stopgap, raised the same day). Once it bumps its
`gitlab.com/phpboyscout/cicd/goreleaser` include to `@v0.10.5`, that override will be
removed in favour of this component default.
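
For illustration only, an in-repo override of this kind could look like the sketch below. The actual contents of the `go-tool-base` override are not given in this spec, so the retry values are assumed to mirror the component default, and the version placeholder is deliberately left unfilled:

```yaml
include:
  # Placeholder: the version pinned before the bump is not stated in this spec.
  - component: gitlab.com/phpboyscout/cicd/goreleaser@<previous-version>

# Local keys are merged into the included job of the same name.
goreleaser:
  retry:
    max: 2
    when:
      - script_failure
      - runner_system_failure
      - stuck_or_timeout_failure
```

After the bump to `@v0.10.5`, the local `goreleaser:` block would no longer be needed, because the component supplies the same retry behaviour by default.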