Spec: phpboyscout/cicd v0.2 — tofu-plan / tofu-apply
- Repository: gitlab.com/phpboyscout/cicd
- Released as: v0.2.0 (minor: two new components, no change to the four v0.1 components' input shape).
- First consumer: phpboyscout/infra, the src/security-baseline/ stack, applied via GitLab CI (GitLab migration spec Phase E).
Summary
v0.1 of phpboyscout/cicd shipped four gate components (lint,
security, validate, pages) — none of which touch AWS. v0.2 adds the
two components that actually drive infrastructure:
- tofu-plan: runs tofu plan against a real AWS account and a real (GitLab-managed) state backend. Produces a reviewable plan artifact + a GitLab MR plan-widget report. Runs on branches / MRs.
- tofu-apply: consumes the plan artifact and runs tofu apply. Manual-gated by default; runs on the default branch.
Both authenticate to AWS with no static credentials: GitLab CI
mints an OIDC ID token, and AWS STS's AssumeRoleWithWebIdentity call exchanges it
for short-lived credentials. This is the GitLab-side mirror of what
GitHub Actions OIDC did before the migration, and the reason Phase D
provisioned the gitlab.com OIDC IDP + phpboyscout-automation role
in the AWS account.
Motivation
Phase D of the GitLab migration cut infra/bootstrap over to GitLab
CI OIDC and moved its state to GitLab. But the gate components can't
verify the OIDC chain — they run tofu validate -backend=false, which
needs neither AWS nor the real state. The migration's whole point —
applying infrastructure (security-baseline, future workload stacks)
from GitLab CI — needs components that:
- Obtain AWS credentials via OIDC (no long-lived secrets in CI).
- Talk to the GitLab-managed HTTP state backend.
- Run plan/apply with the safety rails infrastructure demands (reviewable plan, manual apply gate, plan/apply consistency).
infra will accrue more stacks (src/<workload>/, modules/); each
needs the same plan/apply flow. A reusable pair of components is the
same call we made for the gate components — author once, version,
Renovate-bump consumers.
Decisions
D1 — Two components, not one mode-switched component
tofu-plan and tofu-apply are separate templates. Rejected a single
tofu component with a mode: plan|apply input because:
- The two have different rules: defaults (plan on branches/MRs; apply on the default branch, manual).
- tofu-apply needs: the tofu-plan job's artifact, a dependency a single component can't cleanly express on itself.
- Separate templates make the consumer's .gitlab-ci.yml read as what it is: a plan stage and an apply stage.
D2 — AWS auth: OIDC ID token → AssumeRoleWithWebIdentity, via env vars
GitLab CI mints an ID token through the id_tokens: keyword:
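A minimal sketch of that declaration, mirroring the id_tokens block in the component catalogue later in this spec. The job name example-job is a placeholder, and the literal audience shown is the default of the aud input; in the templates the value is interpolated from $[[ inputs.aud ]].

```yaml
# Sketch only: "example-job" is a hypothetical job name.
example-job:
  id_tokens:
    # GitLab exposes the signed JWT to the job as $AWS_OIDC_TOKEN.
    AWS_OIDC_TOKEN:
      aud: sts.amazonaws.com   # default of the component's aud input
  script:
    # The token arrives as an ordinary job variable.
    - test -n "$AWS_OIDC_TOKEN"
```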
The component writes that token to a file and exports the two variables the AWS SDK's web-identity credential provider looks for, along with the region:
    echo "$AWS_OIDC_TOKEN" > "${CI_PROJECT_DIR}/.aws-oidc-token"
    export AWS_ROLE_ARN="$[[ inputs.role_arn ]]"
    export AWS_WEB_IDENTITY_TOKEN_FILE="${CI_PROJECT_DIR}/.aws-oidc-token"
    export AWS_REGION="$[[ inputs.aws_region ]]"
OpenTofu's aws provider uses the standard AWS SDK credential chain;
with those three set it performs the AssumeRoleWithWebIdentity
exchange itself. The component makes no explicit aws sts call: the
provider performs the exchange and refreshes the credentials, and
nothing is written to disk apart from the short-lived ID token.
Why audience sts.amazonaws.com: that's what the
terraform-aws-bootstrap v0.2 GitLab path bakes into the IAM OIDC
provider's client_id_list and the role trust policy's aud
condition. The component's aud input defaults to it; an override
exists for accounts that pinned a different audience.
D3 — Consumer requirement: provider must NOT hardcode an AWS profile
The web-identity credential chain only kicks in if nothing higher
priority short-circuits it. A stack whose providers.tf says
profile = "tofu-bootstrap" will ignore the env vars and fail in CI
(no ~/.aws/credentials on the runner).
Consumers of tofu-plan / tofu-apply must declare the aws
provider with no static profile — let the credential chain
resolve. The recommended pattern:
    variable "aws_profile" {
      description = "Local AWS CLI profile. Leave null in CI; the OIDC web-identity credential chain is used instead."
      type        = string
      default     = null
    }
    provider "aws" {
      region              = var.region
      profile             = var.aws_profile # null in CI
      allowed_account_ids = [var.account_id]
    }
This is a consumer-side concern, documented here and enforced by the component failing loudly (the AWS provider errors with a clear "no valid credential sources" message if a profile is wrongly pinned).
D4 — State backend auth: gitlab-ci-token + $CI_JOB_TOKEN
Consumer stacks store state in the GitLab-managed HTTP backend (one
state object per stack, per the migration spec). tofu init against
that backend needs HTTP basic-auth. In CI the component exports:
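A minimal sketch of those exports, taken from the variables block that both templates in the component catalogue declare. The http backend reads its basic-auth username and password from TF_HTTP_USERNAME and TF_HTTP_PASSWORD; the job name here is a placeholder.

```yaml
# Sketch only: "example-job" is a hypothetical job name.
example-job:
  variables:
    TF_HTTP_USERNAME: gitlab-ci-token   # fixed username GitLab expects for job-token auth
    TF_HTTP_PASSWORD: $CI_JOB_TOKEN     # per-job token, expanded by the runner
```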
CI_JOB_TOKEN carries the terraform_state permission for the job's
own project by default — no PAT, no project access token needed for
same-project state. Cross-project state would need a different
credential; out of scope for v0.2 (every phpboyscout stack stores
state in its own project).
D5 — Plan artifact hand-off + MR plan widget
tofu-plan runs tofu plan -out=tfplan.cache and saves tfplan.cache
as a job artifact. tofu-apply declares needs: [tofu-plan-job] with
artifacts: true and runs tofu apply tfplan.cache — applying the
exact plan that was reviewed. If state moved between plan and
apply, tofu apply rejects the stale plan (correct, fail-safe).
tofu-plan additionally emits tofu show -json tfplan.cache >
tfplan.json and publishes it as a reports: terraform: artifact, so
GitLab renders an add/change/destroy summary in the MR widget.
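GitLab's documented examples for the terraform report reduce the plan JSON to create/update/delete counts with jq before publishing it, so it is worth verifying during implementation whether the raw tofu show -json output renders in the widget. A hedged sketch of that reduction follows; it assumes jq is present in the infra-tools image (not confirmed by this spec), and the job name is a placeholder.

```yaml
# Sketch only: assumes jq is available in the job image.
example-plan-job:
  script:
    - tofu plan -input=false -out=tfplan.cache
    - |
      # Collect every planned action, then count creates, updates and deletes.
      tofu show -json tfplan.cache \
        | jq '[.resource_changes[]?.change.actions[]?]
              | { create: (map(select(. == "create")) | length),
                  update: (map(select(. == "update")) | length),
                  delete: (map(select(. == "delete")) | length) }' \
        > tfplan.json
  artifacts:
    reports:
      terraform: tfplan.json
```

A side effect of publishing only the counts is that the report artifact carries no attribute values, sensitive or otherwise.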
Sensitive values: a binary plan can embed sensitive attribute
values. Artifacts on a private project are acceptable exposure; the
artifact expire_in is short (1 day). Consumers on public projects
should not use tofu-apply's artifact hand-off with sensitive state —
documented as a caveat.
D6 — tofu-apply is manual-gated by default
tofu-apply's default rules: run the job only on the default branch
and as when: manual — a human clicks "apply" in the GitLab UI
after reviewing the plan. Infrastructure apply is not something to
trigger automatically on merge.
A manual boolean input (default true) lets a consumer opt into
auto-apply-on-merge if they have a reason; we don't, and the default
protects against accidental applies.
D7 — Inputs surface
tofu-plan:
| Input | Type | Default | Notes |
|---|---|---|---|
| image_version | string | "v0.2.0" | infra-tools tag |
| stage | string | "plan" | consumer's stage layout |
| working_directory | string | "." | the stack directory |
| role_arn | string | none (required) | AWS role to assume |
| aws_region | string | "eu-west-2" | |
| aud | string | "sts.amazonaws.com" | OIDC token audience |
| var_file | string | "" | optional -var-file path, relative to working_directory |
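tofu-apply: the same inputs except var_file (a saved plan already carries its variable values), with stage defaulting to "apply", plus: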
| Input | Type | Default | Notes |
|---|---|---|---|
| manual | boolean | true | whether apply is a manual-gated job |
| plan_job | string | "tofu-plan" | the job name to pull the plan artifact from |
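To show how these inputs surface on the consumer side, here is a hedged sketch of a consumer's .gitlab-ci.yml. The AWS account ID is a placeholder, and the component paths assume the templates are published under the names tofu-plan and tofu-apply at the v0.2.0 tag.

```yaml
# Sketch of a consumer pipeline; the account ID in role_arn is a placeholder.
stages: [plan, apply]

include:
  - component: gitlab.com/phpboyscout/cicd/tofu-plan@v0.2.0
    inputs:
      working_directory: src/security-baseline
      role_arn: arn:aws:iam::123456789012:role/phpboyscout-automation
  - component: gitlab.com/phpboyscout/cicd/tofu-apply@v0.2.0
    inputs:
      working_directory: src/security-baseline
      role_arn: arn:aws:iam::123456789012:role/phpboyscout-automation
      # manual (true) and plan_job (tofu-plan) are left at their defaults.
```

Only role_arn has no default, so a stack at the repository root in eu-west-2 would need nothing else.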
D8 — Versioning
Adding components is a minor bump → cicd v0.2.0. The four v0.1
components are unchanged; consumers on @v0.1.x are unaffected until
they bump. Pre-1.0 caveat from v0.1 still holds.
Open questions
- OQ1 — Single combined role vs plan/apply split. Phase D provisioned one role (phpboyscout-automation) with AdministratorAccess. The components take a role_arn input, so a future plan/apply role split (read-only plan role, write apply role) is a consumer-side change (call terraform-aws-bootstrap's automation-iam twice), not a component change. Tentative: ship v0.2 single-role; revisit role split as a separate piece of work.
- OQ2 — tofu-apply re-plan vs artifact apply. v0.2 uses the artifact hand-off (apply the reviewed plan). The alternative is for apply to re-plan from scratch. Artifact apply is safer (no drift window) and is the GitLab-documented pattern. Tentative: artifact hand-off, as in D5.
- OQ3 — GitLab environment: integration. GitLab can track deployments per environment:. v0.2 doesn't wire this; a v0.2.x follow-on could add an environment input so applies show up in the GitLab environments/deployments UI. Tentative: defer.
Component catalogue
tofu-plan
    spec:
      component: [version]
      inputs:
        image_version: { type: string, default: "v0.2.0" }
        stage: { type: string, default: plan }
        working_directory: { type: string, default: "." }
        role_arn: { type: string }
        aws_region: { type: string, default: "eu-west-2" }
        aud: { type: string, default: "sts.amazonaws.com" }
        var_file: { type: string, default: "" }
    ---
    tofu-plan:
      stage: $[[ inputs.stage ]]
      image: registry.gitlab.com/phpboyscout/images/infra-tools:$[[ inputs.image_version ]]
      id_tokens:
        AWS_OIDC_TOKEN:
          aud: $[[ inputs.aud ]]
      variables:
        TF_HTTP_USERNAME: gitlab-ci-token
        TF_HTTP_PASSWORD: $CI_JOB_TOKEN
      script:
        - |
          echo "tofu-plan $[[ component.version ]] (image $[[ inputs.image_version ]])"
          echo "$AWS_OIDC_TOKEN" > "$CI_PROJECT_DIR/.aws-oidc-token"
          export AWS_ROLE_ARN="$[[ inputs.role_arn ]]"
          export AWS_WEB_IDENTITY_TOKEN_FILE="$CI_PROJECT_DIR/.aws-oidc-token"
          export AWS_REGION="$[[ inputs.aws_region ]]"
          cd "$[[ inputs.working_directory ]]"
          tofu init -input=false
          VARFILE_ARG=""
          [ -n "$[[ inputs.var_file ]]" ] && VARFILE_ARG="-var-file=$[[ inputs.var_file ]]"
          tofu plan -input=false -out=tfplan.cache $VARFILE_ARG
          tofu show -json tfplan.cache > tfplan.json
      artifacts:
        paths:
          - $[[ inputs.working_directory ]]/tfplan.cache
        reports:
          terraform: $[[ inputs.working_directory ]]/tfplan.json
        expire_in: 1 day
tofu-apply
    spec:
      component: [version]
      inputs:
        image_version: { type: string, default: "v0.2.0" }
        stage: { type: string, default: apply }
        working_directory: { type: string, default: "." }
        role_arn: { type: string }
        aws_region: { type: string, default: "eu-west-2" }
        aud: { type: string, default: "sts.amazonaws.com" }
        manual: { type: boolean, default: true }
        plan_job: { type: string, default: "tofu-plan" }
    ---
    tofu-apply:
      stage: $[[ inputs.stage ]]
      image: registry.gitlab.com/phpboyscout/images/infra-tools:$[[ inputs.image_version ]]
      id_tokens:
        AWS_OIDC_TOKEN:
          aud: $[[ inputs.aud ]]
      variables:
        TF_HTTP_USERNAME: gitlab-ci-token
        TF_HTTP_PASSWORD: $CI_JOB_TOKEN
      needs:
        - job: $[[ inputs.plan_job ]]
          artifacts: true
      rules:
        - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
          when: $[[ inputs.manual ]] && "manual" || "on_success"
      script:
        - |
          echo "tofu-apply $[[ component.version ]] (image $[[ inputs.image_version ]])"
          echo "$AWS_OIDC_TOKEN" > "$CI_PROJECT_DIR/.aws-oidc-token"
          export AWS_ROLE_ARN="$[[ inputs.role_arn ]]"
          export AWS_WEB_IDENTITY_TOKEN_FILE="$CI_PROJECT_DIR/.aws-oidc-token"
          export AWS_REGION="$[[ inputs.aws_region ]]"
          cd "$[[ inputs.working_directory ]]"
          tofu init -input=false
          tofu apply -input=false tfplan.cache
The when: ternary in tofu-apply's rule is illustrative: GitLab input interpolation in when: needs verifying during implementation (an open question for the build: GitLab may require the rule to be split into two entries gated on the boolean rather than an inline ternary).
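One possible shape for the two-entry fallback, as a sketch that has not been verified against GitLab. It assumes a boolean input interpolated inside a quoted string becomes the text true or false, and that rules:if accepts a comparison between two string literals; both points need confirming during the build.

```yaml
# Sketch only: a fragment of the tofu-apply job, unverified.
tofu-apply:
  rules:
    # manual: true (the default): default branch, wait for a human to trigger the job
    - if: '"$[[ inputs.manual ]]" == "true" && $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      when: manual
    # manual: false: default branch, run once the plan job has succeeded
    - if: '"$[[ inputs.manual ]]" == "false" && $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      when: on_success
```

Because no rule matches off the default branch, the job stays out of branch and MR pipelines in both modes.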
Risk register
| Risk | Mitigation |
|---|---|
| OIDC token audience mismatch — role assume fails | aud input defaults to the value terraform-aws-bootstrap v0.2 bakes into the IAM provider. Self-test fixture exercises the full chain against a real (throwaway) role before tagging. |
| Plan artifact embeds sensitive values | Private-project artifact, short expire_in. Documented caveat for public-project consumers. |
| Stale plan applied after state drift | tofu apply tfplan.cache rejects a plan whose state serial no longer matches — fail-safe by design. |
| CI_JOB_TOKEN lacks terraform_state permission | Default GitLab project settings grant it; if a project disabled it, tofu init fails loudly at the backend step. Documented prerequisite. |
| Consumer pins an AWS profile and the job ignores OIDC creds | D3 documents the requirement; the AWS provider errors clearly. Self-test fixture's provider is profile-free. |
| when: input interpolation unsupported | Flagged inline above; implementation verifies and falls back to two rules: entries if needed. |
Implementation plan
1. Spec lands: this file, status approved.
2. templates/tofu-plan.yml + templates/tofu-apply.yml per the catalogue.
3. Self-test: tests/tofu-plan/ + tests/tofu-apply/ fixtures. Resolved (option b): the fixture stack uses no aws provider, just a terraform_data resource (built-in tofu provider) with a variable + output. The fixture declares a backend "http" pointing at the phpboyscout/cicd project's own GitLab-managed state (a dedicated state name per component: selftest-plan / selftest-apply) so the self-test exercises:
   - the id_tokens: token mint,
   - the component's token-file + env-var wiring,
   - TF_HTTP_* auth against the GitLab state backend (CI_JOB_TOKEN),
   - tofu init/plan/apply and the plan→apply artifact handoff.
It deliberately does not prove the AWS
AssumeRoleWithWebIdentity exchange — no AWS API call is made
because the fixture has no aws provider. role_arn is passed a
dummy value (arn:aws:iam::000000000000:role/selftest-noop);
it's exported as AWS_ROLE_ARN but never used. The real
end-to-end AWS-auth proof is Phase E's first tofu plan of
infra/src/security-baseline/. The tofu-apply self-test sets
manual: false so the apply runs automatically; applying the
fixture is a no-op terraform_data write, no cloud resources.
4. Root .gitlab-ci.yml gains the two new self-test triggers.
5. CHANGELOG [0.2.0], merge develop → main, tag v0.2.0.
6. Phase E proper — infra/src/security-baseline/ consumes
tofu-plan + tofu-apply (separate task).
Follow-ups
- environment: integration (OQ3), to surface applies in GitLab's deployment UI.
- plan/apply role split (OQ1): separate read-only plan role, consumer-side.
- tofu-destroy component: eventually, for tearing down ephemeral stacks; deliberately omitted from v0.2 (destroy is rare and high-blast-radius; manual tofu destroy is fine until a real need appears).