Terraform plan fails on PR due to missing state lock in CI/CD pipeline

Looking for advice on configuring a secure and reliable CI/CD pipeline for Genesys Cloud CX as Code using Terraform. We are attempting to implement a workflow where terraform plan runs automatically on every Pull Request to validate infrastructure changes, and terraform apply executes only upon merge to the main branch. The challenge lies in managing the state file and backend locking in a way that prevents race conditions between multiple concurrent PRs while ensuring the CI runner has the necessary OAuth tokens to authenticate with the Genesys Cloud API.

Background

Our infrastructure is defined using the Genesys Cloud Terraform Provider. We are using GitHub Actions for our CI/CD pipeline. The goal is to catch drift and configuration errors early in the development process without requiring manual intervention for every change. The team is located in Africa/Lagos, so we need to ensure the pipeline is robust against network latency and potential timeout issues with the Genesys Cloud API endpoints.

Issue

When multiple developers open PRs simultaneously, the terraform plan steps fail with a 409 Conflict error or a state lock acquisition failure. The error message indicates that the state file is locked by another process, even though we have not managed to configure a remote backend with working locking. Additionally, we are seeing intermittent 401 Unauthorized errors during the plan phase, suggesting that the OAuth token generated for the CI runner might be expiring or not being refreshed correctly between the authentication step and the Terraform execution.

Troubleshooting

I have attempted to configure the Terraform backend to use S3 with DynamoDB for state locking, but the GitHub Actions runner lacks the necessary IAM permissions. I have also tried using environment variables for the OAuth token, but the token lifecycle management is proving difficult. Here is a snippet of our current GitHub Actions workflow:

name: Terraform Plan
on: pull_request
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
        env:
          GC_CLIENT_ID: ${{ secrets.GC_CLIENT_ID }}
          GC_CLIENT_SECRET: ${{ secrets.GC_CLIENT_SECRET }}

How can we properly configure the backend and token management to support concurrent PRs and reliable apply on merge?

How I usually solve this is by configuring a remote backend in the terraform block so that state locking is handled automatically, rather than relying on local state files, which cause race conditions in CI/CD pipelines. Since I manage CX as Code deployments via Docker Compose locally, I make sure my CI environment mirrors this by using S3 with DynamoDB for locking (the Azure Blob Storage backend is an alternative; it locks through blob leases rather than DynamoDB).

Here is the backend configuration I use in my main.tf:

terraform {
  backend "s3" {
    bucket         = "gc-cxascode-state"
    key            = "prod/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

The critical part is dynamodb_table. Without it, the S3 backend takes no lock at all, so multiple PRs running terraform plan simultaneously (plus any apply on main) work against the same state file unguarded, which can lead to state corruption or inconsistent plans. With it set, a second run fails on the lock right away unless you pass -lock-timeout. In my Docker Compose integration tests, I mock this behavior by having each service container write its state to a shared volume with file locking, but for actual CI, you need the distributed lock.
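
One thing that may help once the lock table is in place: by default Terraform gives up immediately if another run holds the lock, so two PR plans that start together can still fail. A minimal sketch of a plan step that waits instead, assuming a five-minute wait is acceptable for your pipeline (the 5m value is my guess, not something measured):

```yaml
# Fragment of the steps list in the plan job; the 5m timeout is an assumed value.
- name: Terraform Plan
  # -lock-timeout makes Terraform keep retrying the state lock for up to
  # five minutes instead of failing on the first attempt.
  run: terraform plan -input=false -lock-timeout=5m -out=tfplan
  env:
    GC_CLIENT_ID: ${{ secrets.GC_CLIENT_ID }}
    GC_CLIENT_SECRET: ${{ secrets.GC_CLIENT_SECRET }}
```

Pick the timeout to be longer than a typical plan takes, otherwise queued runs will still time out under load.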

Also, make sure your CI workflow runs terraform init -input=false before the plan step. If the backend configuration changes, Terraform prompts for confirmation, which hangs the pipeline. Adding -input=false makes it fail fast on a mismatch, which is better than a hung job.
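
If you would rather not repeat the flag on every command, Terraform also reads the TF_INPUT environment variable. A rough sketch of setting it once at the job level, reusing the job and step names from the workflow in the question (TF_IN_AUTOMATION is an optional extra I added, not something the question uses):

```yaml
name: Terraform Plan
on: pull_request
jobs:
  plan:
    runs-on: ubuntu-latest
    env:
      TF_INPUT: "false"         # same effect as passing -input=false to every command
      TF_IN_AUTOMATION: "true"  # trims output that suggests follow-up commands to run
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        # Explicit flag kept for readability; TF_INPUT already covers it.
        run: terraform init -input=false
```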

One gotcha I hit recently: the IAM role used by the CI runner needs dynamodb:PutItem, dynamodb:GetItem, and dynamodb:DeleteItem permissions on the lock table. Without these, the plan succeeds locally but fails in CI with a cryptic lock acquisition error. Check your pipeline logs for "Error acquiring the state lock" to confirm whether this is your issue.
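
When that error shows up, it helps to confirm which AWS identity the runner actually has before digging into the policy. A small debugging step, assuming the AWS CLI is available on the runner image and AWS credentials are already set for the job:

```yaml
# Fragment of the steps list; place it before the Terraform Init step.
- name: Show AWS identity used by the runner
  # Prints the account ID and ARN of the current credentials, so you can
  # check that the lock-table permissions are attached to this identity.
  run: aws sts get-caller-identity
```

If the ARN in the log is not the role you attached the DynamoDB permissions to, the lock error is a credentials problem rather than a Terraform one.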

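This looks like a standard backend config, but you’re missing the critical authentication step for CI environments.

A headless runner has no AWS CLI profile to fall back on, so Terraform finds no credentials unless you supply them, for example as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or through an assumed IAM role. Note that the S3 backend authenticates on its own during terraform init and does not read settings from the provider block, so a provider block by itself will not fix backend auth.

The provider config below is only needed if the configuration also manages AWS resources; it picks up the same credentials from the environment: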

provider "aws" {
  region = "us-east-1"
}
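
For the credentials themselves, one common option on GitHub Actions is to let the job assume an IAM role through OIDC instead of storing long-lived keys. A rough sketch, assuming an IAM role that trusts GitHub's OIDC provider already exists and that its ARN is stored in a secret I have called AWS_ROLE_ARN (a made-up name):

```yaml
name: Terraform Plan
on: pull_request
permissions:
  id-token: write   # lets the job request an OIDC token
  contents: read    # still needed for actions/checkout
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}  # hypothetical secret name
          aws-region: us-west-2                        # region of the state bucket in the backend block
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init -input=false
```

The action exports temporary credentials as environment variables for the later steps, which is where the S3 backend looks for them during init.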

I can confirm the backend configuration from the previous post. The S3 and DynamoDB setup handles the locking correctly for our CI pipelines.