Resolving State Lock and Drift on Genesys Cloud Routing Queues in Terraform

What You Will Build

  • This tutorial demonstrates how to diagnose and resolve terraform plan failures caused by state lock contention and configuration drift on genesyscloud_routing_queue resources.
  • It uses the Genesys Cloud Terraform Provider (v1.100+) and the Genesys Cloud REST API via Python for state inspection.
  • The code is written in Python for API diagnostics and HCL for Terraform configuration management.

Prerequisites

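  • OAuth Client Type: Client Credentials grant assigned a role that can view and edit routing queues (for example, the routing:queue:view and routing:queue:edit permissions). Client credentials grants in Genesys Cloud are authorized through roles rather than OAuth scopes.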
  • SDK/API Version: Genesys Cloud Terraform Provider v1.100+; Genesys Cloud REST API v2.
  • Language/Runtime: Python 3.9+ (for diagnostic scripts), Terraform 1.5+.
  • External Dependencies:
    • pip install requests python-dotenv
    • terraform init with the Genesys Cloud provider configured.

Authentication Setup

Terraform uses the provider block to handle authentication. For API diagnostics, we will use a simple client credentials flow.

Terraform Provider Configuration

terraform {
  required_providers {
    genesyscloud = {
      source  = "mypurecloud/genesyscloud"
      version = ">= 1.100"
    }
  }
}

provider "genesyscloud" {
  oauthclient_id     = var.genesys_client_id
  oauthclient_secret = var.genesys_client_secret
  aws_region         = "us-east-1" # region that serves mypurecloud.com
}

Python Diagnostic Script Authentication

This script retrieves an access token to query the state of queues directly via the API, bypassing Terraform’s state file to identify external changes.

import os
import requests
from dotenv import load_dotenv

load_dotenv()

CLIENT_ID = os.getenv("GENESYS_CLIENT_ID")
CLIENT_SECRET = os.getenv("GENESYS_CLIENT_SECRET")
BASE_URL = "https://api.mypurecloud.com"
# OAuth tokens are issued by the login host, not the API host
LOGIN_URL = "https://login.mypurecloud.com"

def get_access_token():
    """
    Retrieves an OAuth2 access token using the client credentials grant.
    """
    url = f"{LOGIN_URL}/oauth/token"
    data = {"grant_type": "client_credentials"}

    # The client ID and secret are sent as HTTP Basic credentials
    response = requests.post(url, data=data, auth=(CLIENT_ID, CLIENT_SECRET), timeout=30)
    if response.status_code != 200:
        raise Exception(f"Failed to get token: {response.status_code} - {response.text}")

    return response.json()["access_token"]

Implementation

Step 1: Diagnose the State Lock

A “state lock” error in Terraform means that another process holds the lock on the state file or that a previous run exited without releasing it. Drift on a genesyscloud_routing_queue is a separate problem: the queue in Genesys Cloud no longer matches what the state file records. The two can surface together because an interrupted apply may leave behind both a stale lock and a partially updated queue.

First, verify whether a lock is actually active. Locking is handled by the Terraform backend, not by the Genesys Cloud provider. Teams typically use a remote backend (AWS S3, Azure Blob Storage, or GCS). With the local backend, the lock is a file named .terraform.tfstate.lock.info next to the state file.

If a stale lock exists on a remote backend, release it with terraform force-unlock, using the lock ID printed in the lock error message.

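# List objects in the state bucket (example path; with DynamoDB-based locking,
# the lock is an item in the lock table rather than an object in S3)
aws s3 ls s3://your-terraform-state-bucket/locks/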

# Force unlock if you are certain no other process is running
terraform force-unlock <LOCK_ID>

Warning: Only force-unlock if you are certain no other Terraform process is actively running. Forcing an unlock while another process is writing can corrupt the state.
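
Before forcing an unlock on the local backend, it can help to see who holds the lock. The following is a minimal sketch, assuming the local lock file is named .terraform.tfstate.lock.info and contains JSON with ID, Who, Operation, and Created keys, as recent Terraform versions write it. The helper names read_lock_info and describe_lock are illustrative and not part of Terraform.

```python
import json
from pathlib import Path
from typing import Optional


def read_lock_info(path: str = ".terraform.tfstate.lock.info") -> Optional[dict]:
    """Return the parsed lock metadata, or None when no lock file exists."""
    lock_file = Path(path)
    if not lock_file.exists():
        return None
    return json.loads(lock_file.read_text(encoding="utf-8"))


def describe_lock(info: Optional[dict]) -> str:
    """Summarize the lock holder so you can judge whether the lock is stale."""
    if info is None:
        return "No local lock file found."
    return (
        f"Lock {info.get('ID', '?')} held by {info.get('Who', '?')} "
        f"for operation {info.get('Operation', '?')} since {info.get('Created', '?')}"
    )


if __name__ == "__main__":
    print(describe_lock(read_lock_info()))
```

If the reported holder is a CI job or a colleague's session that is still running, wait for it to finish instead of unlocking.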

Step 2: Identify Drift Source via API

Drift occurs when the actual state in Genesys Cloud differs from the terraform.tfstate file. For genesyscloud_routing_queue, common drift sources include:

  1. Manual changes to queue description or name in the Genesys Admin UI.
  2. Automated changes via other scripts or APIs.
  3. Default value changes in the Genesys Cloud platform.

We will query the queue directly from Genesys Cloud to compare it with our Terraform state.

import json

def get_queue_details(access_token: str, queue_id: str):
    """
    Retrieves full details of a routing queue from Genesys Cloud.
    Endpoint: GET /api/v2/routing/queues/{queueId}
    Scope: routing:queue:read
    """
    url = f"{BASE_URL}/api/v2/routing/queues/{queue_id}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json"
    }

    response = requests.get(url, headers=headers)
    
    if response.status_code == 404:
        raise Exception(f"Queue {queue_id} not found in Genesys Cloud.")
    elif response.status_code == 401:
        raise Exception("Unauthorized. Check that the token is valid and not expired.")
    elif response.status_code == 403:
        raise Exception("Forbidden. Check the permissions granted to the OAuth client.")
    elif response.status_code == 429:
        raise Exception("Rate limited. Wait and retry.")
    elif response.status_code != 200:
        raise Exception(f"API Error: {response.status_code} - {response.text}")
    
    return response.json()

def compare_queue_with_tfc(queue_id: str):
    """
    Compares live API data with a simulated Terraform state check.
    """
    token = get_access_token()
    live_data = get_queue_details(token, queue_id)
    
    print(f"--- Live Genesys Cloud Data for Queue {queue_id} ---")
    print(f"Name: {live_data.get('name')}")
    print(f"Description: {live_data.get('description', 'None')}")
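    # The API returns camelCase keys; after-call work settings live under acwSettings
    print(f"Skill Evaluation Method: {live_data.get('skillEvaluationMethod')}")
    print(f"Wrap-up Prompt: {live_data.get('acwSettings', {}).get('wrapupPrompt', 'None')}")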
    
    # Check for common drift fields
    # The 'description' field is a frequent source of drift if admins edit it manually
    if live_data.get('description'):
        print(f"WARNING: Description is set to '{live_data['description']}'. If this differs from your HCL, Terraform will detect drift.")

if __name__ == "__main__":
    # Replace with your actual queue ID from terraform.tfstate
    TARGET_QUEUE_ID = "a1b2c3d4-5678-90ab-cdef-123456789012"
    compare_queue_with_tfc(TARGET_QUEUE_ID)
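
The comparison can also be made against the recorded state rather than by eye. The sketch below assumes the state has been exported with terraform show -json, whose output nests resources under values.root_module (and child_modules for modules). FIELD_MAP is a hypothetical mapping from Terraform attribute names to API JSON keys and covers only two string fields; extend it for the attributes you manage.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical mapping of Terraform attribute names to API JSON keys.
FIELD_MAP = {"name": "name", "description": "description"}


def find_resource_values(state: Dict[str, Any], address: str) -> Optional[Dict[str, Any]]:
    """Search the root module and any child modules for a resource address."""
    def walk(module: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        for res in module.get("resources", []):
            if res.get("address") == address:
                return res.get("values", {})
        for child in module.get("child_modules", []):
            found = walk(child)
            if found is not None:
                return found
        return None

    return walk(state.get("values", {}).get("root_module", {}))


def diff_queue(tf_values: Dict[str, Any], live: Dict[str, Any],
               field_map: Dict[str, str] = FIELD_MAP) -> Dict[str, Tuple[Any, Any]]:
    """Return {tf_attr: (state_value, live_value)} for string fields that differ."""
    drift = {}
    for tf_attr, api_key in field_map.items():
        state_val, live_val = tf_values.get(tf_attr), live.get(api_key)
        # Treat a missing value and an empty string as equal to avoid false positives.
        if (state_val or None) != (live_val or None):
            drift[tf_attr] = (state_val, live_val)
    return drift
```

To use it, run terraform show -json > state.json, load the file with json.load, look up the address genesyscloud_routing_queue.my_queue with find_resource_values, and pass the result to diff_queue together with the dictionary returned by get_queue_details.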

Step 3: Resolve Drift via Terraform Import or Refresh

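If the API shows data that differs from your HCL, first bring the state file in line with Genesys Cloud using Option A or Option B below. Then reconcile the difference in one of two ways:

  1. Update your HCL to match the live state (if the change was intentional).
  2. Run terraform apply to overwrite the live state with your HCL configuration.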

Option A: Refresh State (Read-Only)

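Run a refresh-only plan to see how the live queue differs from the state file, then a refresh-only apply to record those values in the state. Neither command changes the queue in Genesys Cloud. The standalone terraform refresh command is deprecated in current Terraform releases in favor of the -refresh-only flag.

terraform plan -refresh-only
terraform apply -refresh-only

If either command fails with a lock error, ensure no other processes are running. If the plan succeeds but shows changes, those are the drift points.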
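
To check for pending changes non-interactively, for example in a CI job, one option is terraform plan with -detailed-exitcode, which exits with 0 when nothing would change, 1 on error, and 2 when changes are pending. The wrapper below is a sketch under those assumptions; the names classify_plan and run_plan and the 60-second lock timeout are illustrative choices.

```python
import subprocess
from typing import Callable, List, Optional

# -lock-timeout makes Terraform wait for a busy lock instead of failing immediately.
PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-lock-timeout=60s"]


def classify_plan(exit_code: int) -> str:
    """Map the -detailed-exitcode values to a short status string."""
    return {0: "in_sync", 2: "changes_pending"}.get(exit_code, "error")


def run_plan(runner: Optional[Callable[[List[str]], int]] = None) -> str:
    """Run the plan and classify the result; `runner` is injectable for testing."""
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=False).returncode
    return classify_plan(runner(PLAN_CMD))


if __name__ == "__main__":
    print(run_plan())
```

A result of changes_pending means the plan would modify something; review the plan output to see whether the difference comes from drift on the queue or from an intended HCL change.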

Option B: Import Existing Resource

If the queue exists in Genesys Cloud but is not in your Terraform state, or if the state is corrupted, you can import the resource.

# Syntax: terraform import <RESOURCE_ADDRESS> <QUEUE_ID>
terraform import genesyscloud_routing_queue.my_queue a1b2c3d4-5678-90ab-cdef-123456789012

Note: After importing, run terraform plan. If there are differences, Terraform will show what it intends to change. Review these changes carefully.
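
Terraform 1.5 and later (the version listed in the prerequisites) also accept declarative import blocks in configuration as an alternative to the CLI command. When several queues need importing, the blocks can be generated. This is a minimal sketch: render_import_blocks is a hypothetical helper, and the resource names and queue IDs passed to it are placeholders.

```python
from typing import Dict


def render_import_blocks(queues: Dict[str, str]) -> str:
    """Render one Terraform `import` block per (resource name -> queue ID) pair."""
    blocks = []
    for name, queue_id in sorted(queues.items()):
        blocks.append(
            "import {\n"
            f"  to = genesyscloud_routing_queue.{name}\n"
            f'  id = "{queue_id}"\n'
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Placeholder ID reused from the CLI import example.
    print(render_import_blocks({"my_queue": "a1b2c3d4-5678-90ab-cdef-123456789012"}))
```

Write the output to a .tf file next to the matching resource blocks and run terraform plan; the plan lists each resource that would be imported before anything is written to the state.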

Step 4: Prevent Future Drift with ignore_changes

For fields that are frequently modified by admins (e.g., description) or by other systems, you can tell Terraform to ignore changes to those fields. Terraform still reads the live values during refresh, but it no longer plans an update to revert them.

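resource "genesyscloud_routing_queue" "my_queue" {
  # Attribute names follow the provider's routing queue schema;
  # confirm them against the documentation for your provider version.
  name                    = "Support Queue"
  description             = "Primary support queue"
  skill_evaluation_method = "BEST"
  acw_wrapup_prompt       = "MANDATORY_TIMEOUT"
  acw_timeout_ms          = 300000

  # Ignore changes to description to prevent drift from manual edits
  lifecycle {
    ignore_changes = [
      description,
      # Also ignore the wrap-up prompt if it is managed by another system
      acw_wrapup_prompt
    ]
  }

  # ... other queue configuration ...
}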

Complete Working Example

Terraform Configuration (main.tf)

variable "genesys_client_id" {
  type = string
}

variable "genesys_client_secret" {
  type      = string
  sensitive = true
}

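terraform {
  required_providers {
    genesyscloud = {
      source  = "mypurecloud/genesyscloud"
      version = ">= 1.100"
    }
  }
}

provider "genesyscloud" {
  oauthclient_id     = var.genesys_client_id
  oauthclient_secret = var.genesys_client_secret
  aws_region         = "us-east-1" # region that serves mypurecloud.com
}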

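resource "genesyscloud_routing_queue" "support_queue" {
  name                    = "Technical Support"
  description             = "Queue for technical issues"
  skill_evaluation_method = "BEST"

  # Prevent drift from manual UI edits
  lifecycle {
    ignore_changes = [
      description
    ]
  }

  # After-call work: require a wrap-up code, with a 5-minute timeout
  acw_wrapup_prompt = "MANDATORY_TIMEOUT"
  acw_timeout_ms    = 300000

  # Wrap-up codes available on this queue (replace with real wrap-up code IDs)
  wrapup_codes = ["wrapup-code-id-from-organization"]

  # Assign a queue member (replace with a real user ID)
  members {
    user_id  = "user-id-from-organization"
    ring_num = 1
  }
}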

Python Diagnostic Script (diagnose_drift.py)

import os
import sys
import requests
from dotenv import load_dotenv

load_dotenv()

CLIENT_ID = os.getenv("GENESYS_CLIENT_ID")
CLIENT_SECRET = os.getenv("GENESYS_CLIENT_SECRET")
BASE_URL = "https://api.mypurecloud.com"
LOGIN_URL = "https://login.mypurecloud.com"  # OAuth tokens are issued by the login host

def get_access_token():
    url = f"{LOGIN_URL}/oauth/token"
    data = {"grant_type": "client_credentials"}
    # The client ID and secret are sent as HTTP Basic credentials
    response = requests.post(url, data=data, auth=(CLIENT_ID, CLIENT_SECRET), timeout=30)
    if response.status_code != 200:
        raise Exception(f"Token error: {response.text}")
    return response.json()["access_token"]

def get_queue(queue_id: str, token: str):
    url = f"{BASE_URL}/api/v2/routing/queues/{queue_id}"
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        raise Exception(f"API Error: {response.status_code} - {response.text}")
    return response.json()

def main():
    if len(sys.argv) < 2:
        print("Usage: python diagnose_drift.py <QUEUE_ID>")
        sys.exit(1)
    
    queue_id = sys.argv[1]
    try:
        token = get_access_token()
        queue_data = get_queue(queue_id, token)
        
        print(f"Queue ID: {queue_id}")
        print(f"Name: {queue_data.get('name')}")
        print(f"Description: {queue_data.get('description')}")
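        # The API returns camelCase keys, not Terraform attribute names
        print(f"Skill Evaluation Method: {queue_data.get('skillEvaluationMethod')}")
        print(f"Wrap-up Prompt: {queue_data.get('acwSettings', {}).get('wrapupPrompt')}")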
        
        # Check for potential drift in common fields
        if queue_data.get('description') != "Queue for technical issues":
            print("DRIFT DETECTED: Description differs from expected HCL value.")
        else:
            print("No drift detected in description.")
            
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()

Common Errors & Debugging

Error: Error acquiring the state lock

Cause: Another Terraform process is running, or a previous run crashed and left a stale lock.

Fix:

  1. Verify no other terraform processes are running on the machine or in CI/CD pipelines.
  2. If safe, force-unlock:
    terraform force-unlock <LOCK_ID>
    
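  3. If using the S3 backend, inspect the lock record. Depending on how locking is configured, this is an item in the DynamoDB lock table or an object in the bucket: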
    aws s3 ls s3://your-bucket/locks/
    

Error: Error reading queue: 404 Not Found

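Cause: The queue ID in the state file does not exist in Genesys Cloud, usually because the queue was deleted outside Terraform. A token that lacks permissions typically produces a 403 rather than a 404.

Fix:

  1. Verify the queue ID exists in Genesys Cloud Admin UI.
  2. Confirm the OAuth client belongs to the organization and region that own the queue, and that its role can view routing queues.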
  3. If the queue was deleted, remove the resource from Terraform state:
    terraform state rm genesyscloud_routing_queue.my_queue
    

Error: 429 Too Many Requests

Cause: Rate limiting from Genesys Cloud API.

Fix:

  1. Wait and retry.
  2. Implement exponential backoff in scripts.
  3. In Terraform, this is usually handled internally, but if persistent, reduce the number of parallel operations (-parallelism=1).
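
The backoff suggested for scripts can be sketched as follows. This is an illustrative helper, not part of any Genesys SDK: get_with_backoff takes the HTTP function as a parameter (pass requests.get in the diagnostic script), honors a numeric Retry-After header when the server sends one, and otherwise doubles the delay on each attempt with a little random jitter.

```python
import random
import time
from typing import Any, Callable, Dict


def get_with_backoff(http_get: Callable[..., Any], url: str, headers: Dict[str, str],
                     max_attempts: int = 5, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """GET a URL, retrying on HTTP 429 with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        response = http_get(url, headers=headers, timeout=30)
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        retry_after = response.headers.get("Retry-After")
        if retry_after and retry_after.isdigit():
            delay = float(retry_after)  # server-specified wait in seconds
        else:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts: {url}")
```

In get_queue, the call requests.get(url, headers=headers) could then become get_with_backoff(requests.get, url, headers), leaving the existing status-code checks in place for non-429 errors.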

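Error: Drift detected in wrap-up settings

Cause: The wrap-up prompt or the wrap-up codes assigned to the queue changed outside Terraform, for example through a platform default, the Admin UI, or another integration.

Fix:

  1. Check the live API response for the current wrap-up settings (the acwSettings object).
  2. Update your HCL to match the live state, or add the affected attribute (for example acw_wrapup_prompt) to ignore_changes if it is managed externally.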

Official References