Resolve State Lock and Drift in Genesys Cloud Routing Queues with Terraform

What You Will Build

  • A Terraform configuration that reliably manages Genesys Cloud routing queues without triggering phantom state drift.
  • A Python diagnostic script that validates API token permissions and checks for external modifications to queue data.
  • A workflow to unlock stale state and reconcile configuration with the actual platform state.

Prerequisites

  • Terraform: Version 1.5+ installed and available in your PATH.
  • Genesys Cloud Terraform Provider: Version 1.100+ (recommended).
  • Python 3.9+: For the diagnostic script.
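  • Genesys Cloud permissions: The role assigned to the OAuth client needs the routing:queue:view, routing:queue:add, and routing:queue:edit permissions (plus routing:queue:delete if Terraform should be able to destroy queues).
  • API Credentials: Client ID and Client Secret for an OAuth client that uses the Client Credentials grant in Genesys Cloud.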

Authentication Setup

Terraform handles authentication via the genesyscloud provider block. Make sure your environment variables are set correctly; otherwise plans fail with authentication errors that are easy to misread as state problems.

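# Environment variables read by the genesyscloud Terraform provider
export GENESYSCLOUD_REGION=us-east-1
export GENESYSCLOUD_OAUTHCLIENT_ID="your-client-id"
export GENESYSCLOUD_OAUTHCLIENT_SECRET="your-client-secret"
# Variables read by the Python diagnostic script in Step 2
export GENESYS_CLOUD_REGION="$GENESYSCLOUD_REGION"
export GENESYS_CLOUD_CLIENT_ID="$GENESYSCLOUD_OAUTHCLIENT_ID"
export GENESYS_CLOUD_CLIENT_SECRET="$GENESYSCLOUD_OAUTHCLIENT_SECRET"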

In your main.tf, configure the provider explicitly to enable debug logging if issues persist.

terraform {
  required_providers {
    genesyscloud = {
      source  = "mygenesys/genesyscloud"
      version = "~> 1.100"
    }
  }
}

provider "genesyscloud" {
  # Optional: Enable SDK debug logging to see API calls
  # sdk_debug = true
}

Implementation

Step 1: Identify the State Lock Source

A “state lock issue” in Terraform usually means one of two things:

  1. A previous terraform apply or plan failed mid-execution, leaving a lock file in the backend (S3, Azure Blob, or local state lock file).
  2. Another Terraform run against the same backend (a teammate's session or a CI job) is still in progress and legitimately holds the lock. Changes made to a queue by another process (a webhook, a direct API call, or the Genesys Cloud UI) do not lock the state; they surface as drift on the next plan.

First, check for a stale lock. If you see Error acquiring the state lock, run:

# Force unlock if you are certain no other operation is running
terraform force-unlock <LOCK_ID>
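
Before forcing an unlock, it helps to confirm who holds the lock and how old it is. The following is a minimal sketch for the local backend only, assuming the lock metadata sits in .terraform.tfstate.lock.info next to the state file and is JSON with ID, Operation, Who, and Created keys. The helper names parse_lock_info and lock_age_seconds are illustrative, not part of Terraform.

```python
import json
import re
from datetime import datetime, timezone

# Matches RFC 3339 timestamps; the fractional part is captured loosely because
# it may carry nanoseconds, which datetime.fromisoformat cannot always parse.
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def parse_lock_info(text):
    """Extract the fields worth checking from the lock file's JSON text."""
    info = json.loads(text)
    return {
        "id": info.get("ID"),
        "operation": info.get("Operation"),
        "who": info.get("Who"),
        "created": info.get("Created"),
    }


def lock_age_seconds(created, now=None):
    """Return the lock age in seconds, ignoring fractional seconds."""
    match = _TIMESTAMP.match(created)
    if not match:
        raise ValueError(f"Unrecognized timestamp: {created!r}")
    base, offset = match.groups()
    if offset == "Z":
        offset = "+00:00"
    started = datetime.fromisoformat(base + offset)
    now = now or datetime.now(timezone.utc)
    return (now - started).total_seconds()
```

To use it, read the lock file, pass its contents to parse_lock_info, and hand the returned "id" to terraform force-unlock only if the age and the "who" field make it clear the holder is gone. Remote backends store this metadata elsewhere, so this sketch does not apply to them.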

If force-unlock succeeds but drift persists, the issue is likely data inconsistency between the Terraform state file and the Genesys Cloud API. This often happens with genesyscloud_routing_queue because queues carry many attributes and nested blocks (members, wrap-up codes, routing rules, media settings) that can be modified outside Terraform.

Step 2: Diagnose Drift with a Python API Script

Before running terraform plan, use this Python script to verify what the Genesys Cloud API actually returns for a specific queue. It bypasses the state file and the provider's logic and shows exactly what the platform holds.

Install dependencies:

pip install requests

Create check_queue_drift.py:

import json
import os
import sys

import requests

# AWS region -> Genesys Cloud domain. Add your region here if it is missing.
REGION_DOMAINS = {
    "us-east-1": "mypurecloud.com",
    "us-east-2": "use2.us-gov-pure.cloud",
    "us-west-2": "usw2.pure.cloud",
    "ca-central-1": "cac1.pure.cloud",
    "eu-west-1": "mypurecloud.ie",
    "eu-west-2": "euw2.pure.cloud",
    "eu-central-1": "mypurecloud.de",
    "ap-southeast-2": "mypurecloud.com.au",
    "ap-northeast-1": "mypurecloud.jp",
}


def get_domain():
    """Returns the Genesys Cloud domain for GENESYS_CLOUD_REGION."""
    region = os.getenv("GENESYS_CLOUD_REGION", "us-east-1")
    if region not in REGION_DOMAINS:
        print(f"Error: unknown region '{region}'. Add it to REGION_DOMAINS.")
        sys.exit(1)
    return REGION_DOMAINS[region]


def get_access_token():
    """
    Retrieves a Genesys Cloud OAuth2 access token using client credentials.
    The token endpoint is served from the login host, not the API host.
    """
    client_id = os.getenv("GENESYS_CLOUD_CLIENT_ID")
    client_secret = os.getenv("GENESYS_CLOUD_CLIENT_SECRET")
    if not client_id or not client_secret:
        print("Error: GENESYS_CLOUD_CLIENT_ID and GENESYS_CLOUD_CLIENT_SECRET must be set.")
        sys.exit(1)

    url = f"https://login.{get_domain()}/oauth/token"
    data = {"grant_type": "client_credentials"}

    try:
        # requests sends the client ID and secret as HTTP Basic Auth
        response = requests.post(url, data=data, auth=(client_id, client_secret), timeout=30)
        response.raise_for_status()
        return response.json()["access_token"]
    except requests.exceptions.RequestException as e:
        print(f"Error fetching token: {e}")
        sys.exit(1)


def api_get(access_token, path, params=None):
    """Sends an authenticated GET to the Genesys Cloud API and returns the parsed JSON."""
    url = f"https://api.{get_domain()}{path}"
    headers = {"Authorization": f"Bearer {access_token}"}

    try:
        response = requests.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        status = e.response.status_code
        if status == 401:
            print("Error: Invalid or expired token.")
        elif status == 403:
            print("Error: Forbidden. The OAuth client's role lacks permission to view this queue.")
        elif status == 404:
            print("Error: Queue not found. Check the queue ID and the region.")
        elif status == 429:
            print("Error: Rate limited. Wait and retry.")
        else:
            print(f"HTTP Error: {e}")
        sys.exit(1)
    except requests.exceptions.RequestException as e:
        print(f"Network error: {e}")
        sys.exit(1)


def get_queue_details(access_token, queue_id):
    """
    Fetches a routing queue.
    Endpoint: GET /api/v2/routing/queues/{queueId}
    """
    return api_get(access_token, f"/api/v2/routing/queues/{queue_id}")


def get_queue_members(access_token, queue_id):
    """
    Fetches every member of the queue, one page at a time.
    Endpoint: GET /api/v2/routing/queues/{queueId}/members
    """
    members, page, page_size = [], 1, 100
    while True:
        body = api_get(
            access_token,
            f"/api/v2/routing/queues/{queue_id}/members",
            params={"pageSize": page_size, "pageNumber": page},
        )
        entities = body.get("entities") or []
        members.extend(entities)
        if len(entities) < page_size:
            return members
        page += 1


def main():
    if len(sys.argv) != 2:
        print("Usage: python check_queue_drift.py <QUEUE_ID>")
        sys.exit(1)

    queue_id = sys.argv[1]

    print("Fetching OAuth token...")
    access_token = get_access_token()
    print("Token acquired.")

    print(f"Fetching queue details for ID: {queue_id}...")
    queue_data = get_queue_details(access_token, queue_id)
    members = get_queue_members(access_token, queue_id)

    # Output key fields that commonly cause drift (API keys are camelCase)
    print("\n--- Queue Details ---")
    print(f"Name: {queue_data.get('name')}")
    print(f"Description: {queue_data.get('description')}")
    print(f"Last modified: {queue_data.get('dateModified')} by {queue_data.get('modifiedBy')}")

    # After-call work settings (acw_wrapup_prompt / acw_timeout_ms in Terraform)
    acw = queue_data.get("acwSettings") or {}
    print(f"ACW: {acw.get('wrapupPrompt')} (Timeout ms: {acw.get('timeoutMs')})")
    print(f"Skill evaluation method: {queue_data.get('skillEvaluationMethod')}")

    # Members are a common source of drift if they are added manually
    print(f"Number of Members: {len(members)}")
    if members:
        print("First 3 members:")
        for member in members[:3]:
            print(f"  - User ID: {member.get('id')}, Ring: {member.get('ringNumber')}")

    # Save the full responses for detailed comparison
    with open(f"queue_{queue_id}_raw.json", "w") as f:
        json.dump(queue_data, f, indent=2)
    with open(f"queue_{queue_id}_members.json", "w") as f:
        json.dump(members, f, indent=2)
    print(f"\nFull responses saved to queue_{queue_id}_raw.json and queue_{queue_id}_members.json")


if __name__ == "__main__":
    main()

Run the script with the ID of the queue showing drift:

python check_queue_drift.py <QUEUE_ID_FROM_TERRAFORM_STATE>

Step 3: Compare Terraform State with API Response

Open the queue_<ID>_raw.json file. Compare the values with your Terraform configuration.

Common causes of drift in genesyscloud_routing_queue:

  1. Members: If your configuration declares members blocks and someone adds users to the queue via the Genesys Cloud UI, Terraform does not know about them and plans to remove them on the next apply. If the configuration omits members entirely, the provider leaves membership unmanaged.
  2. Wrap-up codes: Codes attached to the queue via the UI exist on the platform but not in the wrapup_codes attribute in Terraform state.
  3. In-queue flow: If the queue's flow is reassigned in the UI, the value behind queue_flow_id changes.
  4. After-call work settings: Default values may differ between the API and the provider's default assumptions for acw_wrapup_prompt and acw_timeout_ms.

To see what Terraform currently has in state, run:

terraform state show genesyscloud_routing_queue.my_queue

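Compare this output with the queue_<ID>_raw.json file. The API uses camelCase keys (for example acwSettings.timeoutMs) where Terraform uses snake_case attributes (acw_timeout_ms). Look for discrepancies in:

  • members
  • wrapup_codes
  • acw_wrapup_prompt and acw_timeout_ms
  • queue_flow_id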
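
The comparison can also be scripted instead of done by eye. The sketch below assumes you have the output of terraform show -json and the saved API response loaded as dictionaries. The attribute-to-key mapping in FIELD_MAP is an assumption about how the provider's attributes line up with the API's camelCase keys, so adjust it to what your own files contain. The function names are illustrative.

```python
# Terraform attribute -> path into the API JSON.
# ASSUMPTION: this mapping reflects how the provider names its attributes;
# verify it against your own state output and API response.
FIELD_MAP = {
    "name": ("name",),
    "description": ("description",),
    "acw_wrapup_prompt": ("acwSettings", "wrapupPrompt"),
    "acw_timeout_ms": ("acwSettings", "timeoutMs"),
    "skill_evaluation_method": ("skillEvaluationMethod",),
}


def find_resource_values(state, address):
    """Return one resource's attribute values from `terraform show -json` output.

    Only the root module is searched; resources in child modules are ignored.
    """
    resources = state.get("values", {}).get("root_module", {}).get("resources", [])
    for resource in resources:
        if resource.get("address") == address:
            return resource.get("values", {})
    raise KeyError(address)


def dig(data, path):
    """Follow a tuple of keys into nested dictionaries; None if any key is missing."""
    for key in path:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def find_drift(tf_values, api_queue, field_map=FIELD_MAP):
    """Return {attribute: (terraform_value, api_value)} for every mismatch."""
    drift = {}
    for attr, path in field_map.items():
        tf_value = tf_values.get(attr)
        api_value = dig(api_queue, path)
        if tf_value != api_value:
            drift[attr] = (tf_value, api_value)
    return drift
```

A typical run loads both files with json.load, calls find_resource_values(state, "genesyscloud_routing_queue.my_queue"), and passes the result with the API dictionary to find_drift. An empty result means the mapped fields agree; it says nothing about fields left out of the mapping.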

Step 4: Resolve Drift

Option A: Import External Changes into Terraform

If the changes made outside Terraform are intentional, update your Terraform configuration to match the API state.

Example: If a user was added to the queue manually, add them to your Terraform config:

resource "genesyscloud_routing_queue" "my_queue" {
  name        = "Support Queue"
  description = "General support queue"

  # Explicitly define members to match API state
  members {
    user_id  = "user-id-from-api"
    ring_num = 1
  }
}

Then run:

terraform plan

If the plan shows no changes, the drift is resolved.
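
When many users were added by hand, writing each members block manually is tedious and error-prone. The following is a minimal sketch that renders the blocks from a list of member records. It assumes each record carries an "id" and an optional "ringNumber" (the shape returned by GET /api/v2/routing/queues/{queueId}/members) and that the provider's members block takes user_id and ring_num. The function name render_members_blocks is illustrative.

```python
def render_members_blocks(members):
    """Render one HCL `members` block per member record.

    Records are sorted by user ID so that repeated runs produce identical
    text, which keeps diffs in version control small.
    """
    blocks = []
    for member in sorted(members, key=lambda m: m["id"]):
        # ASSUMPTION: a missing ring number is treated as ring 1.
        ring = member.get("ringNumber", 1)
        blocks.append(
            "  members {\n"
            f'    user_id  = "{member["id"]}"\n'
            f"    ring_num = {ring}\n"
            "  }"
        )
    return "\n\n".join(blocks)
```

Paste the returned text inside the resource block, run terraform plan, and treat any remaining member changes in the plan as a sign that the input list was stale.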

Option B: Reset State to Match API (Dangerous)

If you want to discard Terraform’s knowledge of the queue and let it re-import the current API state, you can remove the resource from state and re-import it.

  1. Remove the resource from state:
terraform state rm genesyscloud_routing_queue.my_queue
  2. Re-import the resource:
terraform import genesyscloud_routing_queue.my_queue <QUEUE_ID>
  3. Run terraform plan to see the differences between your configuration and the current API state. Update your configuration to match, then apply.

Option C: Ignore Specific Attributes

If certain attributes (like members or wrapup_codes) are managed outside Terraform and you do not want Terraform to manage them, use a lifecycle block to ignore changes. Every name listed in ignore_changes must be an attribute the resource actually has; otherwise validation fails.

resource "genesyscloud_routing_queue" "my_queue" {
  name        = "Support Queue"
  description = "General support queue"

  lifecycle {
    ignore_changes = [
      members,
      wrapup_codes
    ]
  }
}

This prevents Terraform from reporting drift on these attributes and prevents it from trying to revert external changes.

Complete Working Example

Below is a complete Terraform configuration for a routing queue that avoids common drift issues by explicitly defining all manageable attributes and ignoring dynamic ones.

main.tf:

terraform {
  required_providers {
    genesyscloud = {
      source  = "mygenesys/genesyscloud"
      version = "~> 1.100"
    }
  }
}

provider "genesyscloud" {
  # Uncomment for debugging
  # sdk_debug = true
}

# Data source to fetch a wrap-up code by name
data "genesyscloud_routing_wrapupcode" "resolved" {
  name = "Resolved"
}

# Resource definition
resource "genesyscloud_routing_queue" "support_queue" {
  name        = "Customer Support"
  description = "Queue for general customer support inquiries"

  # After-call work: set explicitly so provider defaults never show up as drift
  acw_wrapup_prompt = "MANDATORY_TIMEOUT"
  acw_timeout_ms    = 300000

  # How agent skills are matched against the skills an interaction requires
  skill_evaluation_method = "BEST"

  # Wrap-up codes: Explicitly reference the data source
  wrapup_codes = [data.genesyscloud_routing_wrapupcode.resolved.id]

  # Lifecycle: Ignore members if they are managed manually or via another process
  lifecycle {
    ignore_changes = [
      members
    ]
  }
}

variables.tf:

# No variables needed for this example, but you can parameterize the queue name
variable "queue_name" {
  description = "Name of the routing queue"
  type        = string
  default     = "Customer Support"
}

outputs.tf:

output "queue_id" {
  description = "The ID of the created routing queue"
  value       = genesyscloud_routing_queue.support_queue.id
}

output "queue_name" {
  description = "The name of the routing queue"
  value       = genesyscloud_routing_queue.support_queue.name
}

Common Errors & Debugging

Error: State Lock Timeout

Cause: A previous Terraform operation failed to release the lock. This can happen if the process was killed before it could shut down cleanly (for example, a terminated CI runner, or Ctrl+C pressed a second time during an API call).

Fix:

  1. Verify no other Terraform processes are running.
  2. Run terraform force-unlock <LOCK_ID>.
  3. If the lock ID is not available, check the backend storage (e.g., S3 DynamoDB table for AWS backend) and delete the lock entry manually.

Error: 409 Conflict on Queue Update

Cause: The Genesys Cloud API returns a 409 Conflict if you try to update a queue with invalid data or if there is a version mismatch. This can happen if the queue was modified outside Terraform between the plan and apply steps.

Fix:

  1. Run terraform plan again to refresh the state.
  2. Check for external modifications using the Python diagnostic script.
  3. Use lifecycle { ignore_changes = [...] } for attributes that change frequently outside Terraform.
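
One way to check for a change between plan and apply is to compare the queue's last-modified timestamp with the time the plan was created. This is a minimal sketch, assuming the queue JSON returned by the API carries an ISO 8601 dateModified field such as "2024-03-01T10:15:30.000Z". The function names are illustrative.

```python
from datetime import datetime, timezone


def parse_api_time(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime.

    Older Python versions reject 'Z' in fromisoformat, so it is rewritten
    as an explicit UTC offset first.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def modified_since(queue, since):
    """Return True if the queue was modified after `since` (an aware datetime)."""
    stamp = queue.get("dateModified")
    if stamp is None:
        # No timestamp in the response: nothing to compare, so report no change.
        return False
    return parse_api_time(stamp) > since
```

Record datetime.now(timezone.utc) when the plan is created, fetch the queue again just before apply, and rerun terraform plan if modified_since returns True.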

Error: 429 Too Many Requests

Cause: The Genesys Cloud API has rate limits. If you are managing many queues or running Terraform in parallel, you may hit these limits.

Fix:

  1. Implement retry logic in your provider configuration.
  2. Reduce parallelism in Terraform:
terraform apply -parallelism=5
  3. Add a delay between operations if using custom scripts.
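
For custom scripts, the delay can be driven by the server's response instead of a fixed sleep. The sketch below retries a request while it returns 429, waiting for the number of seconds in the Retry-After header when one is present and falling back to exponential backoff otherwise. That the API sends Retry-After on rate-limited responses is an assumption here, and call_with_retry is an illustrative helper, not part of any library.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    `send` is a zero-argument callable returning an object with
    `status_code` and `headers`, such as a requests.Response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code != 429 or attempt == max_attempts:
            return response
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            delay = float(response.headers.get("Retry-After"))
        except (TypeError, ValueError):
            # No usable header: back off exponentially (1s, 2s, 4s, ...).
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
```

With the requests library this would be called as call_with_retry(lambda: requests.get(url, headers=headers, timeout=30)). The last response is returned even if it is still a 429, so the caller decides whether to give up or raise.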

Error: Attribute “members” has unexpected drift

Cause: The queue members were added or removed via the Genesys Cloud UI or API, but Terraform state still reflects the old members.

Fix:

  1. Run terraform apply -refresh-only to update the state from the API (the standalone terraform refresh command is deprecated).
  2. Update your Terraform configuration to include the new members.
  3. Or, ignore the members attribute in the lifecycle block if they are managed externally.
