Just noticed that our daily IaC reconciliation pipeline is flagging a persistent state drift on the genesyscloud_routing_skill resource. The Terraform state file shows the skill is assigned to a specific queue, but the Genesys Cloud UI and the REST API response indicate it is not.
Environment details:
Provider: genesyscloud/provider v1.12.0
Region: ap-southeast-2 (Sydney)
Module: Custom wrapper around genesyscloud_routing_queue
Running terraform plan proposes adding the skill to the queue. Running terraform apply succeeds with no errors. However, immediately after the apply, terraform plan shows the same drift again. A call to GET /api/v2/routing/queues/{queueId} confirms the skill is missing from the routing_skills array, even though the provider logs a successful PUT operation.
Is this a known caching issue with the provider or a backend sync delay in the Sydney region? The drift prevents our CI/CD pipeline from passing the drift detection check.
You might want to check the skill directly through the API instead of trusting the Terraform state. The provider's cached view can go stale under heavy load. Send a simple GET request to confirm that the skill ID exists and is actually linked to the queue. If the API returns 200 and shows the link, force a refresh (terraform refresh, or terraform apply -refresh-only on current Terraform versions). If it returns 404, the resource creation failed silently.
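A minimal sketch of that diagnostic, using only the Python standard library. Several things here are assumptions rather than confirmed details: the API host (api.mypurecloud.com.au is assumed for ap-southeast-2), the skill endpoint path /api/v2/routing/skills/{skillId}, and the routing_skills field name (taken from the question above). The helper names get_json and diagnose are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumed API host for ap-southeast-2; verify against your org's region settings.
API_BASE = "https://api.mypurecloud.com.au"


def get_json(path: str, token: str) -> tuple[int, dict]:
    """GET an API path and return (HTTP status, parsed JSON body)."""
    req = urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 404) still carry a status code worth inspecting.
        return err.code, {}


def diagnose(skill_status: int, queue_body: dict, skill_id: str) -> str:
    """Map the skill GET status and the queue GET body onto the next step."""
    if skill_status == 404:
        return "skill missing: creation likely failed silently"
    if skill_status != 200:
        return f"unexpected status {skill_status}: inspect manually"
    # 'routing_skills' is the field named in the question; adjust if your payload differs.
    skills = queue_body.get("routing_skills", [])
    if any(s.get("id") == skill_id for s in skills):
        return "linked: run terraform apply -refresh-only"
    return "skill exists but is not linked: association did not persist"
```

Usage would look like: skill_status, _ = get_json(f"/api/v2/routing/skills/{skill_id}", token), then _, queue = get_json(f"/api/v2/routing/queues/{queue_id}", token), then print(diagnose(skill_status, queue, skill_id)). Keeping the decision logic separate from the HTTP call makes it easy to test without network access.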
This is typically caused by the asynchronous nature of skill-queue associations in the Genesys Cloud API. When Terraform applies the genesyscloud_routing_skill resource, the provider often marks the resource as complete immediately after the initial HTTP 201 response, even if the background job linking the skill to the queue is still processing. The provider does not natively wait for the association to propagate to the read endpoints, which leads to the state drift you are seeing.
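If that explanation is right, the race can be illustrated with a toy model. This is purely an illustration of read-after-write lag under eventual consistency, not a model of the real Genesys Cloud backend; the class ToyQueueBackend and its propagation_ticks parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueueBackend:
    """Toy eventually consistent skill-queue association (NOT the real API).

    A write is acknowledged immediately, but the association only becomes
    visible to reads after `propagation_ticks` calls to tick().
    """
    propagation_ticks: int = 3
    visible: set[str] = field(default_factory=set)
    pending: dict[str, int] = field(default_factory=dict)

    def put_skill(self, skill_id: str) -> int:
        # Acknowledge right away, like the initial 201 the provider trusts.
        self.pending[skill_id] = self.propagation_ticks
        return 201

    def tick(self) -> None:
        # Advance the background job; finished associations become readable.
        for skill_id in list(self.pending):
            self.pending[skill_id] -= 1
            if self.pending[skill_id] <= 0:
                del self.pending[skill_id]
                self.visible.add(skill_id)

    def get_skills(self) -> set[str]:
        # Reads only see associations whose background job has completed.
        return set(self.visible)
```

A read issued right after put_skill("s1") returns a set without "s1", which is exactly what a terraform plan run immediately after apply would observe; only after three tick() calls does the read side catch up.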
A more robust approach is to introduce a dependency or a wait mechanism in your Terraform configuration. Instead of relying solely on the skill resource, use the genesyscloud_routing_queue resource to explicitly define the skill assignments. This ensures the queue resource handles the association logic. If you must use the skill resource, add a depends_on clause pointing to the queue resource, and consider using a null_resource with a local-exec provisioner to poll the API until the association is confirmed.
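For example, you can add a simple check to your pipeline that polls the queue endpoint until the skill appears in its skill list, and fails the job if it does not appear within a timeout.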
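A minimal sketch of such a check, assuming you supply a hypothetical fetch_skill_ids callable that GETs /api/v2/routing/queues/{queueId} and returns the skill IDs it reports. The timeout and backoff defaults are arbitrary placeholders, not values recommended by Genesys. The sleep and clock functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_association(
    fetch_skill_ids: Callable[[], set[str]],
    skill_id: str,
    timeout_s: float = 120.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until skill_id appears in the queue's skill list or the timeout expires.

    fetch_skill_ids is a hypothetical callable wrapping your HTTP client.
    Returns True once the skill is visible, False on timeout.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if skill_id in fetch_skill_ids():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Exponential backoff, capped at max_delay_s and never sleeping past the deadline.
        sleep(min(delay, max_delay_s, remaining))
        delay = min(delay * 2, max_delay_s)
```

In a pipeline step you would call sys.exit(0 if wait_for_association(...) else 1) so that a missing association fails the job instead of silently reappearing as drift on the next plan. Backoff keeps the number of API calls low if propagation takes longer than expected.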
In our Chicago WFM environment, we publish schedules weekly and have seen similar race conditions when skill weights change. The API sometimes returns stale cached data during high load, especially in the ap-southeast-2 region. Forcing a refresh or adding an explicit wait step usually resolves the drift without needing to tear down resources. It is also worth checking whether the queue resource already has the skill defined, because managing associations from the queue side tends to be more stable.
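One way to check whether a queue resource already declares the skill is to scan the output of terraform show -json, recursing into child modules since the queue lives inside a custom wrapper. This is a sketch under assumptions: the attribute name routing_skills on genesyscloud_routing_queue is a placeholder (check the provider docs for the real attribute and its shape), and queues_declaring_skill is a hypothetical helper.

```python
import json


def iter_resources(module: dict):
    """Yield every resource in a `terraform show -json` module, recursing into child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def queues_declaring_skill(state_json: str, skill_id: str, attr: str = "routing_skills") -> list[str]:
    """Return addresses of genesyscloud_routing_queue resources whose `attr` mentions skill_id.

    `attr` is a placeholder name; adjust it to the provider's actual schema.
    """
    root = json.loads(state_json).get("values", {}).get("root_module", {})
    hits = []
    for res in iter_resources(root):
        if res.get("type") != "genesyscloud_routing_queue":
            continue
        value = res.get("values", {}).get(attr)
        # Serialize so the match works whether attr holds IDs, objects, or nested lists;
        # this is a coarse substring check that is adequate for UUID-style IDs.
        if value is not None and skill_id in json.dumps(value):
            hits.append(res["address"])
    return hits
```

Feed it the output of terraform show -json. An empty result suggests the association is only being managed from the skill side.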
According to the docs, asynchronous replication is the standard behavior for routing resources in Genesys Cloud. That would explain the discrepancy between the immediate API response and the eventually consistent state observed in the UI.
Cause:
The provider marks the resource as complete upon receiving the initial HTTP 201, while the background job linking the skill to the queue is still processing. This delay creates a window in which the Terraform state reflects the intended configuration but the runtime environment has not yet synchronized, so the state file becomes stale relative to the actual operational status.
Solution:
Add a depends_on meta-argument to enforce ordering if other resources rely on this skill. Alternatively, add a small delay or a null_resource with a local-exec script that polls the API until the association is confirmed. This makes the infrastructure code wait for the backend to catch up before proceeding with subsequent steps. Verifying the API response directly, as suggested earlier, is a valid diagnostic step, but structural changes to the Terraform module are required for long-term stability.