Data discrepancy between /api/v2/conversations and /api/v2/analytics/conversations for historical voice data

Ran into a weird issue today with…

I am currently building a custom reporting pipeline that requires fetching detailed interaction metadata for voice conversations that occurred more than 24 hours ago. My initial implementation relied on the /api/v2/conversations endpoint, specifically using the GET /api/v2/conversations/{conversationId} method to retrieve full transcript and metric details. However, I noticed that for conversations older than 24 hours, the metrics object within the response payload is significantly truncated. Specifically, fields like talk, hold, and wait durations are either missing or null, even though the conversation status is closed.

According to the documentation, the /api/v2/conversations endpoint is intended for real-time or near-real-time data, while /api/v2/analytics/conversations is designed for historical reporting. I switched my code to use the analytics endpoint:

import requests

headers = {
    'Authorization': 'Bearer <access_token>',
    'Content-Type': 'application/json',
}

params = {
    'dateFrom': '2023-10-01T00:00:00.000Z',
    'dateTo': '2023-10-31T23:59:59.999Z',
    'view': 'summary',
}

response = requests.get(
    'https://api.mypurecloud.com/api/v2/analytics/conversations/summary',
    headers=headers,
    params=params,
)

The response from the analytics endpoint returns a 200 OK but the JSON structure is entirely different. It aggregates data across all conversations rather than providing the granular, per-conversation detail I need for my specific use case. I tried appending ?filter=conversationId:12345 to the analytics URL, but this results in a 400 Bad Request because the summary view does not support single-conversation filtering in this manner.

Is there a specific query parameter or a different sub-endpoint under /api/v2/analytics/conversations that allows me to fetch detailed, non-aggregated metrics for a single historical conversation? Or is the /api/v2/conversations endpoint simply unreliable for data older than 24 hours, forcing me to implement a custom storage solution for metrics at close time?

Have you tried switching your data source to the Analytics API? The /api/v2/conversations endpoint is designed for operational state, not historical reporting. It typically purges or archives detailed metrics after a short retention window, which explains why your voice data disappears after 24 hours.

For historical analysis, you need to use /api/v2/analytics/conversations/queries. This endpoint aggregates data into time-series buckets. To replicate the detail you expect from the operational API, you must structure your query body carefully. Specifically, you need to request metrics like talk and hold, and use groupBy with interactionType or mediaType to isolate voice interactions.

Here is a Rust example using reqwest to construct this query. Note the strict JSON structure required for the interval and metrics arrays.

use reqwest::Client;
use serde_json::json;

// Posts the query and returns the raw HTTP response.
// Requires reqwest's "json" feature and an async runtime such as tokio.
async fn query_voice_metrics(token: &str) -> Result<reqwest::Response, reqwest::Error> {
    let client = Client::new();
    let body = json!({
        "interval": "2023-10-01T00:00:00.000Z/2023-10-02T00:00:00.000Z",
        "metrics": ["talk", "hold", "wrapup"],
        "groupBy": ["mediaType"],
        "filter": {
            "type": "equals",
            "path": "mediaType",
            "value": "voice"
        }
    });

    client
        .post("https://api.mypurecloud.com/api/v2/analytics/conversations/queries")
        .bearer_auth(token)
        .json(&body)
        .send()
        .await
}
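
If you are building the body from Python instead, the interval string is the part that is easiest to get wrong. A minimal sketch, assuming the start/end format shown in the query above (UTC ISO-8601 with milliseconds, joined by a slash); `build_interval` is a hypothetical helper name, not part of any SDK:

```python
from datetime import datetime, timedelta, timezone


def _iso_millis(ts: datetime) -> str:
    """Format an aware datetime as UTC ISO-8601 with millisecond precision."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_interval(start: datetime, end: datetime) -> str:
    """Return 'start/end' in the form used by the query body above."""
    start_text = _iso_millis(start)
    end_text = _iso_millis(end)
    if end <= start:
        raise ValueError("end must be after start")
    return f"{start_text}/{end_text}"


# One full UTC day, matching the interval in the example query.
day = datetime(2023, 10, 1, tzinfo=timezone.utc)
interval = build_interval(day, day + timedelta(days=1))
```

Rejecting naive datetimes up front avoids silently shifting the window by the local UTC offset.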

This approach ensures you are querying the correct data store. The operational API is for real-time routing and status; the Analytics API is for reporting. Mixing them leads to the exact discrepancy you described. Ensure your OAuth scopes include analytics:query and analytics:read to avoid permission errors.
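
On the original per-conversation question: as far as I know, the Analytics API also has a non-aggregated lookup, GET /api/v2/analytics/conversations/{conversationId}/details, which returns the detail record for a single conversation. Treat the path as something to confirm in the current API reference for your region before relying on it. A minimal sketch that only builds the request and does not send it; `details_request` and the default base URL are illustrative:

```python
from urllib.parse import quote


def details_request(conversation_id: str, token: str,
                    base_url: str = "https://api.mypurecloud.com") -> dict:
    """Build (but do not send) a GET for one conversation's analytics detail."""
    if not conversation_id:
        raise ValueError("conversation_id is required")
    # Escape the id so it cannot alter the path.
    safe_id = quote(conversation_id, safe="")
    path = f"/api/v2/analytics/conversations/{safe_id}/details"
    return {
        "method": "GET",
        "url": base_url.rstrip("/") + path,
        "headers": {"Authorization": f"Bearer {token}"},
    }


req = details_request("12345", "<access_token>")
```

The returned dict uses the keyword names that `requests.request` accepts, so `requests.request(**req)` would send it once a real token is supplied.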

This looks like a standard retention issue. The suggestion above is correct, but you must include queue.id in your groupBy array for deterministic bucketing. Without it, the analytics engine fails to resolve the hierarchy. Here is the working query structure:

{
 "view": "conversation",
 "groupBy": ["queue.id", "mediaType"],
 "filter": { "type": "and", "clauses": [...] }
}

You might want to look at defining the analytics query payload directly in your Pulumi stack to lock in the schema. This prevents drift when API defaults change.

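import * as pulumi from "@pulumi/pulumi";

// Pin the query shape as a Pulumi output so it is versioned with the stack.
const query = pulumi.output({
    view: "conversation",
    groupBy: ["queue.id", "mediaType"],
    filter: { type: "and", clauses: [] },
});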

Hardcoding the structure in IaC ensures your pipeline stays compliant with the strict grouping rules.

Pinning the analytics query payload in Pulumi is not enough if your downstream consumer expects row-level granularity rather than aggregated buckets. The suggestion above correctly identifies the groupBy requirement, but it introduces a silent data-loss risk: distinct conversations are aggregated into a single metric sum. If your pipeline requires individual transcript retrieval or precise timestamp mapping for each interaction, the Analytics API will fail you, because this kind of query returns sum, avg, or count aggregates, not individual entity records.
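
To make the data-loss point concrete, here is a small sketch with made-up numbers (the conversation IDs, queue name, and durations are invented for illustration). Two different sets of calls collapse into the same grouped bucket, so the per-call values cannot be recovered from the aggregate:

```python
from collections import defaultdict


def aggregate_talk(rows):
    """Sum talk seconds per (queue, mediaType) bucket, as a groupBy would."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        key = (row["queue"], row["mediaType"])
        buckets[key]["count"] += 1
        buckets[key]["sum"] += row["talk"]
    return dict(buckets)


# One very short and one very long call.
day_a = [
    {"id": "c1", "queue": "support", "mediaType": "voice", "talk": 30},
    {"id": "c2", "queue": "support", "mediaType": "voice", "talk": 570},
]
# Two calls of equal length.
day_b = [
    {"id": "c3", "queue": "support", "mediaType": "voice", "talk": 300},
    {"id": "c4", "queue": "support", "mediaType": "voice", "talk": 300},
]

# Both days produce an identical bucket, although the calls differ.
same_bucket = aggregate_talk(day_a) == aggregate_talk(day_b)
```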

For historical voice data older than 24 hours, you must pivot from the operational conversations endpoint to the analytics endpoint, but only if you can tolerate aggregation. If you need raw detail, you are stuck with the operational API’s retention limit unless you have implemented a custom Data Action or Webhook that archives the full JSON payloads to S3/Azure Blob at ingestion time.
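
A rough sketch of the archiving idea, assuming your webhook or Data Action hands you the full conversation JSON with an `id` field. It writes to a local directory to stay self-contained; in practice the write would be an upload to S3 or Azure Blob. `archive_conversation`, `load_conversation`, and the file layout are hypothetical:

```python
import json
from pathlib import Path


def archive_conversation(payload: dict, root: Path) -> Path:
    """Persist one conversation payload as <root>/<id>.json and return the path."""
    conversation_id = payload.get("id")
    if not conversation_id:
        raise ValueError("payload has no 'id' field")
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{conversation_id}.json"
    # Write to a temp file first so a crash never leaves a half-written archive.
    tmp = target.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(payload, sort_keys=True), encoding="utf-8")
    tmp.replace(target)
    return target


def load_conversation(conversation_id: str, root: Path) -> dict:
    """Read back a previously archived payload."""
    text = (root / f"{conversation_id}.json").read_text(encoding="utf-8")
    return json.loads(text)
```

Keying the file by conversation ID makes a repeated delivery of the same event overwrite the earlier copy instead of creating duplicates.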

Here is the correct Analytics query payload to minimize aggregation error, ensuring you capture the necessary dimensions without violating the grouping constraints:

{
  "view": "conversation",
  "groupBy": ["queue.id", "mediaType", "wrapUpCode"],
  "filter": {
    "type": "and",
    "clauses": [
      {
        "type": "dimension",
        "dimension": "mediaType",
        "operator": "eq",
        "value": "voice"
      },
      {
        "type": "range",
        "dimension": "startTime",
        "from": "2023-10-01T00:00:00Z",
        "to": "2023-10-02T00:00:00Z"
      }
    ]
  },
  "metrics": ["duration", "holdDuration", "wrapUpDuration"]
}

Note that wrapUpCode is critical for voice categorization. If you omit it, the engine may merge distinct interaction types, skewing your duration metrics. Always validate the groupBy array against your specific reporting granularity requirements before committing to the Analytics API.
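
One way to run that validation before sending anything, sketched under the assumption that the payload uses the `groupBy` key shown above. `missing_dimensions` is a hypothetical helper, and the required set is whatever your report needs:

```python
def missing_dimensions(payload: dict, required: set) -> set:
    """Return required dimensions that are absent from payload['groupBy']."""
    return set(required) - set(payload.get("groupBy", []))


# The earlier payload in this thread, checked against the finer granularity.
payload = {"groupBy": ["queue.id", "mediaType"]}
gaps = missing_dimensions(payload, {"queue.id", "mediaType", "wrapUpCode"})
```

An empty result means the payload groups by at least every dimension the report depends on.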