Implementing Augmented Reality Overlay Guidance for Field Technician Remote Assistance
What This Guide Covers
This guide details the architectural implementation of real-time Augmented Reality (AR) overlay guidance for field technicians using Genesys Cloud CX Media Channels and custom React Native mobile applications. You will build a system that allows an expert agent to draw visual annotations on a shared video feed, which are then rendered as persistent overlays on the technician’s mobile device camera view, synchronized via WebSockets and stored as artifacts in the Interaction Transcript.
Prerequisites, Roles & Licensing
Licensing Tiers
- Genesys Cloud CX 3: Required for access to Video Calling and Screen Sharing via the Media Channels API.
- Genesys Cloud CX 4: Recommended for Integrations (API Access) and advanced Architect logic required to route video interactions to specific skill groups.
- WEM (Workforce Engagement Management) Add-on: Optional but recommended if you need to capture AR session duration for labor cost attribution.
Permissions & Roles
- Admin Role: Telephony > Media Channels > Edit, Integrations > API Access > Edit, Architect > Edit.
- API User: Must be assigned the api_user role with specific OAuth scopes:
  - video:read (to monitor session status)
  - video:write (to initiate video streams)
  - interaction:read (to append AR annotation data to the transcript)
  - user:read (to resolve technician identities)
External Dependencies
- React Native Mobile Application: Custom-built app running on iOS/Android devices. Must support react-native-webrtc for camera capture and react-native-svg for rendering overlays.
- Backend Middleware (Node.js/Python): A lightweight service to bridge Genesys Media Channels WebSockets with the custom AR annotation protocol. Genesys does not natively support “draw-on-video” primitives; this requires custom application logic layered on top of the standard video stream.
- CDN (AWS S3 + CloudFront or Azure Blob): For storing high-resolution reference images or PDF manuals that may be pushed as overlays during the session.
The Implementation Deep-Dive
1. Architecting the Video Interaction Flow
The foundation of AR remote assistance is a bidirectional video stream. In Genesys Cloud, this is not handled by the traditional CTI (Computer Telephony Integration) model but via the Media Channels framework.
The Trap: Treating Video Like Voice
A common architectural error is attempting to route video interactions through standard Telephony Routing policies. Video requires significantly higher bandwidth and different codec negotiation (VP8/VP9 vs. G.711/Opus). If you route video through a standard voice queue, the connection will fail or suffer from severe jitter because the underlying SIP trunking infrastructure is optimized for low-latency audio, not high-throughput video.
The Solution: Dedicated Video Media Channels
You must create a dedicated Media Channel for Video in the Genesys Admin console.
- Navigate to Admin > Media Channels.
- Create a new channel with Type: Video.
- Configure the Jitter Buffer settings. For AR guidance, you must prioritize Low Latency over High Quality. Set the Jitter Buffer to 0ms or Adaptive. A static buffer introduces lag that breaks the spatial alignment of AR overlays: if the agent draws a circle on a valve and the technician sees it 500ms later, the camera may have moved in the meantime, so the circle no longer lines up with the valve.
Architect Routing Logic
In Architect, create a new flow for “Remote Assist Request.”
- Start Node: Connect to the Video Channel.
- Find Team: Use the Find Team block. Filter by Skill: Video_Assist_Expert.
- Offer Interaction: Use the Offer Interaction block.
- Critical Configuration: Set the Timeout to 30 seconds. Video sessions are expensive and resource-intensive. Long waits lead to abandoned sessions.
- Queue Strategy: Use Longest Idle to distribute load evenly among experts.
// Illustrative pseudocode for dynamic team selection (not literal Architect expression syntax).
// Selects the team based on the technician's device type so the expert has relevant context.
if (interaction.channel.type == "video") {
    return teams.find(t => t.name == "Video_Assist_" + interaction.data.deviceType);
} else {
    return null;
}
2. Establishing the WebSocket Bridge for AR Annotations
Genesys Media Channels provides the video stream, but it does not provide a native API for “remote drawing.” You must implement a parallel data channel to transmit annotation coordinates.
The Trap: Sending Annotations via Video Frames
Some developers attempt to encode annotation data into the video stream itself (e.g., by having the agent draw on their local screen and sharing that screen). This is catastrophic for AR. Screen sharing compresses the video, losing the crispness of vector drawings. More importantly, it prevents the technician from seeing their own camera view clearly, as the screen share replaces the camera feed.
The Solution: Parallel WebSocket Data Channel
You will use the Genesys Media Channels WebSocket API to establish the video connection, but you will run a secondary WebSocket connection to your custom middleware for annotation data.
- Initialize Video Stream: The React Native app connects to Genesys using the genesyscloud-video SDK. This establishes the SDP (Session Description Protocol) exchange.
- Initialize Annotation Channel: Simultaneously, the app opens a WebSocket to your backend middleware. This WebSocket carries lightweight JSON payloads representing drawing vectors.
// React Native: Initializing the Annotation WebSocket
// currentGenesysSessionId, user, and setAnnotations come from the surrounding component state.
const annotationSocket = new WebSocket('wss://your-middleware.example.com/annotations');

annotationSocket.onopen = () => {
  console.log('Annotation channel connected');
  // Send initial session metadata
  annotationSocket.send(JSON.stringify({
    action: 'join',
    sessionId: currentGenesysSessionId,
    technicianId: user.uid,
    timestamp: Date.now()
  }));
};

// Handling incoming annotations from the Expert
annotationSocket.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // The payload identifies its kind in the "action" field (see the payload structure below)
  if (data.action === 'draw') {
    // Update the AR overlay state
    setAnnotations(prev => [...prev, data.payload]);
  }
};
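On the middleware side, the relay can be sketched as a session-keyed registry that forwards each draw message to every other participant in the same session. This is a minimal sketch, not a definitive implementation: createAnnotationRelay and its method names are hypothetical, a "socket" is assumed to be any object with a send(string) method (for example a connection object from your WebSocket server library), and authentication is left out.

```javascript
// Hypothetical in-memory relay for the annotation middleware.
// A "socket" is any object exposing send(string).
function createAnnotationRelay() {
  const sessions = new Map(); // sessionId -> Set of sockets

  function join(sessionId, socket) {
    if (!sessions.has(sessionId)) sessions.set(sessionId, new Set());
    sessions.get(sessionId).add(socket);
  }

  function leave(sessionId, socket) {
    const members = sessions.get(sessionId);
    if (!members) return;
    members.delete(socket);
    if (members.size === 0) sessions.delete(sessionId);
  }

  // Handles one raw message from `sender`.
  // Returns the number of peers the message was relayed to.
  function handleMessage(sender, raw) {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch (err) {
      return 0; // ignore malformed JSON
    }
    if (!msg || typeof msg.sessionId !== 'string') return 0;

    if (msg.action === 'join') {
      join(msg.sessionId, sender);
      return 0;
    }
    if (msg.action !== 'draw') return 0;

    const members = sessions.get(msg.sessionId);
    // Only relay for senders that have joined the session.
    if (!members || !members.has(sender)) return 0;

    let delivered = 0;
    for (const peer of members) {
      if (peer === sender) continue; // do not echo back to the author
      peer.send(raw);
      delivered += 1;
    }
    return delivered;
  }

  return { join, leave, handleMessage };
}
```

Keeping the relay free of any specific server library makes it easy to unit test; in a real deployment you would call handleMessage from the server's message event and leave from its close event.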
The Annotation Payload Structure
The payload must be device-agnostic. Do not send pixel coordinates (e.g., x=100, y=200) because screen densities vary. Send normalized coordinates (0.0 to 1.0) or 3D spatial points if using ARKit/ARCore.
{
  "action": "draw",
  "sessionId": "vc-12345-abc",
  "timestamp": 1678886400000,
  "payload": {
    "tool": "arrow",
    "color": "#FF0000",
    "width": 2,
    "points": [
      { "x": 0.5, "y": 0.5 },
      { "x": 0.7, "y": 0.6 }
    ],
    "id": "ann-001"
  }
}
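Before a payload reaches the overlay, it is worth checking that it really carries normalized coordinates and then scaling those to the local surface. The following is a minimal sketch under the payload shape shown above; isValidDrawMessage and toPixels are hypothetical helper names, not part of any SDK.

```javascript
// Returns true when `msg` looks like the draw payload shown above:
// action "draw", a string id, and at least one point inside the 0..1 range.
function isValidDrawMessage(msg) {
  if (!msg || msg.action !== 'draw' || !msg.payload) return false;
  const { id, points } = msg.payload;
  if (typeof id !== 'string') return false;
  if (!Array.isArray(points) || points.length === 0) return false;
  return points.every(
    (p) =>
      p != null &&
      Number.isFinite(p.x) &&
      Number.isFinite(p.y) &&
      p.x >= 0 && p.x <= 1 &&
      p.y >= 0 && p.y <= 1
  );
}

// Scales one normalized point to pixel coordinates for a surface
// of the given width and height.
function toPixels(point, width, height) {
  return { x: point.x * width, y: point.y * height };
}
```

For example, toPixels({ x: 0.5, y: 0.5 }, 400, 800) yields the center of a 400 by 800 surface, regardless of which device produced the point.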
3. Rendering AR Overlays on the Technician’s Device
The technician’s device must render these annotations on top of the live camera feed. This requires a layered UI approach.
The Trap: Overlaying on the Video Player Only
If you only render annotations on the video player component, the technician cannot see the annotations in their peripheral vision or when they move the camera slightly. The annotations must be anchored to the device screen, not the video element.
The Solution: Full-Screen Overlay with Hit Testing
In React Native, use an absolute-positioned view that covers the entire screen. The camera feed is rendered in the background, and the SVG annotations are rendered in the foreground.
- Camera View: Renders the live feed.
- Annotation Layer: Renders the SVG paths based on the normalized coordinates received from the WebSocket.
- Hit Testing: When the technician taps the screen, you must convert the touch coordinates to normalized coordinates and send them back to the expert if the expert requests a “point-and-shoot” gesture.
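// React Native: AR Overlay Component
import React from 'react';
import { View, StyleSheet, useWindowDimensions } from 'react-native';
import Svg, { Path } from 'react-native-svg';
import { Camera, useCameraDevice } from 'react-native-vision-camera';

const AROverlay = ({ annotations }) => {
  // Screen size in density-independent pixels; updates when the device rotates.
  const { width: screenWidth, height: screenHeight } = useWindowDimensions();
  // useCameraDevice is the react-native-vision-camera v3+ hook.
  // Camera permission is assumed to have been granted already.
  const cameraDevice = useCameraDevice('back');
  if (cameraDevice == null) return null;

  // Scale every normalized point to screen pixels and join them into one SVG path.
  const toPath = (points) =>
    points
      .map((p, i) => `${i === 0 ? 'M' : 'L'} ${p.x * screenWidth} ${p.y * screenHeight}`)
      .join(' ');

  return (
    <View style={styles.container}>
      <Camera
        style={StyleSheet.absoluteFill}
        isActive={true}
        device={cameraDevice}
      />
      <Svg
        style={StyleSheet.absoluteFill}
        width="100%"
        height="100%"
      >
        {annotations.map((ann) => (
          <Path
            key={ann.id}
            d={toPath(ann.points)}
            stroke={ann.color}
            strokeWidth={ann.width}
            fill="none"
          />
        ))}
      </Svg>
    </View>
  );
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
  },
});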
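The hit-testing step described above, converting a tap into normalized coordinates for the expert, can be sketched as follows. This assumes the touch handler is attached to a full-screen view, so that React Native's nativeEvent.locationX and nativeEvent.locationY are relative to the same surface the overlay covers. The "point" action name and both function names are hypothetical and would need to match whatever your middleware protocol defines.

```javascript
// Converts a React Native touch event's nativeEvent into normalized (0..1)
// coordinates, clamped so touches on the edge never leave the valid range.
function touchToNormalized(nativeEvent, width, height) {
  const clamp = (v) => Math.min(1, Math.max(0, v));
  return {
    x: clamp(nativeEvent.locationX / width),
    y: clamp(nativeEvent.locationY / height),
  };
}

// Builds the message sent back to the expert for a "point-and-shoot" gesture.
// The "point" action is an assumed extension of the annotation protocol.
function buildPointMessage(sessionId, nativeEvent, width, height, now = Date.now()) {
  return {
    action: 'point',
    sessionId,
    timestamp: now,
    payload: { points: [touchToNormalized(nativeEvent, width, height)] },
  };
}
```

In the component, this would typically be wired to a Pressable's onPress handler, with the result serialized and sent over the annotation socket: annotationSocket.send(JSON.stringify(buildPointMessage(sessionId, e.nativeEvent, screenWidth, screenHeight))).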
4. Persisting AR Data in the Interaction Transcript
For compliance, training, and knowledge base generation, you must store the AR annotations. Genesys Interaction Data allows you to append custom data to the interaction record.
The Trap: Losing Context After Session End
If you only store the annotations in your middleware database, they are disconnected from the Genesys interaction record. Support managers reviewing the interaction in Genesys Admin cannot see what the expert drew or pointed out during the video call.
The Solution: Using the Interaction API to Append Artifacts
At regular intervals (e.g., every 10 seconds) or at session end, your middleware should call the Genesys Interaction API to update the interaction with the annotation data.
- Endpoint: POST /api/v2/interactions/{interactionId}/data
- Payload: Include the annotation history in the attributes field.
// API Call to Genesys Interaction API
POST /api/v2/interactions/inter-12345/data
{
  "type": "custom",
  "name": "ar_annotations",
  "data": {
    "session_id": "vc-12345-abc",
    "annotations": [
      {
        "id": "ann-001",
        "type": "arrow",
        "points": [{ "x": 0.5, "y": 0.5 }, { "x": 0.7, "y": 0.6 }],
        "timestamp": 1678886400000
      }
    ],
    "reference_images": [
      "https://cdn.example.com/manuals/valve-x.png"
    ]
  }
}
This data can then be visualized in Genesys Conversation Intelligence or exported via Reporting for post-call analysis.
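The periodic flush described in this section can be sketched as a small buffer in the middleware that collects annotations and, when drained, produces one request in the shape shown above. This is a sketch under stated assumptions: createAnnotationBuffer is a hypothetical helper, the HTTP call itself (including the OAuth bearer token) is left to the caller, and the endpoint path is simply the one given in this guide.

```javascript
// Hypothetical batching buffer for the middleware.
function createAnnotationBuffer(interactionId, sessionId) {
  let pending = [];

  return {
    // Queue one annotation for the next flush.
    add(annotation) {
      pending.push(annotation);
    },

    // Returns a request description for everything queued so far,
    // or null when there is nothing to send. Clears the queue.
    drain() {
      if (pending.length === 0) return null;
      const batch = pending;
      pending = [];
      return {
        method: 'POST',
        path: `/api/v2/interactions/${interactionId}/data`,
        body: {
          type: 'custom',
          name: 'ar_annotations',
          data: { session_id: sessionId, annotations: batch },
        },
      };
    },

    // Puts a failed request's annotations back at the front of the queue
    // so they are retried on the next flush.
    restore(request) {
      pending = request.body.data.annotations.concat(pending);
    },

    size() {
      return pending.length;
    },
  };
}
```

A caller would invoke drain() from a 10-second setInterval and once more at session end, send the returned request with its HTTP client of choice, and call restore(request) if the call fails so that no annotations are lost.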
Validation, Edge Cases & Troubleshooting
Edge Case 1: Device Orientation Mismatch
The Failure Condition: The technician rotates their phone from portrait to landscape. The AR annotations appear in the wrong location or are stretched.
The Root Cause: The normalized coordinates (0.0-1.0) are calculated based on the screen dimensions at the time of the draw. If the expert draws in portrait and the technician views in landscape, the aspect ratio changes, distorting the overlay.
The Solution:
- Force the React Native app to Lock Orientation to Portrait during AR sessions. This is the simplest and most robust solution.
- Alternatively, implement Coordinate Transformation logic in the middleware. When a rotation event is detected, recalculate the normalized coordinates based on the new aspect ratio before rendering.
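One building block for the coordinate transformation option is remapping a normalized point when the frame is rotated in 90-degree steps. The sketch below assumes a top-left origin with y pointing down and treats a positive quarterTurns as a clockwise rotation of the frame. Whether a given device rotation corresponds to a clockwise or counterclockwise turn depends on how your camera preview is mapped to the screen, so treat the direction as an assumption to verify on real hardware. rotateNormalized is a hypothetical helper name.

```javascript
// Remaps a normalized point (origin top-left, y down) after the frame
// is rotated clockwise by `quarterTurns` * 90 degrees.
// Negative values rotate counterclockwise.
function rotateNormalized(point, quarterTurns) {
  const turns = ((quarterTurns % 4) + 4) % 4; // normalize to 0..3
  switch (turns) {
    case 1:
      return { x: 1 - point.y, y: point.x };
    case 2:
      return { x: 1 - point.x, y: 1 - point.y };
    case 3:
      return { x: point.y, y: 1 - point.x };
    default:
      return { x: point.x, y: point.y };
  }
}
```

Because the coordinates are normalized, the rotation itself needs no knowledge of pixel dimensions; the new width and height only come into play when the rotated point is scaled for rendering.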
Edge Case 2: WebSocket Disconnection During Video Call
The Failure Condition: The video stream remains active, but annotations stop appearing. The expert continues to draw, but the technician sees nothing.
The Root Cause: The WebSocket connection to the middleware is more fragile than the WebRTC video connection. Network fluctuations can drop the WebSocket while the video stream (which has its own reconnection logic) stays up.
The Solution:
- Implement Heartbeat Pings in the WebSocket client. Send a ping at a regular interval; if no reply is received within 5 seconds, treat the connection as dead and attempt to reconnect.
- Implement Local Buffering on the technician’s device. Store annotations in local storage. When the WebSocket reconnects, request the “missing” annotations from the middleware based on the timestamp of the last annotation received.
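The two measures above can be sketched as a small monitor that tracks the last heartbeat reply and the newest annotation timestamp seen. This is a minimal sketch under assumptions: the React Native WebSocket API does not expose protocol-level ping frames, so application-level ping and pong JSON messages are assumed; the "resync" action is a hypothetical protocol extension your middleware would need to implement; and createHeartbeatMonitor is an illustrative name.

```javascript
// Tracks heartbeat replies and the newest annotation timestamp.
// `now` is injectable so the logic can be tested without real timers.
function createHeartbeatMonitor({ timeoutMs = 5000, now = Date.now } = {}) {
  let lastPongAt = now();
  let lastAnnotationTs = 0;

  return {
    // Call when a pong (heartbeat reply) arrives from the middleware.
    onPong() {
      lastPongAt = now();
    },

    // Call for every draw message so the resync point stays current.
    onAnnotation(message) {
      lastAnnotationTs = Math.max(lastAnnotationTs, message.timestamp);
    },

    // True when no heartbeat reply has arrived within the timeout.
    isStale() {
      return now() - lastPongAt > timeoutMs;
    },

    // Message to send right after reconnecting, asking for everything
    // newer than the last annotation this device received.
    buildResyncRequest(sessionId) {
      return { action: 'resync', sessionId, since: lastAnnotationTs };
    },
  };
}
```

A client would check isStale() on a timer, close and reopen the socket when it returns true, and send buildResyncRequest(sessionId) from the new socket's onopen handler.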
Edge Case 3: High Latency in Annotation Rendering
The Failure Condition: The expert draws an arrow, and it takes 2+ seconds to appear on the technician’s screen.
The Root Cause: The middleware is processing too many annotations, or the JSON payloads are too large.
The Solution:
- Throttle Annotation Updates: Do not send every single point of a drawing stroke. Send the start point, end point, and a few intermediate points. Use Bezier Curves to smooth the line on the client side.
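- Compress Payloads: Use Protocol Buffers (Protobuf) instead of JSON for the WebSocket data channel. A binary encoding can cut payload size substantially (reductions of up to 70% are possible, depending on the message shape), which lowers network overhead. Measure on your own payloads before committing to the added schema management.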
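The throttling idea can be sketched as a minimum-distance filter applied on the expert's side before a stroke is sent: the first and last points are always kept, and intermediate points are dropped unless they are far enough from the last point kept. This is one simple decimation approach, not the only one, and the default threshold of 0.01 (1% of the normalized surface) is an assumption to tune. decimateStroke is a hypothetical helper name.

```javascript
// Reduces a stroke of normalized points by dropping intermediate points
// that lie closer than `minDistance` to the previously kept point.
// The first and last points are always preserved.
function decimateStroke(points, minDistance = 0.01) {
  if (points.length <= 2) return points.slice();

  const kept = [points[0]];
  for (let i = 1; i < points.length - 1; i += 1) {
    const last = kept[kept.length - 1];
    const distance = Math.hypot(points[i].x - last.x, points[i].y - last.y);
    if (distance >= minDistance) kept.push(points[i]);
  }
  kept.push(points[points.length - 1]);
  return kept;
}
```

The technician's client can then smooth the reduced point list with curve interpolation when building the SVG path, so the stroke still looks continuous despite carrying fewer points.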