Customizing the Genesys Cloud Web Messaging SDK for Voice-to-Text Input with TypeScript

What You Will Build

  • A TypeScript module that overrides the default Genesys Cloud Web Messaging UI to hide the standard text area and replace it with a microphone button.
  • The implementation uses the Web Speech API to capture voice input, transcribes it to text, and submits it via the @genesyscloud/purecloud-web-messaging SDK.
  • The code is written in TypeScript. It must be compiled (with a bundler or tsc) and run in a browser, because it depends on the DOM and the Web Speech API.

Prerequisites

  • Genesys Cloud organization ID and messaging deployment ID
  • @genesyscloud/purecloud-web-messaging SDK v1.4.0+
  • TypeScript 4.7+ with DOM type definitions
  • A browser that supports the Web Speech API's SpeechRecognition interface (Chrome, Edge, or Safari, where it is typically exposed as webkitSpeechRecognition)
  • Deployment routing permission: messaging:send (configured in Genesys Cloud admin console, not passed as an OAuth scope in client-side code)

Authentication Setup

The Web Messaging SDK does not require manual OAuth token management. Authentication is handled server-side through the deployment configuration. The SDK exchanges the orgId and deploymentId for a session token automatically. You only need to provide these identifiers during initialization.

import { createMessaging, PureCloudMessaging } from '@genesyscloud/purecloud-web-messaging';

interface MessagingConfig {
  orgId: string;
  deploymentId: string;
  containerId: string;
}

async function initializeMessaging(config: MessagingConfig): Promise<PureCloudMessaging> {
  const messaging = await createMessaging({
    orgId: config.orgId,
    deploymentId: config.deploymentId,
    uiOptions: {
      theme: 'light',
      locale: 'en-US'
    }
  });

  await messaging.mount(config.containerId);
  return messaging;
}

The SDK establishes a WebSocket connection for real-time messaging and falls back to POST /api/v2/messaging/messages when the connection is unavailable. The underlying request includes an Authorization: Bearer <session-token> header injected by the SDK. You do not manage this token manually.
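
Because the two identifiers are the only credentials the client supplies, a small guard before initialization can catch empty or placeholder values early. The sketch below is an illustration only: validateMessagingConfig is a hypothetical helper, not part of the SDK, and its placeholder check assumes the YOUR_ prefix used for sample values in this guide.

```typescript
interface MessagingConfig {
  orgId: string;
  deploymentId: string;
  containerId: string;
}

// Hypothetical helper: returns a list of problems, empty when the config looks usable.
function validateMessagingConfig(config: MessagingConfig): string[] {
  const problems: string[] = [];
  const fields: (keyof MessagingConfig)[] = ['orgId', 'deploymentId', 'containerId'];

  for (const field of fields) {
    const value = config[field].trim();
    if (value === '') {
      problems.push(`${field} is empty`);
    } else if (value.startsWith('YOUR_')) {
      // Matches the placeholder style used in this guide (e.g. YOUR_ORG_ID)
      problems.push(`${field} still contains a placeholder value`);
    }
  }
  return problems;
}
```

Calling this before initializeMessaging and refusing to continue when the returned list is non-empty keeps a misconfigured page from opening a session that cannot succeed.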

Implementation

Step 1: Suppress the Default Text Area and Inject Custom UI

The SDK renders a default input container with the class .pcwm-message-input. You can hide it and inject a custom microphone button into the message form.

function overrideInputUI(containerId: string): void {
  const container = document.getElementById(containerId);
  if (!container) return;

  // Suppress default text area
  const defaultInput = container.querySelector('.pcwm-message-input');
  if (defaultInput) {
    (defaultInput as HTMLElement).style.display = 'none';
  }

  // Create microphone button
  const micButton = document.createElement('button');
  micButton.type = 'button'; // A button inside a form defaults to type="submit"
  micButton.id = 'voice-input-btn';
  micButton.className = 'pcwm-message-input';
  micButton.innerHTML = '<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M12 1a3 3 0 0 0-3 3v8a3 3 0 0 0 6 0V4a3 3 0 0 0-3-3z"/><path d="M19 10v2a7 7 0 0 1-14 0v-2"/><line x1="12" y1="19" x2="12" y2="23"/><line x1="8" y1="23" x2="16" y2="23"/></svg>';
  micButton.style.cursor = 'pointer';
  micButton.style.background = 'transparent';
  micButton.style.border = 'none';
  micButton.style.padding = '8px';

  const formContainer = container.querySelector('.pcwm-message-form');
  if (formContainer) {
    formContainer.appendChild(micButton);
  }
}

This step hides the standard input without removing it from the DOM, so any references the SDK holds to the element stay valid. The custom button reuses the SDK's input class to stay visually consistent with the rest of the form.
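
Class names such as .pcwm-message-input are internal markup and could change between SDK releases. One defensive option, sketched here under that assumption, is to try a list of candidate selectors and use the first match. The queryFirst helper is hypothetical and not part of the SDK.

```typescript
// Minimal shape shared by Document, Element, and test doubles.
interface QueryRoot<E> {
  querySelector(selector: string): E | null;
}

// Returns the first element matched by any selector, in priority order.
function queryFirst<E>(root: QueryRoot<E>, selectors: readonly string[]): E | null {
  for (const selector of selectors) {
    const match = root.querySelector(selector);
    if (match !== null) return match;
  }
  return null;
}
```

Inside overrideInputUI, a call with a list such as ['.pcwm-message-input', 'textarea'] could replace the single querySelector lookup. The second selector there is only an illustrative guess at a fallback.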

Step 2: Implement Web Speech API Integration with TypeScript Typing

The Web Speech API requires explicit type declarations for TypeScript. You must handle browser prefix differences and manage recognition lifecycle events.

interface SpeechRecognitionEvent extends Event {
  results: SpeechRecognitionResultList;
  resultIndex: number;
}

interface SpeechRecognitionErrorEvent extends Event {
  error: string;
  message: string;
}

interface SpeechRecognitionInstance {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  start(): void;
  stop(): void;
  abort(): void;
  onresult: ((event: SpeechRecognitionEvent) => void) | null;
  onerror: ((event: SpeechRecognitionErrorEvent) => void) | null;
  onend: (() => void) | null;
}

const SpeechRecognition = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

class VoiceTranscriptionHandler {
  private recognition: SpeechRecognitionInstance | null = null;
  // Public because the controller in Step 4 reads this flag
  isListening: boolean = false;

  constructor() {
    if (SpeechRecognition) {
      this.recognition = new SpeechRecognition();
      this.recognition.continuous = false;
      this.recognition.interimResults = true;
      this.recognition.lang = 'en-US';
    }
  }

  startListening(onTranscript: (text: string) => void, onError: (error: string) => void): void {
    if (!this.recognition) {
      onError('Web Speech API is not supported in this browser');
      return;
    }

    this.isListening = true;
    this.recognition.onresult = (event: SpeechRecognitionEvent) => {
      let finalTranscript = '';

      // Start at resultIndex so only results changed by this event are read
      for (let i = event.resultIndex; i < event.results.length; i++) {
        // Skip interim results: the controller sends every transcript it
        // receives, and interim text can still change before it is final
        if (event.results[i].isFinal) {
          finalTranscript += event.results[i][0].transcript;
        }
      }

      if (finalTranscript) {
        onTranscript(finalTranscript.trim());
      }
    };

    this.recognition.onerror = (event: SpeechRecognitionErrorEvent) => {
      onError(`Speech recognition error: ${event.error}`);
      this.isListening = false;
    };

    this.recognition.onend = () => {
      this.isListening = false;
    };

    try {
      this.recognition.start();
    } catch (err) {
      // Reset the flag so the next click can try again
      this.isListening = false;
      onError('Failed to start speech recognition');
    }
  }

  stopListening(): void {
    if (this.recognition && this.isListening) {
      this.recognition.stop();
    }
  }

  isSupported(): boolean {
    return !!this.recognition;
  }
}

This handler isolates the speech API lifecycle. Because interimResults is enabled, the recognizer reports interim results as well as final ones; only final text should reach the send path, since interim text can still change. The onresult loop starts at event.resultIndex, which is the lowest index of the results that changed in the current event.
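
To make the final/interim distinction easier to test outside a browser, the loop can be pulled into a pure function. This is a minimal sketch: splitTranscripts and the ...Like interfaces are hypothetical names that mirror the shape of the Web Speech API result types, not part of any library.

```typescript
// Structural stand-ins for the Web Speech API result types, so the
// function can be exercised without a browser.
interface AlternativeLike { transcript: string; }
interface ResultLike { isFinal: boolean; [index: number]: AlternativeLike; }
interface ResultListLike { length: number; [index: number]: ResultLike; }

interface SplitTranscripts { finalText: string; interimText: string; }

// Concatenates the top alternative of each result from resultIndex onward,
// keeping final and interim text apart.
function splitTranscripts(results: ResultListLike, resultIndex: number): SplitTranscripts {
  let finalText = '';
  let interimText = '';
  for (let i = resultIndex; i < results.length; i++) {
    const result = results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

An onresult handler could then send finalText and use interimText only for on-screen feedback.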

Step 3: Bind Voice Transcription to the SDK Message Send Method

You must connect the transcription output to the sendMessage method. The SDK returns a promise that resolves when the message is queued. You need to handle network failures and implement retry logic for rate limits.

async function sendTranscribedMessage(messaging: PureCloudMessaging, text: string, maxRetries: number = 3): Promise<void> {
  let attempt = 0;
  const baseDelay = 1000;

  while (attempt < maxRetries) {
    try {
      await messaging.sendMessage(text);
      return;
    } catch (error: any) {
      const statusCode = error?.response?.status || error?.status;
      
      if (statusCode === 429) {
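        // Retry-After arrives as a string of seconds; fall back to exponential
        // backoff when it is missing or not a positive number
        const retryAfter = Number(error?.response?.headers?.['retry-after']);
        const delay = retryAfter > 0 ? retryAfter * 1000 : baseDelay * Math.pow(2, attempt);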
        console.warn(`Rate limited (429). Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        attempt++;
      } else if (statusCode === 401 || statusCode === 403) {
        throw new Error(`Authentication or authorization failed: ${statusCode}`);
      } else {
        throw error;
      }
    }
  }
  throw new Error('Max retries exceeded for message submission');
}

The sendMessage method maps to POST /api/v2/messaging/messages. When the WebSocket is active, the SDK batches messages and sends them in real time. When offline, it falls back to the REST endpoint. The retry loop handles 429 Too Many Requests responses by respecting the Retry-After header or applying exponential backoff.
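
The delay calculation can also be isolated so it is easy to unit test. The following is a sketch under stated assumptions: computeRetryDelayMs is a hypothetical helper, the 30-second cap is an arbitrary illustrative choice, and a Retry-After value in HTTP-date form is treated as absent and falls back to backoff.

```typescript
// Hypothetical helper: picks the wait time before the next retry attempt.
// retryAfterHeader is the raw Retry-After value, if the error exposes one.
function computeRetryDelayMs(
  retryAfterHeader: string | undefined,
  attempt: number,
  baseDelayMs: number = 1000,
  maxDelayMs: number = 30000
): number {
  const seconds = Number(retryAfterHeader);
  if (retryAfterHeader !== undefined && Number.isFinite(seconds) && seconds > 0) {
    // The server gave an explicit wait in seconds; honor it up to the cap
    return Math.min(seconds * 1000, maxDelayMs);
  }
  // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs
  return Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
}
```

With the defaults, attempts 0, 1, and 2 without a header wait 1000 ms, 2000 ms, and 4000 ms, which matches the backoff used in sendTranscribedMessage.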

Step 4: Wire Components Together

Combine the UI override, transcription handler, and message sender into a cohesive controller.

class VoiceMessagingController {
  private messaging: PureCloudMessaging | null = null;
  private transcription: VoiceTranscriptionHandler;
  private micButton: HTMLButtonElement | null = null;

  constructor(private config: MessagingConfig) {
    this.transcription = new VoiceTranscriptionHandler();
  }

  async initialize(): Promise<void> {
    this.messaging = await initializeMessaging(this.config);
    overrideInputUI(this.config.containerId);
    
    this.micButton = document.getElementById('voice-input-btn') as HTMLButtonElement;
    if (this.micButton) {
      this.micButton.addEventListener('click', () => this.toggleVoiceInput());
    }

    this.messaging.on('error', (error: any) => {
      console.error('Messaging SDK error:', error);
    });
  }

  private toggleVoiceInput(): void {
    if (!this.messaging) return;

    if (this.transcription.isListening) {
      this.transcription.stopListening();
      this.updateMicButtonState(false);
    } else {
      this.updateMicButtonState(true);
      this.transcription.startListening(
        (transcript: string) => this.handleTranscript(transcript),
        (error: string) => {
          console.error(error);
          this.updateMicButtonState(false);
        }
      );
    }
  }

  private handleTranscript(text: string): void {
    if (!text.trim()) return;
    
    // Log the transcript (useful while tuning recognition)
    console.log('Transcript:', text);

    // Every transcript that reaches this point is sent, so the handler
    // should forward final results only
    if (this.messaging) {
      sendTranscribedMessage(this.messaging, text).catch(err => {
        console.error('Failed to send message:', err);
      });
    }
  }

  private updateMicButtonState(isActive: boolean): void {
    if (!this.micButton) return;
    this.micButton.style.color = isActive ? '#e53e3e' : '#1a202c';
    this.micButton.title = isActive ? 'Listening...' : 'Click to speak';
  }
}

This controller manages the full lifecycle. It initializes the SDK, hides the default input, attaches the microphone button, and routes transcribed text through the retry-safe sender. The handleTranscript method triggers on every final result chunk.
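
One gap worth noting: with continuous set to false the recognizer stops on its own after an utterance, and that path never calls updateMicButtonState(false), so the button can stay in its "Listening..." state. A possible way to keep the button in sync is to drive it from a small state function fed by click, end, and error events. The sketch below is hypothetical (MicState, MicEvent, and nextMicState are not part of this guide's code or the SDK) and only models the transitions.

```typescript
type MicState = 'idle' | 'listening';
type MicEvent = 'click' | 'recognitionEnd' | 'recognitionError';

// Pure transition function: given the current state and an event,
// return the state the microphone button should show next.
function nextMicState(state: MicState, event: MicEvent): MicState {
  if (event === 'click') {
    // A click toggles between idle and listening
    return state === 'idle' ? 'listening' : 'idle';
  }
  // recognitionEnd and recognitionError both reset the button
  return 'idle';
}
```

The controller could keep a MicState field, update it from the click handler and from an end callback exposed by the transcription handler, and pass state === 'listening' to updateMicButtonState.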

Complete Working Example

The following module combines all components into a single copy-pasteable TypeScript file. Replace the placeholder configuration values before execution.

import { createMessaging, PureCloudMessaging } from '@genesyscloud/purecloud-web-messaging';

// Type declarations for Web Speech API
interface SpeechRecognitionEvent extends Event {
  results: SpeechRecognitionResultList;
  resultIndex: number;
}
interface SpeechRecognitionErrorEvent extends Event {
  error: string;
  message: string;
}
interface SpeechRecognitionInstance {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  start(): void;
  stop(): void;
  abort(): void;
  onresult: ((event: SpeechRecognitionEvent) => void) | null;
  onerror: ((event: SpeechRecognitionErrorEvent) => void) | null;
  onend: (() => void) | null;
}

const SpeechRecognition = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

interface MessagingConfig {
  orgId: string;
  deploymentId: string;
  containerId: string;
}

class VoiceTranscriptionHandler {
  private recognition: SpeechRecognitionInstance | null = null;
  private isListening: boolean = false;

  constructor() {
    if (SpeechRecognition) {
      this.recognition = new SpeechRecognition();
      this.recognition.continuous = false;
      this.recognition.interimResults = true;
      this.recognition.lang = 'en-US';
    }
  }

  startListening(onTranscript: (text: string) => void, onError: (error: string) => void): void {
    if (!this.recognition) {
      onError('Web Speech API is not supported in this browser');
      return;
    }
    this.isListening = true;
    this.recognition.onresult = (event: SpeechRecognitionEvent) => {
      let finalTranscript = '';
      for (let i = event.resultIndex; i < event.results.length; i++) {
        if (event.results[i].isFinal) {
          finalTranscript += event.results[i][0].transcript;
        }
      }
      if (finalTranscript) onTranscript(finalTranscript.trim());
    };
    this.recognition.onerror = (event: SpeechRecognitionErrorEvent) => {
      onError(`Speech recognition error: ${event.error}`);
      this.isListening = false;
    };
    this.recognition.onend = () => { this.isListening = false; };
    try {
      this.recognition.start();
    } catch {
      // Reset the flag so the next click can try again
      this.isListening = false;
      onError('Failed to start speech recognition');
    }
  }

  stopListening(): void {
    if (this.recognition && this.isListening) this.recognition.stop();
  }

  get isListeningActive(): boolean { return this.isListening; }
  get isSupported(): boolean { return !!this.recognition; }
}

async function initializeMessaging(config: MessagingConfig): Promise<PureCloudMessaging> {
  const messaging = await createMessaging({
    orgId: config.orgId,
    deploymentId: config.deploymentId,
    uiOptions: { theme: 'light', locale: 'en-US' }
  });
  await messaging.mount(config.containerId);
  return messaging;
}

function overrideInputUI(containerId: string): void {
  const container = document.getElementById(containerId);
  if (!container) return;
  const defaultInput = container.querySelector('.pcwm-message-input');
  if (defaultInput) (defaultInput as HTMLElement).style.display = 'none';
  
  const micButton = document.createElement('button');
  micButton.type = 'button'; // A button inside a form defaults to type="submit"
  micButton.id = 'voice-input-btn';
  micButton.className = 'pcwm-message-input';
  micButton.innerHTML = '&#127908;'; // Microphone emoji
  micButton.style.cursor = 'pointer';
  micButton.style.background = 'transparent';
  micButton.style.border = 'none';
  micButton.style.fontSize = '24px';
  micButton.style.padding = '8px';
  
  const formContainer = container.querySelector('.pcwm-message-form');
  if (formContainer) formContainer.appendChild(micButton);
}

async function sendTranscribedMessage(messaging: PureCloudMessaging, text: string, maxRetries: number = 3): Promise<void> {
  let attempt = 0;
  while (attempt < maxRetries) {
    try {
      await messaging.sendMessage(text);
      return;
    } catch (error: any) {
      const statusCode = error?.response?.status || error?.status;
      if (statusCode === 429) {
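        // Retry-After arrives as a string of seconds; back off exponentially
        // when it is missing or not a positive number
        const retryAfter = Number(error?.response?.headers?.['retry-after']);
        const delaySeconds = retryAfter > 0 ? retryAfter : Math.pow(2, attempt);
        await new Promise(resolve => setTimeout(resolve, delaySeconds * 1000));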
        attempt++;
      } else {
        throw error;
      }
    }
  }
  throw new Error('Max retries exceeded for message submission');
}

class VoiceMessagingController {
  private messaging: PureCloudMessaging | null = null;
  private transcription: VoiceTranscriptionHandler;
  private micButton: HTMLButtonElement | null = null;

  constructor(private config: MessagingConfig) {
    this.transcription = new VoiceTranscriptionHandler();
  }

  async initialize(): Promise<void> {
    this.messaging = await initializeMessaging(this.config);
    overrideInputUI(this.config.containerId);
    this.micButton = document.getElementById('voice-input-btn') as HTMLButtonElement;
    if (this.micButton) {
      this.micButton.addEventListener('click', () => this.toggleVoiceInput());
    }
    this.messaging.on('error', (error: any) => console.error('Messaging SDK error:', error));
  }

  private toggleVoiceInput(): void {
    if (!this.messaging) return;
    if (this.transcription.isListeningActive) {
      this.transcription.stopListening();
      this.updateMicButtonState(false);
    } else {
      this.updateMicButtonState(true);
      this.transcription.startListening(
        (transcript: string) => this.handleTranscript(transcript),
        (error: string) => { console.error(error); this.updateMicButtonState(false); }
      );
    }
  }

  private handleTranscript(text: string): void {
    if (!text.trim() || !this.messaging) return;
    sendTranscribedMessage(this.messaging, text).catch(err => console.error('Failed to send message:', err));
  }

  private updateMicButtonState(isActive: boolean): void {
    if (!this.micButton) return;
    this.micButton.style.color = isActive ? '#e53e3e' : '#1a202c';
    this.micButton.title = isActive ? 'Listening...' : 'Click to speak';
  }
}

// Usage
const config: MessagingConfig = {
  orgId: 'YOUR_ORG_ID',
  deploymentId: 'YOUR_DEPLOYMENT_ID',
  containerId: 'genesys-messaging-container'
};

const controller = new VoiceMessagingController(config);
controller.initialize().catch(console.error);

Common Errors & Debugging

Error: SpeechRecognition is not defined

  • Cause: The browser does not support the Web Speech API or requires a vendor prefix.
  • Fix: Ensure you run the code in Chrome, Edge, or Safari. The fallback (window as any).webkitSpeechRecognition covers browsers that expose only the prefixed constructor. Add a feature detection check before initialization.
  • Code showing the fix: The module-level SpeechRecognition constant already includes the prefix fallback, and the VoiceTranscriptionHandler constructor leaves this.recognition as null when neither constructor exists. Add a user-facing warning in that case.
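
The feature detection described above can be sketched as a small function that takes the global object as a parameter, which also makes it testable outside a browser. resolveSpeechRecognition, RecognitionCtor, and SpeechGlobals are hypothetical names introduced for this illustration.

```typescript
// Constructor type kept loose on purpose; the guide's SpeechRecognitionInstance
// interface could be substituted for `object`.
type RecognitionCtor = new () => object;

interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Returns the available constructor, preferring the unprefixed name,
// or null when the browser offers neither.
function resolveSpeechRecognition(globals: SpeechGlobals): RecognitionCtor | null {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition ?? null;
}
```

In a browser the argument would be window, cast through unknown to SpeechGlobals. A null result is the cue to show the user-facing warning and leave the text input visible instead of swapping in the microphone button.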

Error: 403 Forbidden on message submission

  • Cause: The deployment ID is invalid, disabled, or lacks the messaging:send routing permission.
  • Fix: Verify the deployment status in Genesys Cloud Admin. Ensure the deployment is published and routing is enabled. Check that the orgId matches the deployment.
  • Code showing the fix: Catch the error in sendTranscribedMessage and log the deployment ID for verification.

Error: 429 Too Many Requests cascade

  • Cause: Rapid transcription results or network instability triggers rate limiting on the messaging endpoint.
  • Fix: The retry loop in sendTranscribedMessage handles this automatically. If you experience persistent 429s, make sure only final results are sent, set interimResults to false, or debounce the transcription callback.
  • Code showing the fix: The exponential backoff logic already respects Retry-After headers. Add a debounce wrapper if you send interim results for UI feedback.
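
A debounce wrapper of the kind mentioned above might look like the following. It is a generic sketch rather than SDK code, and it is meant for interim UI feedback, not for the send path.

```typescript
// Returns a wrapper that delays calls to fn until waitMs has passed
// without a newer call; only the latest argument is delivered.
function debounce<T>(fn: (value: T) => void, waitMs: number): (value: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (value: T): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(value);
    }, waitMs);
  };
}
```

For example, wrapping a hypothetical showInterimText function with a wait of 150 ms would update the on-screen preview at most once per pause in speech, however many interim events the recognizer emits.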

Error: WebSocket connection failed fallback to HTTP

  • Cause: Corporate firewalls or proxies block the WebSocket endpoint. CORS policies are not the cause here, because CORS does not apply to WebSocket connections.
  • Fix: The SDK automatically falls back to POST /api/v2/messaging/messages. Ensure your deployment allows HTTP fallback in the Genesys Cloud configuration. Monitor the on('error') event for connection drops.

Official References