Implementing Semantic Search in Agent Assist using Amazon OpenSearch and Bedrock

What This Guide Covers

You are building a real-time semantic search pipeline that powers an Agent Assist knowledge base within the Genesys Cloud agent desktop. Unlike keyword-based search, which requires agents to guess the exact phrase used in the documentation, semantic search matches on the meaning of what the agent or customer is asking and retrieves the most relevant articles even when the phrasing doesn’t match. When complete, your pipeline will embed live conversation transcripts using Amazon Bedrock’s Titan Embeddings, query an Amazon OpenSearch k-NN (k-Nearest Neighbor) index, and surface the top 3 relevant knowledge base articles in the agent’s panel, ranked by semantic relevance rather than keyword occurrence count.


Prerequisites, Roles & Licensing

  • Genesys Cloud: CX 2 or 3 with Agent Assist or custom widget capabilities.
  • Permissions required:
    • Conversations > Conversation > View (for live transcript access)
    • Integrations > Integration > Edit (for widget configuration)
  • Infrastructure:
    • Amazon OpenSearch Service with k-NN plugin enabled.
    • Amazon Bedrock with access to the amazon.titan-embed-text-v2:0 model.
    • A Lambda function as the search API backend.
    • Your knowledge base content (PDFs, Confluence, ServiceNow articles) preprocessed into a JSON corpus.

The Implementation Deep-Dive

1. The Limits of Keyword Search in a Contact Center

A traditional Agent Assist system uses BM25 keyword search. An agent handling a call about a “failed payment” types “payment failed” into the search box, and the system returns articles containing those exact words.

The problem: the customer said “my card keeps bouncing,” and the relevant knowledge base article is titled “Resolving Declined Transactions in the Payment Portal.” The customer’s wording shares no terms with that title, so a keyword search on it returns nothing useful and the agent has to guess the right vocabulary. Semantic search returns the correct article because the embeddings of “card bouncing” and “declined transaction” lie close together in vector space.


2. Building the Knowledge Base Index

Step 1: Chunk and Embed Knowledge Base Articles

Process your knowledge base content into overlapping chunks (512 tokens each, 50-token overlap) and embed each chunk using Bedrock Titan Embeddings.

import boto3
import json
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

REGION = 'us-east-1'
BEDROCK = boto3.client('bedrock-runtime', region_name=REGION)
HOST = 'your-opensearch-cluster.us-east-1.es.amazonaws.com'
INDEX_NAME = 'agent-assist-kb'

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, REGION, 'es', session_token=credentials.token)

os_client = OpenSearch(
    hosts=[{'host': HOST, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

def get_embedding(text: str) -> list[float]:
    """Generate a 1024-dim embedding vector using Amazon Titan."""
    response = BEDROCK.invoke_model(
        modelId='amazon.titan-embed-text-v2:0',
        body=json.dumps({"inputText": text, "dimensions": 1024})
    )
    return json.loads(response['body'].read())['embedding']

def index_article(article_id: str, title: str, content: str, url: str):
    """Embeds and indexes a knowledge base article."""
    
    # Chunk the content
    chunks = chunk_text(content, chunk_size=512, overlap=50)
    
    for i, chunk in enumerate(chunks):
        embedding = get_embedding(f"{title}. {chunk}")
        
        os_client.index(
            index=INDEX_NAME,
            body={
                "article_id": article_id,
                "title": title,
                "chunk_index": i,
                "content": chunk,
                "url": url,
                "embedding": embedding
            },
            id=f"{article_id}_{i}"
        )
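
The chunk_text helper called in index_article is not defined in this guide. A minimal sketch follows, assuming whitespace-separated words are an acceptable stand-in for model tokens; the real Titan tokenizer will count differently, so treat 512 and 50 as approximate. The validation rules are illustrative choices, not part of the original design.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of about chunk_size tokens.

    Assumption: a "token" here is a whitespace-separated word, which only
    approximates the embedding model's own tokenization.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text, so the
        # tail is not emitted again as a tiny duplicate chunk.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With the defaults, each chunk repeats the last 50 words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.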

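Step 2: Create the OpenSearch k-NN Index

Run this once before indexing any articles from Step 1. If documents arrive first, OpenSearch dynamically maps embedding as a plain numeric field rather than knn_vector, and k-NN queries against it fail.
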
def create_index():
    os_client.indices.create(
        index=INDEX_NAME,
        body={
            "settings": {
                "index": {
                    "knn": True,
                    "knn.algo_param.ef_search": 512
                }
            },
            "mappings": {
                "properties": {
                    "embedding": {
                        "type": "knn_vector",
                        "dimension": 1024,
                        "method": {
                            "name": "hnsw",
                            "engine": "nmslib",
                            "space_type": "cosinesimil"
                        }
                    },
                    "content": {"type": "text"},
                    "title": {"type": "text"},
                    "url": {"type": "keyword"}
                }
            }
        }
    )

3. The Real-Time Query Pipeline

The Lambda function receives the current conversation context (last 3 utterances) and returns ranked articles.

def semantic_search(query_text: str, top_k: int = 3) -> list[dict]:
    """
    Performs semantic search against the knowledge base.
    Returns top K most relevant article chunks with source metadata.
    """
    # 1. Embed the query
    query_embedding = get_embedding(query_text)
    
    # 2. k-NN search in OpenSearch
    response = os_client.search(
        index=INDEX_NAME,
        body={
            "size": top_k * 2,  # Fetch more to deduplicate by article
            "query": {
                "knn": {
                    "embedding": {
                        "vector": query_embedding,
                        "k": top_k * 2
                    }
                }
            },
            "_source": ["article_id", "title", "content", "url"]
        }
    )
    
    # 3. Deduplicate by article ID (keep highest-scored chunk per article)
    seen_articles = {}
    for hit in response['hits']['hits']:
        article_id = hit['_source']['article_id']
        if article_id not in seen_articles:
            seen_articles[article_id] = {
                "title": hit['_source']['title'],
                "excerpt": hit['_source']['content'][:200] + "...",
                "url": hit['_source']['url'],
                "score": hit['_score']
            }
    
    return sorted(seen_articles.values(), key=lambda x: x['score'], reverse=True)[:top_k]
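
The Lambda entry point itself is not shown above. One way to wire it up is sketched below, assuming an API Gateway proxy-style event (a JSON string under "body") and the {"query": ...} payload that the widget posts. The names make_handler and _response are hypothetical; the search function is injected so the handler can be exercised without AWS access.

```python
import json
from typing import Callable


def _response(status: int, body) -> dict:
    """Build a proxy-style HTTP response with a JSON body."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def make_handler(search_fn: Callable[[str, int], list]) -> Callable[[dict, object], dict]:
    """Wrap a search function (e.g. semantic_search) in a Lambda handler."""

    def handler(event: dict, context: object) -> dict:
        try:
            payload = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return _response(400, {"error": "body must be valid JSON"})

        query = payload.get("query", "") if isinstance(payload, dict) else ""
        if not isinstance(query, str) or not query.strip():
            return _response(400, {"error": "missing 'query'"})

        # Return the ranked list directly; the widget renders it as-is.
        return _response(200, search_fn(query.strip(), 3))

    return handler
```

In the deployed function this would be bound as lambda_handler = make_handler(semantic_search).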

4. Integrating with the Agent Desktop

Your Genesys Cloud custom widget calls the Lambda search endpoint every 15 seconds with the last 3 customer utterances as the query context, surfacing results in the agent’s panel.

// Custom Genesys MAX Widget
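async function refreshAgentAssist(conversationId) {
    const transcript = await getLastNUtterances(conversationId, 3);
    const queryText = transcript.map(u => u.text).join(". ");

    const response = await fetch('/api/agent-assist/search', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({ query: queryText })
    });
    if (!response.ok) {
        // Keep the previous results on screen rather than rendering an error body
        console.error(`Agent assist search failed: ${response.status}`);
        return;
    }

    renderArticleCards(await response.json());
}

// Escape values before interpolating them into HTML; indexed article text is untrusted.
function escapeHtml(value) {
    return String(value).replace(/[&<>"']/g, c => (
        {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'}[c]
    ));
}

function renderArticleCards(articles) {
    const panel = document.getElementById('assist-panel');
    // Note: score is the OpenSearch relevance score, not a calibrated percentage.
    panel.innerHTML = articles.map(a => `
        <div class="article-card">
            <a href="${escapeHtml(a.url)}" target="_blank" rel="noopener noreferrer">${escapeHtml(a.title)}</a>
            <p>${escapeHtml(a.excerpt)}</p>
            <span class="relevance-badge">${(a.score * 100).toFixed(0)}% match</span>
        </div>
    `).join('');
}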

Validation, Edge Cases & Troubleshooting

Edge Case 1: Stale Index After Knowledge Base Updates

If your knowledge base is updated (articles renamed, deprecated, or rewritten) but the OpenSearch index is not re-indexed, agents will be directed to outdated articles.
Solution: Implement a daily incremental re-indexing job that checks article lastModified timestamps in your CMS and re-embeds only changed documents. For deleted articles, use the OpenSearch Delete by Query API to remove all chunks with the matching article_id.
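
A sketch of the two pieces of that job follows. It assumes the CMS export is a list of dicts with "id" and a timezone-aware "lastModified" datetime (a hypothetical shape), and that article_id was dynamically mapped, so the exact-match field is article_id.keyword. If you map article_id as keyword explicitly, pass id_field="article_id" instead. The client is an opensearch-py client passed in by the caller.

```python
from datetime import datetime


def select_changed(articles: list[dict], last_run: datetime) -> list[dict]:
    """Return only the articles modified since the previous indexing run."""
    return [a for a in articles if a["lastModified"] > last_run]


def delete_article_chunks(client, index: str, article_id: str,
                          id_field: str = "article_id.keyword") -> dict:
    """Remove every chunk of one article using the Delete by Query API.

    id_field is an assumption about the mapping; see the note above.
    """
    return client.delete_by_query(
        index=index,
        body={"query": {"term": {id_field: article_id}}},
    )
```

It is worth calling delete_article_chunks before re-indexing a changed article as well as for deleted ones: chunk IDs are article_id plus a position, so a rewritten article that now produces fewer chunks would otherwise leave its old trailing chunks in the index.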

Edge Case 2: High Embedding API Latency During Spikes

If 500 concurrent agent assist widgets all query the embedding API simultaneously during peak hours, Bedrock can throttle the Titan requests, increasing latency to 3-5 seconds.
Solution: Cache the embeddings for common queries in ElastiCache, starting with a 60-second TTL. “Customer can’t log in” is asked hundreds of times a day: compute the embedding once, cache it, and serve repeat searches from the cache. Because the embedding for a given text and model version does not change, you can lengthen the TTL if the hit rate is low.
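
The caching logic can be sketched as follows. This is an in-process dictionary standing in for ElastiCache, so it illustrates the keying and expiry behaviour only; a shared deployment would swap the dictionary for a Redis or Memcached client. The class name EmbeddingCache and the lowercase-and-collapse-whitespace normalization are assumptions, not part of the original design.

```python
import hashlib
import time
from typing import Callable


class EmbeddingCache:
    """TTL cache in front of an embedding function such as get_embedding."""

    def __init__(self, embed_fn: Callable[[str], list],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._embed_fn = embed_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, list]] = {}  # key -> (expiry, vector)

    @staticmethod
    def _key(text: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share an entry.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text: str) -> list:
        key = self._key(text)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the embedding call
        vector = self._embed_fn(text)
        self._store[key] = (now + self._ttl, vector)
        return vector
```

Exact-match caching only helps when the same text recurs, so it suits short, repeated queries better than free-form transcript windows, which rarely repeat verbatim.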

Edge Case 3: Irrelevant Results for Short Queries

A one-word query like “billing” produces noisy results because the embedding of a single word is a poor representation of agent intent.
Solution: Configure the widget to not trigger a search for queries under 4 words. Instead, display a static “Most Searched Articles” panel until the transcript grows to a meaningful length. Better yet, pre-compute embeddings for the top-10 most common intent keywords and short-circuit the search for those.
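
The same gate can also be enforced on the server so that a misconfigured widget cannot bypass it. A minimal sketch, assuming words are split on whitespace and that the static panel is passed in as a precomputed list; assist_results is a hypothetical name.

```python
from typing import Callable


def assist_results(query: str,
                   search_fn: Callable[[str], list],
                   fallback: list,
                   min_words: int = 4) -> list:
    """Run the search only when the query has at least min_words words.

    Shorter queries get the static fallback list, e.g. "Most Searched Articles".
    """
    if len(query.split()) < min_words:
        return fallback
    return search_fn(query)
```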
