Implementing Prometheus and Grafana Monitoring for BYOC Premise SBC Health Metrics
What This Guide Covers
You are building a production observability stack that collects real-time health metrics from your on-premise Session Border Controller (SBC) - the critical network boundary device that bridges your corporate telephony infrastructure to Genesys Cloud BYOC (Bring Your Own Carrier) trunks - and visualizes them in Grafana dashboards with automated alerting. When complete, your operations team will have live visibility into SBC registration status, active call counts, SIP trunk utilization, packet loss, per-trunk MOS (Mean Opinion Score), and CPU/memory utilization on the SBC appliance itself. Alerts will fire in PagerDuty before service degradation reaches agents, rather than after customers start reporting audio quality issues.
Prerequisites, Roles & Licensing
- Genesys Cloud: BYOC Premise license with an on-premise Edge server or third-party SBC (AudioCodes, Ribbon/GENBAND, Oracle ACME Packet, Cisco CUBE).
- Infrastructure:
  - Prometheus 2.x (self-hosted or managed via Grafana Cloud)
  - Grafana 10.x
  - SBC SNMP v2c or v3 enabled, or SBC REST API access for metric export
  - A Linux metrics collector host (t3.small) with network access to the SBC management interface
The Implementation Deep-Dive
1. SBC Metric Collection Architecture
[AudioCodes SBC]    ──SNMP──▶ [SNMP Exporter]   ──HTTP──▶ [Prometheus]
[Ribbon SBC]        ──REST──▶ [Custom Exporter]                │
[ACME Packet]       ──SNMP──▶ [SNMP Exporter]                  │
                                                               ▼
[Genesys Cloud API] ──▶       [Custom Exporter] ──HTTP──▶ [Prometheus]
                                                               │
                                                               ▼
                                                           [Grafana]
                                                          [PagerDuty]
Key SBC metrics to collect:
| Metric | Source | Alert Threshold |
|---|---|---|
| Active calls per trunk | SNMP / REST | > 90% trunk capacity |
| SIP registration status | SNMP | 0 = CRITICAL |
| Packet loss % | SNMP (RTP stats) | > 1% = WARNING, > 3% = CRITICAL |
| Jitter (ms) | SNMP (RTP stats) | > 30ms = WARNING |
| MOS score | REST API (if available) | < 3.5 = WARNING, < 3.0 = CRITICAL |
| SBC CPU utilization | SNMP | > 80% = WARNING |
| SBC memory utilization | SNMP | > 85% = WARNING |
| Trunk group call attempts | SNMP | Baseline + 3σ = ANOMALY |
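The thresholds in the table can be written as a small classification routine, which is useful for unit-testing alert logic before committing rules. This is a minimal sketch, not part of the monitoring stack itself: the function names are hypothetical, and the 3σ check assumes you keep a recent window of call-attempt samples to serve as the baseline.

```python
from statistics import mean, pstdev
from typing import Sequence


def classify_packet_loss(loss_pct: float) -> str:
    """Map RTP packet loss (%) to the severities in the table."""
    if loss_pct > 3:
        return "CRITICAL"
    if loss_pct > 1:
        return "WARNING"
    return "OK"


def classify_mos(mos: float) -> str:
    """Map a MOS score to the severities in the table."""
    if mos < 3.0:
        return "CRITICAL"
    if mos < 3.5:
        return "WARNING"
    return "OK"


def is_call_attempt_anomaly(history: Sequence[float], current: float) -> bool:
    """Flag the current sample when it exceeds baseline mean + 3 standard deviations."""
    if len(history) < 2:
        return False  # not enough samples to form a baseline
    baseline = mean(history)
    sigma = pstdev(history)
    return current > baseline + 3 * sigma
```

The window size used for `history` is a tuning decision: a short window reacts quickly but is noisier, while a longer one smooths out normal daily variation in call volume.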
2. SNMP Exporter Configuration for AudioCodes SBC
# /etc/prometheus/snmp_audiocodes.yml
modules:
  audiocodes_sbc:
    walk:
      # Active calls per trunk group
      - 1.3.6.1.4.1.5003.9.10.10.1.2     # acSBCTrunkGroupStatCurrentCallsNum
      # SIP registration status
      - 1.3.6.1.4.1.5003.9.10.10.1.3     # acSBCTrunkGroupStatStatus
      # Packet loss
      - 1.3.6.1.4.1.5003.9.10.10.2.1.7   # acSBCCallMediaIPGroupRTPLossRate
      # Jitter
      - 1.3.6.1.4.1.5003.9.10.10.2.1.9   # acSBCCallMediaIPGroupRTPJitter
      # CPU utilization
      - 1.3.6.1.4.1.5003.9.10.10.1.28    # acSBCCPUUtilization
    metrics:
      - name: sbc_trunk_active_calls
        oid: 1.3.6.1.4.1.5003.9.10.10.1.2
        type: gauge
        help: "Current number of active calls on trunk group"
        indexes:
          - labelname: trunk_group
            type: gauge
      - name: sbc_trunk_registration_status
        oid: 1.3.6.1.4.1.5003.9.10.10.1.3
        type: gauge
        help: "SIP trunk registration status (1=registered, 0=unregistered)"
        # Index added so alerts can reference the trunk_group label
        indexes:
          - labelname: trunk_group
            type: gauge
      - name: sbc_rtp_packet_loss_rate
        oid: 1.3.6.1.4.1.5003.9.10.10.2.1.7
        type: gauge
        help: "RTP packet loss rate percentage"
      - name: sbc_cpu_utilization_percent
        oid: 1.3.6.1.4.1.5003.9.10.10.1.28
        type: gauge
        help: "SBC CPU utilization percentage"
    # Per-module version/auth layout used by snmp_exporter releases before 0.23;
    # newer releases move these settings into a top-level "auths" section.
    version: 2
    auth:
      community: your-snmp-community-string
    timeout: 10s
    retries: 3
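After loading the module, it helps to confirm that the exporter actually emits every metric the configuration names. The sketch below parses Prometheus text-format output and reports any expected metric that is absent. It assumes you have saved the response from the SNMP exporter's `/snmp` endpoint (for example with `target=192.168.10.20` and `module=audiocodes_sbc`) to a string; the helper names `exported_metric_names` and `missing_metrics` are hypothetical.

```python
# Metric names defined in the audiocodes_sbc module
EXPECTED = {
    "sbc_trunk_active_calls",
    "sbc_trunk_registration_status",
    "sbc_rtp_packet_loss_rate",
    "sbc_cpu_utilization_percent",
}


def exported_metric_names(exposition: str) -> set:
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{labels} value   or   name value
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition: str, expected=EXPECTED) -> list:
    """Return the expected metric names that the exporter did not emit."""
    return sorted(set(expected) - exported_metric_names(exposition))
```

An empty list means every configured metric returned at least one sample; any name in the result points at an OID worth checking against your firmware's MIB.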
3. Custom Genesys Cloud Edge / Trunk Metrics Exporter
Supplement SNMP with Genesys Cloud API data for end-to-end correlated visibility:
#!/usr/bin/env python3
"""
genesys_byoc_exporter.py - Prometheus exporter for BYOC trunk health
Runs on port 9091 as a Prometheus target
"""
import os
import time

import requests
from prometheus_client import Gauge, start_http_server

GENESYS_API = "https://api.mypurecloud.com"
GENESYS_LOGIN = "https://login.mypurecloud.com"
POLL_INTERVAL = 30          # seconds
TOKEN_REFRESH_AFTER = 1700  # seconds; assumes a 30-minute token, adjust to your OAuth client's token duration
HTTP_TIMEOUT = 10           # seconds

# Define Prometheus gauges
# trunk_active_calls is declared for completeness; active calls come from SNMP
trunk_active_calls = Gauge('genesys_trunk_active_calls', 'Active calls on trunk', ['trunk_id', 'trunk_name'])
trunk_status = Gauge('genesys_trunk_status', 'Trunk status (1=in service)', ['trunk_id', 'trunk_name'])
edge_status = Gauge('genesys_edge_status', 'Edge status (1=online)', ['edge_id', 'edge_name'])
edge_accepting_calls = Gauge('genesys_edge_accepting_calls', 'Edge accepting new calls (1=not draining)',
                             ['edge_id', 'edge_name'])


def get_token() -> str:
    resp = requests.post(
        f"{GENESYS_LOGIN}/oauth/token",
        data={"grant_type": "client_credentials"},
        auth=(os.environ["GC_CLIENT_ID"], os.environ["GC_CLIENT_SECRET"]),
        timeout=HTTP_TIMEOUT,
    )
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of a KeyError below
    return resp.json()["access_token"]


def collect_metrics(token: str) -> None:
    headers = {"Authorization": f"Bearer {token}"}

    # --- BYOC Trunks (first page only; paginate if you have more than 100) ---
    resp = requests.get(f"{GENESYS_API}/api/v2/telephony/providers/edges/trunks",
                        headers=headers, params={"pageSize": 100}, timeout=HTTP_TIMEOUT)
    resp.raise_for_status()
    for trunk in resp.json().get("entities", []):
        tid = trunk["id"]
        name = trunk.get("name", tid)
        # trunkType describes the kind of trunk, not its health.
        # Assumption: the trunk entity exposes an `inService` boolean; verify against your API response.
        is_active = 1 if trunk.get("inService") else 0
        trunk_status.labels(trunk_id=tid, trunk_name=name).set(is_active)

    # --- Edge Servers ---
    resp = requests.get(f"{GENESYS_API}/api/v2/telephony/providers/edges",
                        headers=headers, params={"pageSize": 100}, timeout=HTTP_TIMEOUT)
    resp.raise_for_status()
    for edge in resp.json().get("entities", []):
        eid = edge["id"]
        name = edge.get("name", eid)
        online = 1 if edge.get("onlineStatus") == "ONLINE" else 0
        # Assumption: callDrainingState is a string enum where "NONE" means the Edge is not draining
        draining = edge.get("callDrainingState", "NONE") != "NONE"
        edge_status.labels(edge_id=eid, edge_name=name).set(online)
        edge_accepting_calls.labels(edge_id=eid, edge_name=name).set(0 if draining else 1)


def main():
    start_http_server(9091)
    print("Genesys BYOC Exporter listening on :9091")
    token = get_token()
    token_refresh_at = time.time() + TOKEN_REFRESH_AFTER
    while True:
        try:
            if time.time() > token_refresh_at:
                token = get_token()
                token_refresh_at = time.time() + TOKEN_REFRESH_AFTER
            collect_metrics(token)
        except requests.RequestException as exc:
            # Keep serving the last known values rather than crashing the exporter
            print(f"Collection failed: {exc}")
        time.sleep(POLL_INTERVAL)


if __name__ == "__main__":
    main()
4. Prometheus Scrape Configuration
# prometheus.yml - append to scrape_configs
scrape_configs:
  - job_name: 'sbc_snmp_audiocodes'
    static_configs:
      - targets:
          - '192.168.10.20'   # Primary SBC management IP
          - '192.168.10.21'   # Secondary SBC (HA pair)
    metrics_path: /snmp
    params:
      module: [audiocodes_sbc]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9116   # SNMP Exporter address
  - job_name: 'genesys_byoc'
    static_configs:
      - targets: ['localhost:9091']
    scrape_interval: 30s
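The relabeling in the SNMP job is easier to follow once you see the request it produces: the SBC address becomes the `target` query parameter, the `instance` label keeps the SBC address, and the scrape itself goes to the SNMP exporter. The sketch below reconstructs that URL for one target. The function name `snmp_scrape_url` is hypothetical, and the plain `http` scheme is an assumption that matches Prometheus's default.

```python
from urllib.parse import urlencode


def snmp_scrape_url(target: str,
                    module: str = "audiocodes_sbc",
                    exporter: str = "localhost:9116",
                    metrics_path: str = "/snmp") -> str:
    """Build the URL Prometheus requests after the relabel rules are applied."""
    # params.module supplies `module`; __param_target supplies `target`
    query = urlencode({"module": module, "target": target})
    # __address__ has been rewritten to the exporter, so that is the host contacted
    return f"http://{exporter}{metrics_path}?{query}"


if __name__ == "__main__":
    for sbc in ("192.168.10.20", "192.168.10.21"):
        print(snmp_scrape_url(sbc))
```

Requesting the resulting URL by hand from the collector host is a quick way to separate exporter problems from Prometheus configuration problems.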
5. Grafana Alert Rules
# grafana/alerts/sbc_health.yml
# Prometheus-style alerting rule group
groups:
  - name: SBC Health
    rules:
      - alert: SBCTrunkUnregistered
        expr: sbc_trunk_registration_status == 0
        for: 1m
        labels:
          severity: critical
          team: telephony
        annotations:
          summary: "SBC Trunk {{ $labels.trunk_group }} is UNREGISTERED"
          description: "Trunk has been down for >1 minute. Calls may be failing."
          runbook: "https://wiki.internal/runbooks/sbc-trunk-recovery"
      - alert: SBCHighPacketLoss
        expr: sbc_rtp_packet_loss_rate > 3
        for: 5m
        labels:
          severity: critical   # > 3% is CRITICAL per the threshold table
        annotations:
          summary: "SBC packet loss {{ $value }}% on {{ $labels.instance }}"
      - alert: GenesysEdgeOffline
        expr: genesys_edge_status == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Genesys Edge {{ $labels.edge_name }} is OFFLINE"
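The `for:` clause decides how long a condition must hold before an alert notifies anyone, which is what keeps a single bad scrape from paging the on-call engineer. The following is a simplified model of that behaviour, not Prometheus's actual implementation: the function name `alert_states` is hypothetical, and it assumes one evaluation per sample.

```python
from typing import Callable, List, Sequence, Tuple


def alert_states(samples: Sequence[Tuple[float, float]],
                 condition: Callable[[float], bool],
                 for_seconds: float) -> List[str]:
    """Return inactive/pending/firing for each (timestamp_seconds, value) sample."""
    states = []
    active_since = None  # timestamp at which the condition first became true
    for ts, value in samples:
        if condition(value):
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


if __name__ == "__main__":
    # SBCTrunkUnregistered: value == 0 with for: 1m, evaluated every 30 seconds
    samples = [(0, 1), (30, 0), (60, 0), (90, 0), (120, 1)]
    print(alert_states(samples, lambda v: v == 0, 60))
```

Running a recorded incident through a model like this is one way to judge whether a `for:` value is too short (flapping alerts) or too long (late pages).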
Validation, Edge Cases & Troubleshooting
Edge Case 1: SNMP Walk Returns No Data for Some OIDs
Your SBC firmware version uses a different OID branch for trunk stats than the exporter config.
Solution: Run snmpwalk -v2c -c your-community 192.168.10.20 1.3.6.1.4.1.5003 to discover the actual OID tree for your firmware version, then update the walk and metrics entries in the exporter config to match. AudioCodes MIBs are downloadable from their support portal and can be loaded into MIB tools (iReasoning MIB Browser, Net-SNMP) for visual OID discovery.
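Comparing the walk output against the configured OIDs by eye is error-prone, so a short script can do it. This sketch assumes the walk was run with Net-SNMP's `-On` option so that OIDs are printed numerically, and that the output has been captured to a string. The function name `oids_without_data` and the `CONFIGURED_OIDS` mapping are illustrative and mirror the exporter config.

```python
# Base OIDs from the exporter config, mapped to the metric each one feeds
CONFIGURED_OIDS = {
    "1.3.6.1.4.1.5003.9.10.10.1.2": "sbc_trunk_active_calls",
    "1.3.6.1.4.1.5003.9.10.10.1.3": "sbc_trunk_registration_status",
    "1.3.6.1.4.1.5003.9.10.10.2.1.7": "sbc_rtp_packet_loss_rate",
    "1.3.6.1.4.1.5003.9.10.10.1.28": "sbc_cpu_utilization_percent",
}


def oids_without_data(snmpwalk_output: str, configured=CONFIGURED_OIDS) -> list:
    """Return configured base OIDs for which the walk returned no rows."""
    returned = []
    for line in snmpwalk_output.splitlines():
        if " = " not in line:
            continue  # skip continuation lines and diagnostics
        oid = line.split(" = ", 1)[0].strip().lstrip(".")
        returned.append(oid)

    missing = []
    for base in configured:
        # Match the base itself or any instance beneath it ("base." prefix),
        # so that ...1.2 does not accidentally match ...1.28
        if not any(o == base or o.startswith(base + ".") for o in returned):
            missing.append(base)
    return missing
```

Each OID in the result is a candidate for replacement with whatever branch your firmware actually exposes.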
Edge Case 2: Exporter Token Expires During Collection
The Python exporter’s OAuth token (30 minutes in this setup) expires mid-collection cycle, causing a burst of 401 responses from the Genesys Cloud API until a new token is fetched.
Solution: The exporter already guards against this with its token_refresh_at logic, which refreshes after 1700 seconds (about 28 minutes). Token lifetime is set per OAuth client in Genesys Cloud, so match the refresh interval to your client's configured token duration. If the Genesys Cloud token endpoint is temporarily slow, wrap get_token() in a try/except with a 60-second retry. Do not let a token refresh failure crash the exporter - continue serving stale metrics with a genesys_exporter_token_healthy gauge set to 0.
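The retry-and-flag behaviour can be sketched independently of the HTTP and metrics libraries by passing in the pieces it depends on. In the exporter, `fetch` would be `get_token` and `set_health` would be the `set` method of a `genesys_exporter_token_healthy` gauge. The function name `refresh_token` and its parameters are hypothetical, and the defaults (three attempts, 60 seconds apart) are assumptions to tune.

```python
import time
from typing import Callable, Optional


def refresh_token(fetch: Callable[[], str],
                  set_health: Callable[[int], None],
                  current_token: Optional[str],
                  retries: int = 3,
                  retry_delay: float = 60.0,
                  sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Try to fetch a new token; on repeated failure keep the old one."""
    for attempt in range(1, retries + 1):
        try:
            token = fetch()
        except Exception as exc:  # network errors, HTTP errors, malformed responses
            set_health(0)
            print(f"token refresh attempt {attempt} failed: {exc}")
            if attempt < retries:
                sleep(retry_delay)
        else:
            set_health(1)
            return token
    # All attempts failed: return the previous token so the exporter keeps running
    return current_token
```

Note that this version blocks the collection loop while it waits between attempts. If that stall is unacceptable, a single attempt per poll cycle with the health gauge left at 0 achieves a similar result without sleeping.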
Edge Case 3: Grafana Shows Gaps When SBC Has Maintenance Window
Planned SBC maintenance leaves gaps in the dashboards and generates alert noise, so on-call engineers get woken up for expected downtime.
Solution: Use Grafana’s Silence feature (or Alertmanager’s silence API) to suppress alerts during maintenance windows. Automate silence creation via the Grafana API from your change management system: when a maintenance ticket is opened, automatically create a 2-hour silence for job="sbc_snmp_audiocodes".
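A change-management hook mainly needs to build the silence body. The sketch below produces a payload in the shape accepted by Alertmanager's v2 silences endpoint (`POST /api/v2/silences`). It deliberately stops short of sending the request, since the base URL and authentication depend on your deployment. The function name `build_silence`, the `ticket_id` parameter, and the `created_by` default are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def _rfc3339(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_silence(ticket_id: str,
                  start: datetime,
                  hours: float = 2.0,
                  job: str = "sbc_snmp_audiocodes",
                  created_by: str = "change-management") -> dict:
    """Build a silence that matches every alert carrying the given job label."""
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": "job", "value": job, "isRegex": False, "isEqual": True}
        ],
        "startsAt": _rfc3339(start),
        "endsAt": _rfc3339(end),
        "createdBy": created_by,
        "comment": f"Planned SBC maintenance ({ticket_id})",
    }


if __name__ == "__main__":
    window_start = datetime.now(timezone.utc)
    print(json.dumps(build_silence("CHG-0000", window_start), indent=2))
```

Putting the ticket identifier in the comment makes it straightforward to trace a silence back to the change that created it, and to expire it early if the maintenance finishes ahead of schedule.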