Trouble-to-Resolve Flow
What Is Trouble-to-Resolve?
The Trouble-to-Resolve (T2R) process is the second most critical end-to-end flow in a telco, sitting alongside Lead-to-Cash (L2C) as a core operational capability. While L2C focuses on selling and delivering services, T2R focuses on maintaining service quality and resolving issues when things go wrong.
T2R encompasses the entire assurance lifecycle: detecting a problem (proactively or via customer report), diagnosing its root cause, resolving it, and confirming resolution with the customer. In eTOM terms, T2R traverses the Assurance vertical across CRM, SM&O, and RM&O horizontal layers.
Proactive vs Reactive Assurance
Assurance processes can be triggered in two fundamentally different ways. Understanding this distinction is critical because it determines the starting point of the T2R flow and which systems initiate the process.
| Aspect | Reactive Assurance | Proactive Assurance |
|---|---|---|
| Trigger | Customer reports a problem | Network/system detects an anomaly |
| Starting Point | Trouble Ticket (CRM layer) | Network Alarm/Event (RM&O layer) |
| Direction | Top-down: customer → service → resource | Bottom-up: resource → service → customer impact |
| Time to Detect | Delayed (customer must notice and report) | Near real-time (automated monitoring) |
| Customer Experience | Customer is already frustrated | Telco may resolve before customer notices |
| Key System | Service Desk / CRM | Fault Management / NMS |
| Maturity Level | Basic capability | Advanced capability (requires correlation) |
Think of two scenarios: (1) A customer calls saying "my internet is down" — that is reactive. (2) The network monitoring system detects a fibre cut and automatically notifies affected customers — that is proactive. Both lead to the same resolution process, but proactive assurance starts earlier and provides a much better customer experience.
In reactive T2R, the flow is: CRM (trouble ticket) → SOM (service impact analysis) → ROM (resource diagnosis). In proactive T2R, the flow inverts: NMS/EMS (alarm) → Fault Management (correlation) → Service Impact Analysis → CRM (proactive notification). The systems are the same, but the direction and sequencing differ.
Modern proactive assurance typically uses an event-driven architecture with stream processing. Network alarms are ingested as events (TMF642 Alarm Management), correlated in near real time to identify the root cause, and automatically mapped to impacted services and customers using the Service Inventory topology. AI/ML models can additionally be used to predict degradation before alarms fire. This is the foundation of "autonomous operations" in TM Forum's vision.
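The bottom-up mapping from alarm to customer impact can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Alarm` class, the two topology dictionaries, and all identifiers are hypothetical stand-ins for data that would normally come from Fault Management, Resource Inventory, and Service Inventory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    alarm_id: str
    resource_id: str
    severity: str


# Hypothetical topology extracts (assumed data, not a real inventory schema).
RESOURCE_TO_SERVICES = {
    "olt-01": ["cfs-bb-1001", "cfs-bb-1002"],
    "olt-02": ["cfs-bb-2001"],
}
SERVICE_TO_CUSTOMER = {
    "cfs-bb-1001": "cust-A",
    "cfs-bb-1002": "cust-B",
    "cfs-bb-2001": "cust-C",
}


def impacted_customers(alarms):
    """Map alarmed resources to the customers whose services depend on them."""
    impact = defaultdict(set)
    for alarm in alarms:
        for service_id in RESOURCE_TO_SERVICES.get(alarm.resource_id, []):
            customer = SERVICE_TO_CUSTOMER.get(service_id)
            if customer is not None:
                impact[customer].add(service_id)
    # Sets deduplicate services hit by several alarms on the same resource.
    return {customer: sorted(services) for customer, services in impact.items()}


alarms = [Alarm("al-1", "olt-01", "critical"), Alarm("al-2", "olt-01", "major")]
impact = impacted_customers(alarms)
```

Two alarms on the same resource produce a single impact entry per customer, which is the simplest form of the correlation step: many raw events collapse into one list of customers to notify.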
The Complete T2R Flow
The Trouble-to-Resolve flow consists of six major stages, whether triggered reactively by a customer or proactively by network monitoring. Each stage has clear system ownership and eTOM process mappings.
Trouble-to-Resolve: End-to-End Flow
| Stage | Key Systems | Description |
|---|---|---|
| 1. Detection / Reporting | NMS / Service Desk / CRM | A problem is detected via network alarm (proactive) or reported by the customer via phone, portal, or chat (reactive). Initial information is captured. |
| 2. Trouble Ticket Creation | Trouble Ticket System / CRM | A formal Trouble Ticket is created, linking the issue to the affected customer, service, and (if known) resource. Initial categorisation and priority are assigned. |
| 3. Diagnosis & Root Cause Analysis | Fault Management / SOM / ROM | Service impact analysis identifies which CFS instances are affected. Resource analysis identifies the root cause at the network level. May involve automated diagnostics or field dispatch. |
| 4. Resolution & Restoration | ROM / EMS / Field Service | The root cause is addressed: network element repaired/replaced, configuration corrected, or service rerouted. Service is restored to normal operation. |
| 5. Verification & Testing | Test Management / CRM | Automated and/or manual testing confirms the service is restored to agreed quality levels. Customer may be contacted to confirm resolution from their perspective. |
| 6. Closure & Reporting | Trouble Ticket System / Analytics | Trouble ticket is closed with resolution details. Metrics are captured for SLA reporting, trend analysis, and continuous improvement. |
Stage 1: Detection and Reporting
The T2R flow begins when an issue is first identified. In the reactive path, this happens when a customer contacts the service desk. In the proactive path, this happens when network monitoring detects an alarm or performance degradation. The key challenge at this stage is capturing enough information to enable efficient diagnosis.
In the reactive path, the customer contacts the service desk via phone, chat, email, or self-service portal. The agent identifies the customer, looks up their active services (via Product Inventory / Service Inventory), and captures the symptom description. An initial diagnostic may be run (e.g., a line test for broadband). The typical intake steps are:
- Customer identification (CRM lookup)
- Service identification (Product/Service Inventory query)
- Symptom capture and categorisation
- Initial automated diagnostic (if available)
- Known issue matching (check for existing network alarms)
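The intake steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: the dictionaries stand in for CRM, Service Inventory, and Fault Management lookups, the keyword-based categorisation is deliberately naive, and every name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IntakeResult:
    customer_id: str
    service_ids: list
    symptom_category: str
    matched_alarm_ids: list = field(default_factory=list)


# Hypothetical in-memory stand-ins for CRM / Service Inventory / Fault Management.
CUSTOMER_SERVICES = {"cust-A": ["cfs-bb-1001"], "cust-B": ["cfs-bb-1002", "cfs-tv-1003"]}
SERVICE_RESOURCES = {"cfs-bb-1001": ["olt-01"], "cfs-bb-1002": ["olt-01"], "cfs-tv-1003": ["stb-77"]}
ACTIVE_ALARMS = {"al-1": "olt-01"}  # alarm id -> alarmed resource

# Assumed keyword rules; a real service desk would use richer categorisation.
SYMPTOM_KEYWORDS = {"down": "connectivity", "slow": "performance", "bill": "billing"}


def categorise(symptom):
    text = symptom.lower()
    for keyword, category in SYMPTOM_KEYWORDS.items():
        if keyword in text:
            return category
    return "uncategorised"


def intake(customer_id, symptom):
    """Identify the customer and services, categorise, and match known alarms."""
    services = CUSTOMER_SERVICES.get(customer_id)
    if services is None:
        raise KeyError(f"unknown customer: {customer_id}")
    resources = {r for s in services for r in SERVICE_RESOURCES.get(s, [])}
    matched = sorted(a for a, res in ACTIVE_ALARMS.items() if res in resources)
    return IntakeResult(customer_id, list(services), categorise(symptom), matched)


result = intake("cust-A", "My internet is down")
```

When `matched_alarm_ids` is non-empty, the agent can link the new report to an existing network incident instead of starting a fresh diagnosis.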
Stage 2: Trouble Ticket Management
The Trouble Ticket is the central tracking entity for the entire T2R process. It links the reported problem to the affected customer, services, and resources. It tracks the lifecycle from creation through diagnosis, resolution, and closure. In TM Forum terms, the Trouble Ticket is defined by TMF621 (Trouble Ticket API).
Trouble Ticket Key Attributes
| Attribute | Description | Source |
|---|---|---|
| Ticket ID | Unique identifier for tracking | Trouble Ticket System |
| Severity / Priority | Impact and urgency classification | Initial assessment + SLA rules |
| Category | Type of issue (connectivity, performance, billing) | Agent/automation categorisation |
| Related Customer | The affected customer account | CRM |
| Related Service | The affected CFS instance(s) | Service Inventory |
| Related Resource | The affected resource(s), if known | Resource Inventory |
| Status | Lifecycle state (open, in progress, resolved, closed) | Trouble Ticket System |
| SLA Target | Resolution deadline based on severity and contract | SLA Management |
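As a rough sketch, the attributes in the table could be modelled as follows. The class and field names are illustrative only and do not reproduce the TMF621 schema. The 4-hour P1 window mirrors the MTTR target quoted later in this section; the P2 and P3 windows are assumptions, since real values come from the customer contract.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List


class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in progress"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed resolution windows per severity (hours); contract-specific in practice.
SLA_HOURS = {"P1": 4, "P2": 8, "P3": 24}


@dataclass
class TroubleTicket:
    ticket_id: str
    severity: str
    category: str
    customer_id: str
    service_ids: List[str]
    created_at: datetime
    resource_ids: List[str] = field(default_factory=list)  # often unknown at creation
    status: Status = Status.OPEN

    @property
    def sla_target(self) -> datetime:
        """Resolution deadline derived from severity and creation time."""
        return self.created_at + timedelta(hours=SLA_HOURS[self.severity])


ticket = TroubleTicket(
    ticket_id="tt-001",
    severity="P1",
    category="connectivity",
    customer_id="cust-A",
    service_ids=["cfs-bb-1001"],
    created_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

Deriving `sla_target` instead of storing it keeps the deadline consistent if the severity is changed during triage.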
Trouble Ticket — Source of Record
| Entity | System of Record | System of Engagement | System of Reference | Notes |
|---|---|---|---|---|
| Trouble Ticket | Trouble Ticket System | Service Desk / Self-Service | — | Central tracking entity for all T2R activity |
| Alarm / Event | Fault Management / NMS | NOC Dashboard | — | Raw alarms correlated into incidents |
| Service Impact | Service Inventory | Service Quality Dashboard | — | Which CFS instances are degraded or down |
| Work Order | Field Service / Workforce Mgmt | Field Technician App | — | Created when physical intervention is needed |
Stage 3: Diagnosis & Root Cause Analysis
Diagnosis is the most technically complex stage of T2R. It requires traversing from the customer-visible symptom down through the service and resource layers to identify the root cause. This traversal relies heavily on the topology information stored in Service Inventory and Resource Inventory.
Diagnostic Flow
| Step | Key Systems | Description |
|---|---|---|
| 1. Service Impact Analysis | Service Inventory / SOM | Identify which CFS instance(s) are affected by querying Service Inventory. Determine whether the issue is service-wide or customer-specific. |
| 2. Service-to-Resource Mapping | Service Inventory → Resource Inventory | Use the CFS-to-RFS-to-Resource topology to trace from affected services down to supporting resources. |
| 3. Resource Diagnosis | Resource Inventory / NMS / EMS | Check the status and performance of supporting resources. Look for active alarms, configuration drift, or capacity issues. |
| 4. Root Cause Identification | Fault Management / NOC | Correlate all findings to identify the single root cause (or multiple contributing factors). Determine the fix required. |
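The top-down traversal in the diagnostic flow can be illustrated with a small sketch. It assumes the CFS-to-RFS-to-Resource topology is available as simple dictionaries and ranks alarmed resources by how many affected services depend on them. All identifiers are hypothetical, and real correlation engines use far richer rules than this count.

```python
# Hypothetical topology: CFS -> RFS -> resources (assumed data).
CFS_TO_RFS = {
    "cfs-bb-1001": ["rfs-access-1", "rfs-ip-1"],
    "cfs-bb-1002": ["rfs-access-2", "rfs-ip-1"],
}
RFS_TO_RESOURCES = {
    "rfs-access-1": ["ont-11", "olt-01"],
    "rfs-access-2": ["ont-12", "olt-01"],
    "rfs-ip-1": ["bng-01"],
}
ALARMED_RESOURCES = {"olt-01"}  # resources with active alarms


def supporting_resources(cfs_id):
    """Trace one CFS instance down to the resources that support it."""
    resources = set()
    for rfs_id in CFS_TO_RFS.get(cfs_id, []):
        resources.update(RFS_TO_RESOURCES.get(rfs_id, []))
    return resources


def root_cause_candidates(affected_cfs_ids):
    """Rank alarmed resources by the number of affected services they support."""
    counts = {}
    for cfs_id in affected_cfs_ids:
        for resource_id in supporting_resources(cfs_id) & ALARMED_RESOURCES:
            counts[resource_id] = counts.get(resource_id, 0) + 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


candidates = root_cause_candidates(["cfs-bb-1001", "cfs-bb-1002"])
```

A resource shared by every affected service and carrying an active alarm is the strongest root cause candidate. If the candidate list is empty, the issue is more likely customer-specific or the inventory is incomplete.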
Stage 4: Resolution & Service Restoration
Once the root cause is identified, the resolution stage executes the fix. The nature of the resolution depends on the root cause — it may be a remote configuration change, a software patch, a hardware replacement, or a field visit. The goal is always to restore the affected service(s) to their agreed quality levels as quickly as possible.
Issues that can be resolved without physical intervention are fixed remotely, for example through configuration rollback, traffic rerouting, capacity rebalancing, or a software restart. Remote resolution is the fastest option and is preferred wherever it is possible. Systems involved: ROM for resource reconfiguration, EMS for element-level commands, and the SDN Controller for network path changes.
Stages 5 and 6: Verification and Closure
After the fix is applied, verification confirms that the service has been restored. This may involve automated service tests (TMF653 Service Test Management), performance metric checks, and customer confirmation. Only after successful verification should the trouble ticket be moved to "resolved" status.
Closure adds the final resolution details, captures metrics (mean time to detect, mean time to resolve), and updates knowledge bases for future diagnosis. In many implementations, the ticket moves to "resolved" first, giving the customer a window to confirm or reopen, before automatically closing after a defined period.
- Apply the fix (remote or field)
- Run automated service test to verify restoration
- Update trouble ticket status to "resolved"
- Notify the customer of resolution
- Wait for customer confirmation window (e.g., 48 hours)
- Auto-close the ticket if no reopening
- Capture MTTD, MTTR, and resolution category for reporting
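The resolved-then-closed lifecycle described above can be sketched as a small state machine. This is an illustrative model, not a product behaviour: the allowed transitions, the `Ticket` class, and the 48-hour window (taken from the example in the list) are assumptions.

```python
from datetime import datetime, timedelta

CONFIRMATION_WINDOW = timedelta(hours=48)  # example value from the steps above

# Assumed lifecycle: a resolved ticket can be closed or reopened.
ALLOWED = {
    "open": {"in progress"},
    "in progress": {"resolved"},
    "resolved": {"closed", "in progress"},
    "closed": set(),
}


class Ticket:
    def __init__(self, ticket_id):
        self.ticket_id = ticket_id
        self.status = "open"
        self.resolved_at = None

    def transition(self, new_status, now):
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status} -> {new_status} not allowed")
        self.status = new_status
        if new_status == "resolved":
            self.resolved_at = now

    def resolve(self, service_test_passed, now):
        """Move to resolved only after verification has succeeded."""
        if not service_test_passed:
            raise ValueError("verification failed; ticket stays in progress")
        self.transition("resolved", now)

    def auto_close_if_due(self, now):
        """Close the ticket once the confirmation window has elapsed."""
        if self.status == "resolved" and now - self.resolved_at >= CONFIRMATION_WINDOW:
            self.transition("closed", now)
            return True
        return False


t0 = datetime(2024, 1, 1, 9, 0)
ticket = Ticket("tt-001")
ticket.transition("in progress", t0)
ticket.resolve(service_test_passed=True, now=t0 + timedelta(hours=2))
closed_early = ticket.auto_close_if_due(t0 + timedelta(hours=10))
closed_late = ticket.auto_close_if_due(t0 + timedelta(hours=51))
```

Guarding `resolve` on the test result enforces the rule that a ticket only reaches "resolved" after successful verification.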
eTOM Level 2 Process Map for T2R
eTOM Level 2 Processes in Trouble-to-Resolve
| eTOM L2 Process | eTOM Area | T2R Stage | Primary System |
|---|---|---|---|
| Problem Handling (1.2.1.4) | CRM / Assurance | Detection & Ticket Creation | Service Desk / CRM |
| Customer QoS/SLA Management (1.2.1.3) | CRM / Assurance | SLA Tracking | SLA Management |
| Service Problem Management (1.2.2.2) | SM&O / Assurance | Service Diagnosis | SOM / Fault Mgmt |
| Service Quality Management (1.2.2.1) | SM&O / Assurance | Service Monitoring | Service Quality Mgmt |
| Resource Trouble Management (1.2.3.2) | RM&O / Assurance | Resource Diagnosis & Fix | ROM / NMS / EMS |
| Resource Performance Management (1.2.3.1) | RM&O / Assurance | Resource Monitoring | NMS / Performance Mgmt |
| Resource Data Collection & Distribution (1.2.3.4) | RM&O / Assurance | Data Gathering | Mediation / NMS |
TM Forum API Touchpoints for T2R
TMF Open APIs in the T2R Flow
| TMF API | Name | T2R Usage | Key System |
|---|---|---|---|
| TMF621 | Trouble Ticket | Create, update, and track trouble tickets | Trouble Ticket System |
| TMF656 | Service Problem Management | Manage service-level problem records | Fault Management |
| TMF642 | Alarm Management | Ingest and manage network alarms | NMS / Fault Management |
| TMF697 | Work Order Management | Create and track field service work orders | Workforce Management |
| TMF653 | Service Test Management | Run automated service tests for verification | Test Management |
| TMF638 | Service Inventory | Query CFS topology for impact analysis | Service Inventory |
| TMF639 | Resource Inventory | Query resource topology for root cause analysis | Resource Inventory |
| TMF634 | Resource Catalog | Look up resource specifications for diagnosis | Resource Catalog |
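To make the TMF621 touchpoint more concrete, the sketch below assembles a request body shaped like a trouble ticket. The field names are recalled from the TMF621 resource model and should be treated as assumptions to verify against the published specification for the version in use. The values for `ticketType` and `role` are purely illustrative, and no HTTP call is made.

```python
import json


def build_trouble_ticket_payload(description, severity, service_id, customer_id):
    """Assemble a TMF621-style request body (field names are assumptions)."""
    return {
        "@type": "TroubleTicket",
        "description": description,
        "severity": severity,
        "ticketType": "incident",  # illustrative value
        "relatedEntity": [
            {"id": service_id, "@referredType": "Service", "role": "affectedService"}
        ],
        "relatedParty": [
            {"id": customer_id, "@referredType": "Customer", "role": "reporter"}
        ],
    }


payload = build_trouble_ticket_payload(
    description="Broadband service down since 09:00",
    severity="critical",
    service_id="cfs-bb-1001",
    customer_id="cust-A",
)
body = json.dumps(payload, indent=2)
```

The point of the example is the linkage: one ticket body carries references to both the affected service and the reporting customer, which later stages use to reach Service Inventory and CRM.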
SLA Management and Key Metrics
T2R performance is measured by a set of well-defined metrics. These metrics drive SLA compliance, operational improvement, and regulatory reporting. Every T2R implementation must capture them.
Key T2R Metrics
| Metric | Definition | Typical Target | Measurement |
|---|---|---|---|
| MTTD | Mean Time to Detect — time from fault occurrence to detection | < 5 minutes (proactive) | Alarm timestamp vs fault onset |
| MTTR | Mean Time to Resolve — time from detection to service restoration | < 4 hours (P1) | Ticket creation to resolution |
| First Contact Resolution | Percentage of issues resolved on first customer contact | > 70% | Tickets resolved without escalation |
| Repeat Fault Rate | Percentage of tickets reopened or recurring within 30 days | < 5% | Reopened tickets / total tickets |
| SLA Compliance | Percentage of tickets resolved within SLA target | > 95% | Tickets within SLA / total tickets |
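Most of these metrics reduce to simple arithmetic over ticket timestamps. The sketch below computes MTTD, MTTR, SLA compliance, and repeat fault rate from a list of ticket records. The record layout and sample values are assumptions made for illustration, and First Contact Resolution is omitted because it needs escalation data not modelled here.

```python
from datetime import datetime, timedelta


def mean_delta(deltas):
    """Average a non-empty list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)


def t2r_metrics(tickets):
    total = len(tickets)
    return {
        # MTTD: fault onset to detection.
        "mttd": mean_delta([t["detected_at"] - t["fault_onset"] for t in tickets]),
        # MTTR: detection to service restoration.
        "mttr": mean_delta([t["restored_at"] - t["detected_at"] for t in tickets]),
        # Share of tickets restored on or before their SLA target.
        "sla_compliance": sum(t["restored_at"] <= t["sla_target"] for t in tickets) / total,
        # Share of tickets that were reopened.
        "repeat_fault_rate": sum(t["reopened"] for t in tickets) / total,
    }


start = datetime(2024, 1, 1)
tickets = [
    {
        "fault_onset": start,
        "detected_at": start + timedelta(minutes=4),
        "restored_at": start + timedelta(hours=2, minutes=4),
        "sla_target": start + timedelta(hours=4, minutes=4),
        "reopened": False,
    },
    {
        "fault_onset": start,
        "detected_at": start + timedelta(minutes=6),
        "restored_at": start + timedelta(hours=6, minutes=6),
        "sla_target": start + timedelta(hours=4, minutes=6),
        "reopened": True,
    },
]
metrics = t2r_metrics(tickets)
```

With these two sample tickets, MTTD is 5 minutes, MTTR is 4 hours, and both SLA compliance and repeat fault rate are 0.5.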
The Critical Dependency on Inventory
The T2R process is only as good as the inventory data it relies on. If the Service Inventory does not accurately reflect which CFS instances a customer has, diagnosis will fail. If the Resource Inventory does not accurately reflect the network topology, root cause analysis will be impossible.
You cannot assure what you cannot inventory. If you do not know what services a customer has and what resources support them, your assurance process is guesswork.
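One practical consequence is that inventory quality can be checked before it is needed in a live fault. The following sketch flags two kinds of gap: services with no supporting resources recorded, and services that reference resources missing from the Resource Inventory. The function and the data layout are hypothetical.

```python
def inventory_gaps(customer_services, service_resources, known_resources):
    """Report services that cannot be traced to known supporting resources."""
    gaps = {"services_without_resources": [], "dangling_resource_refs": []}
    for services in customer_services.values():
        for service_id in services:
            resources = service_resources.get(service_id, [])
            if not resources:
                gaps["services_without_resources"].append(service_id)
            for resource_id in resources:
                if resource_id not in known_resources:
                    gaps["dangling_resource_refs"].append((service_id, resource_id))
    return gaps


# Assumed sample data: one clean service, one unmapped, one with a stale reference.
customer_services = {"cust-A": ["cfs-bb-1001"], "cust-B": ["cfs-bb-1002", "cfs-tv-1003"]}
service_resources = {"cfs-bb-1001": ["olt-01"], "cfs-tv-1003": ["stb-77"]}
known_resources = {"olt-01"}

gaps = inventory_gaps(customer_services, service_resources, known_resources)
```

Any service that appears in either list is one where diagnosis would stall at the service-to-resource mapping step.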
Trouble-to-Resolve — Key Points
- T2R is the core assurance flow: Detection → Ticket → Diagnosis → Resolution → Verification → Closure
- It traverses the eTOM Assurance vertical across CRM, SM&O, and RM&O layers
- Proactive assurance (system-detected) is far superior to reactive (customer-reported) for customer experience
- TMF621 (Trouble Ticket) is the central API for assurance process orchestration
- Diagnosis requires traversing service and resource inventory topology — accurate inventory is a prerequisite
- Resolution may involve remote fix, field dispatch (TMF697 Work Order Management), or vendor escalation
- Key metrics: MTTD, MTTR, First Contact Resolution, SLA Compliance
- Proactive assurance requires alarm correlation, service impact analysis, and event-driven architecture