Trouble-to-Resolve Flow

What Is Trouble-to-Resolve?

The Trouble-to-Resolve (T2R) process is the second most critical end-to-end flow in a telco, sitting alongside Lead-to-Cash (L2C) as a core operational capability. While L2C focuses on selling and delivering services, T2R focuses on maintaining service quality and resolving issues when things go wrong.

T2R encompasses the entire assurance lifecycle: detecting a problem (proactively or via customer report), diagnosing its root cause, resolving it, and confirming resolution with the customer. In eTOM terms, T2R traverses the Assurance vertical across CRM, SM&O, and RM&O horizontal layers.

Trouble-to-Resolve (T2R)

The end-to-end business process that begins with the detection or reporting of a service issue and ends with confirmed resolution and closure. T2R encompasses trouble ticket management, fault diagnosis, service restoration, and customer communication. It is the primary assurance value chain.

eTOM Alignment

T2R maps to eTOM Operations processes in the Assurance vertical. Key eTOM L2 processes include: Problem Handling (1.2.1.4), Service Problem Management (1.2.2.2), Resource Trouble Management (1.2.3.2), and Service Quality Management (1.2.2.1). The flow crosses CRM → SM&O → RM&O layers as diagnosis deepens.

Proactive vs Reactive Assurance

Assurance processes can be triggered in two fundamentally different ways. Understanding this distinction is critical because it determines the starting point of the T2R flow and which systems initiate the process.

Aspect	Reactive Assurance	Proactive Assurance
Trigger	Customer reports a problem	Network/system detects an anomaly
Starting Point	Trouble Ticket (CRM layer)	Network Alarm/Event (RM&O layer)
Direction	Top-down: customer → service → resource	Bottom-up: resource → service → customer impact
Time to Detect	Delayed (customer must notice and report)	Near real-time (automated monitoring)
Customer Experience	Customer is already frustrated	Telco may resolve before customer notices
Key System	Service Desk / CRM	Fault Management / NMS
Maturity Level	Basic capability	Advanced capability (requires correlation)

The Gold Standard: Proactive Resolution

The most mature telcos aim to resolve problems before customers notice them. This requires automated alarm correlation (network alarm → impacted service → impacted customer), proactive communication ("We detected an issue and are working to resolve it"), and root cause analysis that prevents recurrence.

Think of two scenarios: (1) A customer calls saying "my internet is down" — that is reactive. (2) The network monitoring system detects a fibre cut and automatically notifies affected customers — that is proactive. Both lead to the same resolution process, but proactive assurance starts earlier and provides a much better customer experience.

In reactive T2R, the flow is: CRM (trouble ticket) → SOM (service impact analysis) → ROM (resource diagnosis). In proactive T2R, the flow inverts: NMS/EMS (alarm) → Fault Management (correlation) → Service Impact Analysis → CRM (proactive notification). The systems are the same, but the direction and sequencing differ.

Modern proactive assurance uses event-driven architecture with stream processing. Network alarms are ingested as events (TMF642 Alarm Management), correlated in real-time to identify root cause, and automatically mapped to impacted services (using Service Inventory topology) and customers. AI/ML models predict degradation before alarms fire. This is the foundation of "autonomous operations" in TM Forum's vision.
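
The bottom-up mapping described above (alarm → impacted service → impacted customer) can be sketched in a few lines. This is a minimal illustration, not a TMF642 client: the `Alarm` class, the two lookup tables, and all identifiers are hypothetical stand-ins for what Resource Inventory and Service Inventory would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """Hypothetical, simplified alarm record (not the TMF642 schema)."""
    alarm_id: str
    resource_id: str
    severity: str


# Assumed topology extracts; in practice these would come from
# Resource Inventory and Service Inventory queries.
RESOURCE_TO_SERVICES = {
    "agg-router-01": ["cfs-internet-1001", "cfs-internet-1002"],
    "olt-07": ["cfs-internet-1003"],
}
SERVICE_TO_CUSTOMER = {
    "cfs-internet-1001": "cust-A",
    "cfs-internet-1002": "cust-B",
    "cfs-internet-1003": "cust-C",
}


def impacted_customers(alarm: Alarm) -> dict[str, list[str]]:
    """Walk bottom-up: resource -> services -> customers."""
    impact: dict[str, list[str]] = {}
    for service_id in RESOURCE_TO_SERVICES.get(alarm.resource_id, []):
        customer_id = SERVICE_TO_CUSTOMER.get(service_id)
        if customer_id is not None:
            impact.setdefault(customer_id, []).append(service_id)
    return impact


alarm = Alarm("alm-1", "agg-router-01", "critical")
impact = impacted_customers(alarm)
for customer_id, services in impact.items():
    print(f"Notify {customer_id}: impacted services {services}")
```

The resulting customer list is what would feed the proactive notification step in CRM. A production correlator would also deduplicate alarms that share a root cause before notifying anyone.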

The Complete T2R Flow

The Trouble-to-Resolve flow consists of six major stages, whether triggered reactively by a customer or proactively by network monitoring. Each stage has clear system ownership and eTOM process mappings.

Trouble-to-Resolve: End-to-End Flow

Detection / Reporting

NMS / Service Desk / CRM

A problem is detected via network alarm (proactive) or reported by the customer via phone, portal, or chat (reactive). Initial information is captured.

Trouble Ticket Creation

Trouble Ticket System / CRM

A formal Trouble Ticket is created, linking the issue to the affected customer, service, and (if known) resource. Initial categorisation and priority are assigned.

Diagnosis & Root Cause Analysis

Fault Management / SOM / ROM

Service impact analysis identifies which CFS instances are affected. Resource analysis identifies the root cause at the network level. May involve automated diagnostics or field dispatch.

Resolution & Restoration

ROM / EMS / Field Service

The root cause is addressed: network element repaired/replaced, configuration corrected, or service rerouted. Service is restored to normal operation.

Verification & Testing

Test Management / CRM

Automated and/or manual testing confirms the service is restored to agreed quality levels. Customer may be contacted to confirm resolution from their perspective.

Closure & Reporting

Trouble Ticket System / Analytics

Trouble ticket is closed with resolution details. Metrics are captured for SLA reporting, trend analysis, and continuous improvement.

Stage 1: Detection and Reporting

The T2R flow begins when an issue is first identified. In the reactive path, this happens when a customer contacts the service desk. In the proactive path, this happens when network monitoring detects an alarm or performance degradation. The key challenge at this stage is capturing enough information to enable efficient diagnosis.

In the reactive path, the customer contacts the service desk via phone, chat, email, or self-service portal. The agent identifies the customer, looks up their active services (via Product Inventory / Service Inventory), and captures the symptom description. An initial diagnostic may be run (e.g., a line test for broadband). The intake steps are:

Customer identification (CRM lookup)
Service identification (Product/Service Inventory query)
Symptom capture and categorisation
Initial automated diagnostic (if available)
Known issue matching (check for existing network alarms)
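
The intake steps listed above could be wired together roughly as follows. This is a sketch under stated assumptions: the lookup dictionaries stand in for CRM, inventory, and fault management queries, and the keyword-based `categorise` function is a deliberately naive placeholder for real categorisation rules.

```python
from dataclasses import dataclass, field


@dataclass
class Intake:
    """Hypothetical record of what the service desk captures at first contact."""
    customer_id: str
    service_ids: list[str]
    symptom: str
    category: str
    matched_alarm_ids: list[str] = field(default_factory=list)


# Assumed stand-ins for CRM / inventory / fault management lookups.
CUSTOMER_SERVICES = {"cust-A": ["cfs-internet-1001"]}
ACTIVE_ALARMS_BY_SERVICE = {"cfs-internet-1001": ["alm-1"]}
KEYWORD_CATEGORIES = {"slow": "performance", "down": "connectivity", "bill": "billing"}


def categorise(symptom: str) -> str:
    """Naive keyword match; real systems use richer rules or models."""
    text = symptom.lower()
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in text:
            return category
    return "uncategorised"


def capture_intake(customer_id: str, symptom: str) -> Intake:
    """Identify services, categorise the symptom, and match known alarms."""
    services = CUSTOMER_SERVICES.get(customer_id, [])
    alarms = [
        alarm_id
        for service_id in services
        for alarm_id in ACTIVE_ALARMS_BY_SERVICE.get(service_id, [])
    ]
    return Intake(customer_id, services, symptom, categorise(symptom), alarms)


intake = capture_intake("cust-A", "My internet is down")
print(intake)
```

When `matched_alarm_ids` is non-empty, the agent can link the new contact to an incident that is already being worked instead of starting a fresh diagnosis.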

Stage 2: Trouble Ticket Management

The Trouble Ticket is the central tracking entity for the entire T2R process. It links the reported problem to the affected customer, services, and resources. It tracks the lifecycle from creation through diagnosis, resolution, and closure. In TM Forum terms, the Trouble Ticket is defined by TMF621 (Trouble Ticket API).

Trouble Ticket Key Attributes

Attribute	Description	Source
Ticket ID	Unique identifier for tracking	Trouble Ticket System
Severity / Priority	Impact and urgency classification	Initial assessment + SLA rules
Category	Type of issue (connectivity, performance, billing)	Agent/automation categorisation
Related Customer	The affected customer account	CRM
Related Service	The affected CFS instance(s)	Service Inventory
Related Resource	The affected resource(s), if known	Resource Inventory
Status	Lifecycle state (open, in progress, resolved, closed)	Trouble Ticket System
SLA Target	Resolution deadline based on severity and contract	SLA Management

TMF621 — Trouble Ticket API

TMF621 defines the standard API for creating, updating, querying, and managing trouble tickets. It supports lifecycle state transitions, notes/attachments, related entities (customer, service, resource), and integration with work order systems. TMF621 is the primary API for assurance process orchestration.
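
To make the lifecycle idea concrete, here is a minimal in-memory sketch of a ticket with guarded state transitions. It uses the simplified states from the attribute table above (open, in progress, resolved, closed), which is an assumption for illustration and not the status enumeration defined in the TMF621 specification. The class and field names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in progress"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed transition rules: a resolved ticket may be reopened, a closed one may not.
ALLOWED = {
    Status.OPEN: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.RESOLVED},
    Status.RESOLVED: {Status.CLOSED, Status.IN_PROGRESS},
    Status.CLOSED: set(),
}


@dataclass
class TroubleTicket:
    ticket_id: str
    customer_id: str
    service_ids: list[str]
    severity: str
    status: Status = Status.OPEN
    history: list[Status] = field(default_factory=lambda: [Status.OPEN])

    def transition(self, new_status: Status) -> None:
        """Move to new_status, rejecting transitions the lifecycle does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.status = new_status
        self.history.append(new_status)


ticket = TroubleTicket("TT-1001", "cust-A", ["cfs-internet-1001"], "P1")
ticket.transition(Status.IN_PROGRESS)
ticket.transition(Status.RESOLVED)
print([s.value for s in ticket.history])
```

Keeping the allowed transitions in one table makes the lifecycle easy to audit and to adjust when an implementation adds states such as a pending or held status.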

Trouble Ticket — Source of Record

Entity	System of Record	System of Engagement	System of Reference	Notes
Trouble Ticket	Trouble Ticket System	Service Desk / Self-Service	—	Central tracking entity for all T2R activity
Alarm / Event	Fault Management / NMS	NOC Dashboard	—	Raw alarms correlated into incidents
Service Impact	Service Inventory	Service Quality Dashboard	—	Which CFS instances are degraded or down
Work Order	Field Service / Workforce Mgmt	Field Technician App	—	Created when physical intervention is needed

Stage 3: Diagnosis & Root Cause Analysis

Diagnosis is the most technically complex stage of T2R. It requires traversing from the customer-visible symptom down through the service and resource layers to identify the root cause. This traversal relies heavily on the topology information stored in Service Inventory and Resource Inventory.

Diagnostic Flow

Service Impact Analysis

Service Inventory / SOM

Identify which CFS instance(s) are affected by querying Service Inventory. Determine whether the issue is service-wide or customer-specific.

Service-to-Resource Mapping

Service Inventory → Resource Inventory

Use the CFS-to-RFS-to-Resource topology to trace from affected services down to supporting resources.

Resource Diagnosis

Resource Inventory / NMS / EMS

Check the status and performance of supporting resources. Look for active alarms, configuration drift, or capacity issues.

Root Cause Identification

Fault Management / NOC

Correlate all findings to identify the single root cause (or multiple contributing factors). Determine the fix required.

Diagnosis Example: Broadband Degradation

A customer reports slow broadband. Service Impact Analysis shows the CFS "Internet Access" is degraded. Tracing to resources reveals two RFS instances: "Broadband Line" (OK) and "IP Routing" (degraded). Further resource diagnosis shows the aggregation router is at 95% CPU utilisation. Root cause: unexpected traffic spike on a shared aggregation link.
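
The broadband example can be expressed as a small topology walk. The dictionaries below are hypothetical extracts of Service Inventory and Resource Inventory, with invented resource identifiers; the sketch only illustrates the CFS → RFS → Resource traversal, not a real inventory API.

```python
# Assumed topology for the broadband example (CFS -> RFS -> Resource).
CFS_TO_RFS = {"Internet Access": ["Broadband Line", "IP Routing"]}
RFS_TO_RESOURCES = {
    "Broadband Line": ["access-line-42"],
    "IP Routing": ["agg-router-01"],
}
# Assumed health snapshot, as an NMS/EMS might report it.
RESOURCE_HEALTH = {
    "access-line-42": {"status": "ok", "cpu_pct": None},
    "agg-router-01": {"status": "degraded", "cpu_pct": 95},
}


def find_suspect_resources(cfs_name: str) -> list[tuple[str, str]]:
    """Trace a CFS down to its resources and return (RFS, resource) pairs that are not healthy."""
    suspects = []
    for rfs_name in CFS_TO_RFS.get(cfs_name, []):
        for resource_id in RFS_TO_RESOURCES.get(rfs_name, []):
            health = RESOURCE_HEALTH.get(resource_id, {"status": "unknown"})
            if health["status"] != "ok":
                suspects.append((rfs_name, resource_id))
    return suspects


suspects = find_suspect_resources("Internet Access")
print(suspects)
```

Resources missing from the health snapshot are reported as suspects too, since an unknown status usually points to an inventory or monitoring gap worth checking.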

Stage 4: Resolution & Service Restoration

Once the root cause is identified, the resolution stage executes the fix. The nature of the resolution depends on the root cause — it may be a remote configuration change, a software patch, a hardware replacement, or a field visit. The goal is always to restore the affected service(s) to their agreed quality levels as quickly as possible.

Remote resolution applies to issues that can be fixed without physical intervention, for example through configuration rollback, traffic rerouting, capacity rebalancing, or a software restart. It is the fastest option and is preferred wherever it is possible. Systems involved: ROM for resource reconfiguration, EMS for element-level commands, SDN Controller for network path changes.

Stages 5 and 6: Verification and Closure

After the fix is applied, verification confirms that the service has been restored. This may involve automated service tests (TMF653 Service Test Management), performance metric checks, and customer confirmation. Only after successful verification should the trouble ticket be moved to "resolved" status.

Closure adds the final resolution details, captures metrics (mean time to detect, mean time to resolve), and updates knowledge bases for future diagnosis. In many implementations, the ticket moves to "resolved" first, giving the customer a window to confirm or reopen, before automatically closing after a defined period.

Apply the fix (remote or field)
Run automated service test to verify restoration
Update trouble ticket status to "resolved"
Notify the customer of resolution
Wait for customer confirmation window (e.g., 48 hours)
Auto-close the ticket if no reopening
Capture MTTD, MTTR, and resolution category for reporting
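
The "resolved, then auto-close" behaviour in the steps above can be sketched as a pure function that a scheduler might call periodically. The 48-hour window is taken from the example in the list; the function name, the string status values, and the `reopened` flag are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Confirmation window from the example above; real values are contract-specific.
CONFIRMATION_WINDOW = timedelta(hours=48)


def next_status(status: str, resolved_at: datetime, reopened: bool, now: datetime) -> str:
    """Decide the ticket status after the fix has been verified.

    Only 'resolved' tickets are affected: they go back to 'in progress' if the
    customer reopened them, and to 'closed' once the window has elapsed.
    """
    if status != "resolved":
        return status
    if reopened:
        return "in progress"
    if now - resolved_at >= CONFIRMATION_WINDOW:
        return "closed"
    return "resolved"


resolved_at = datetime(2024, 1, 1, 9, 0)
print(next_status("resolved", resolved_at, False, resolved_at + timedelta(hours=47)))
print(next_status("resolved", resolved_at, False, resolved_at + timedelta(hours=48)))
```

Passing `now` in as a parameter, instead of reading the system clock inside the function, keeps the rule easy to test.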

eTOM Level 2 Process Map for T2R

eTOM Level 2 Processes in Trouble-to-Resolve

eTOM L2 Process	eTOM Area	T2R Stage	Primary System
Problem Handling (1.2.1.4)	CRM / Assurance	Detection & Ticket Creation	Service Desk / CRM
Customer QoS/SLA Management (1.2.1.3)	CRM / Assurance	SLA Tracking	SLA Management
Service Problem Management (1.2.2.2)	SM&O / Assurance	Service Diagnosis	SOM / Fault Mgmt
Service Quality Management (1.2.2.1)	SM&O / Assurance	Service Monitoring	Service Quality Mgmt
Resource Trouble Management (1.2.3.2)	RM&O / Assurance	Resource Diagnosis & Fix	ROM / NMS / EMS
Resource Performance Management (1.2.3.1)	RM&O / Assurance	Resource Monitoring	NMS / Performance Mgmt
Resource Data Collection & Distribution (1.2.3.4)	RM&O / Assurance	Data Gathering	Mediation / NMS

TM Forum API Touchpoints for T2R

TMF Open APIs in the T2R Flow

TMF API	Name	T2R Usage	Key System
TMF621	Trouble Ticket	Create, update, and track trouble tickets	Trouble Ticket System
TMF656	Service Problem Management	Manage service-level problem records	Fault Management
TMF642	Alarm Management	Ingest and manage network alarms	NMS / Fault Management
TMF697	Work Order Management	Create and track field service work orders	Workforce Management
TMF653	Service Test Management	Run automated service tests for verification	Test Management
TMF638	Service Inventory	Query CFS topology for impact analysis	Service Inventory
TMF639	Resource Inventory	Query resource topology for root cause analysis	Resource Inventory
TMF634	Resource Catalog	Look up resource specifications for diagnosis	Resource Catalog

SLA Management and Key Metrics

T2R performance is measured by a set of well-defined metrics. These metrics drive SLA compliance, operational improvement, and regulatory reporting. Every T2R implementation must capture them.

Key T2R Metrics

Metric	Definition	Typical Target	Measurement
MTTD	Mean Time to Detect — time from fault occurrence to detection	< 5 minutes (proactive)	Alarm timestamp vs fault onset
MTTR	Mean Time to Resolve — time from detection to service restoration	< 4 hours (P1)	Ticket creation to resolution
First Contact Resolution	Percentage of issues resolved on first customer contact	> 70%	Tickets resolved without escalation
Repeat Fault Rate	Percentage of tickets reopened or recurring within 30 days	< 5%	Reopened tickets / total tickets
SLA Compliance	Percentage of tickets resolved within SLA target	> 95%	Tickets within SLA / total tickets
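
As a worked illustration of the definitions in the table, the following sketch computes MTTD, MTTR, and SLA compliance from a handful of incident records. The record fields and the sample timestamps are invented for the example; real figures would come from the Trouble Ticket System and Fault Management.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class IncidentRecord:
    """Hypothetical per-incident timestamps used for metric calculation."""
    fault_onset: datetime
    detected: datetime
    restored: datetime
    sla_target: timedelta


def mttd_minutes(records: list[IncidentRecord]) -> float:
    """Mean time from fault onset to detection, in minutes."""
    return mean((r.detected - r.fault_onset).total_seconds() / 60 for r in records)


def mttr_hours(records: list[IncidentRecord]) -> float:
    """Mean time from detection to service restoration, in hours."""
    return mean((r.restored - r.detected).total_seconds() / 3600 for r in records)


def sla_compliance_pct(records: list[IncidentRecord]) -> float:
    """Share of incidents restored within their SLA target, as a percentage."""
    within = sum(1 for r in records if r.restored - r.detected <= r.sla_target)
    return 100 * within / len(records)


records = [
    IncidentRecord(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 2),
                   datetime(2024, 1, 1, 12, 2), timedelta(hours=4)),
    IncidentRecord(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 6),
                   datetime(2024, 1, 2, 15, 6), timedelta(hours=4)),
]
print(mttd_minutes(records), mttr_hours(records), sla_compliance_pct(records))
```

With these two sample incidents, detection took 2 and 6 minutes and restoration took 2 and 6 hours, so the script reports an MTTD of 4 minutes, an MTTR of 4 hours, and 50% SLA compliance.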

SLA Clocking

SLA clocks are typically paused when progress depends on the customer (e.g., awaiting customer access for a field visit or awaiting customer information). This is called "clock stop" or "pending customer" status. Managing clock stops correctly is essential for accurate SLA reporting.
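
A minimal sketch of clock-stop arithmetic follows. It assumes clock stops are recorded as (start, end) intervals that do not overlap one another; the function name and the sample times are illustrative.

```python
from datetime import datetime, timedelta


def sla_elapsed(opened: datetime, resolved: datetime,
                clock_stops: list[tuple[datetime, datetime]]) -> timedelta:
    """Return elapsed SLA time with paused intervals excluded.

    Assumes the clock-stop intervals do not overlap each other. Each interval
    is clipped to the ticket's own open/resolve window before being subtracted.
    """
    paused = timedelta()
    for start, end in clock_stops:
        clipped_start = max(start, opened)
        clipped_end = min(end, resolved)
        if clipped_end > clipped_start:
            paused += clipped_end - clipped_start
    return (resolved - opened) - paused


opened = datetime(2024, 1, 1, 8, 0)
resolved = datetime(2024, 1, 1, 14, 0)
# Ticket waited 90 minutes for the customer to grant site access.
stops = [(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 30))]
print(sla_elapsed(opened, resolved, stops))
```

Here the ticket was open for 6 hours of wall-clock time, but only 4 hours 30 minutes count against the SLA target.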

The Critical Dependency on Inventory

The T2R process is only as good as the inventory data it relies on. If the Service Inventory does not accurately reflect which CFS instances a customer has, diagnosis will fail. If the Resource Inventory does not accurately reflect the network topology, root cause analysis will be impossible.

Inventory Accuracy Is Non-Negotiable

Many T2R failures trace back to inaccurate inventory. If service-to-resource mappings are wrong, the diagnostic flow will chase phantom issues. Invest in inventory reconciliation — the ability to automatically compare inventory records against actual network state — as a foundational capability.
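
At its simplest, inventory reconciliation is a set comparison between what the inventory records and what the network reports. The sketch below assumes both sides can be reduced to a mapping of service ID to supporting resource IDs; the function name, report keys, and sample data are hypothetical.

```python
def reconcile(inventory: dict[str, set[str]],
              discovered: dict[str, set[str]]) -> dict[str, dict[str, list[str]]]:
    """Compare recorded service-to-resource mappings against discovered network state.

    Returns only the services whose mappings differ, with the discrepancies
    split by direction.
    """
    report: dict[str, dict[str, list[str]]] = {}
    for service_id in sorted(inventory.keys() | discovered.keys()):
        recorded = inventory.get(service_id, set())
        actual = discovered.get(service_id, set())
        if recorded != actual:
            report[service_id] = {
                # Recorded in inventory but not found in the network.
                "missing_in_network": sorted(recorded - actual),
                # Found in the network but absent from inventory.
                "missing_in_inventory": sorted(actual - recorded),
            }
    return report


inventory = {"cfs-1": {"r1", "r2"}, "cfs-2": {"r3"}}
discovered = {"cfs-1": {"r1"}, "cfs-2": {"r3"}, "cfs-3": {"r4"}}
report = reconcile(inventory, discovered)
print(report)
```

Each entry in the report is a candidate for correction before it misleads a diagnosis. Whether the inventory or the network is treated as the source of truth is a policy decision outside this sketch.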

You cannot assure what you cannot inventory. If you do not know what services a customer has and what resources support them, your assurance process is guesswork.
— Common telco architecture principle

Trouble-to-Resolve — Key Points

T2R is the core assurance flow: Detection → Ticket → Diagnosis → Resolution → Verification → Closure
It traverses the eTOM Assurance vertical across CRM, SM&O, and RM&O layers
Proactive assurance (system-detected) is far superior to reactive (customer-reported) for customer experience
TMF621 (Trouble Ticket) is the central API for assurance process orchestration
Diagnosis requires traversing service and resource inventory topology — accurate inventory is a prerequisite
Resolution may involve remote fix, field dispatch (TMF697), or vendor escalation
Key metrics: MTTD, MTTR, First Contact Resolution, Repeat Fault Rate, SLA Compliance
Proactive assurance requires alarm correlation, service impact analysis, and event-driven architecture