BSS/OSS Academy

Trouble-to-Resolve Flow

What Is Trouble-to-Resolve?

The Trouble-to-Resolve (T2R) process is one of the two most critical end-to-end flows in a telco, sitting alongside Lead-to-Cash (L2C) as a core operational capability. While L2C focuses on selling and delivering services, T2R focuses on maintaining service quality and resolving issues when things go wrong.

T2R encompasses the entire assurance lifecycle: detecting a problem (proactively or via customer report), diagnosing its root cause, resolving it, and confirming resolution with the customer. In eTOM terms, T2R traverses the Assurance vertical across CRM, SM&O, and RM&O horizontal layers.

Trouble-to-Resolve (T2R)
The end-to-end business process that begins with the detection or reporting of a service issue and ends with confirmed resolution and closure. T2R encompasses trouble ticket management, fault diagnosis, service restoration, and customer communication. It is the primary assurance value chain.
eTOM Alignment
T2R maps to eTOM Operations processes in the Assurance vertical. Key eTOM L2 processes include: Problem Handling (1.2.1.4), Service Problem Management (1.2.2.2), Resource Trouble Management (1.2.3.2), and Service Quality Management (1.2.2.1). The flow crosses CRM → SM&O → RM&O layers as diagnosis deepens.

Proactive vs Reactive Assurance

Assurance processes can be triggered in two fundamentally different ways. Understanding this distinction is critical because it determines the starting point of the T2R flow and which systems initiate the process.

Proactive vs Reactive Assurance

Aspect | Reactive Assurance | Proactive Assurance
Trigger | Customer reports a problem | Network/system detects an anomaly
Starting Point | Trouble Ticket (CRM layer) | Network Alarm/Event (RM&O layer)
Direction | Top-down: customer → service → resource | Bottom-up: resource → service → customer impact
Time to Detect | Delayed (customer must notice and report) | Near real-time (automated monitoring)
Customer Experience | Customer is already frustrated | Telco may resolve before customer notices
Key System | Service Desk / CRM | Fault Management / NMS
Maturity Level | Basic capability | Advanced capability (requires correlation)
The Gold Standard: Proactive Resolution
The most mature telcos aim to resolve problems before customers notice them. This requires automated alarm correlation (network alarm → impacted service → impacted customer), proactive communication ("We detected an issue and are working to resolve it"), and root cause analysis that prevents recurrence.

Think of two scenarios: (1) A customer calls saying "my internet is down" — that is reactive. (2) The network monitoring system detects a fibre cut and automatically notifies affected customers — that is proactive. Both lead to the same resolution process, but proactive assurance starts earlier and provides a much better customer experience.

In reactive T2R, the flow is: CRM (trouble ticket) → SOM (service impact analysis) → ROM (resource diagnosis). In proactive T2R, the flow inverts: NMS/EMS (alarm) → Fault Management (correlation) → Service Impact Analysis → CRM (proactive notification). The systems are the same, but the direction and sequencing differ.

Modern proactive assurance uses event-driven architecture with stream processing. Network alarms are ingested as events (TMF642 Alarm Management), correlated in real-time to identify root cause, and automatically mapped to impacted services (using Service Inventory topology) and customers. AI/ML models predict degradation before alarms fire. This is the foundation of "autonomous operations" in TM Forum's vision.
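The correlation step described above can be sketched as a lookup across topology maps. This is a minimal illustration only: the dictionaries are stand-ins for Service Inventory and CRM data, and the alarm shape is a simplified placeholder, not a real TMF642 payload.

```python
# Toy topology maps: resource -> supported CFS instances, CFS -> customers.
# In a real deployment these lookups hit Service Inventory (TMF638) and CRM.
RESOURCE_TO_SERVICES = {"router-agg-01": ["cfs-internet-123", "cfs-iptv-456"]}
SERVICE_TO_CUSTOMERS = {"cfs-internet-123": ["cust-001"], "cfs-iptv-456": ["cust-002"]}

def correlate_alarm(alarm):
    """Map a raw alarm to impacted services and customers (simplified)."""
    resource = alarm["alarmedObject"]
    services = RESOURCE_TO_SERVICES.get(resource, [])
    customers = sorted({c for s in services for c in SERVICE_TO_CUSTOMERS.get(s, [])})
    return {
        "rootResource": resource,
        "impactedServices": services,
        "impactedCustomers": customers,  # targets for proactive notification
    }

impact = correlate_alarm({"alarmedObject": "router-agg-01", "severity": "major"})
```

From `impact`, a proactive flow would open a ticket per root cause and notify each impacted customer before they call.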

The Complete T2R Flow

The Trouble-to-Resolve flow consists of six major stages, whether triggered reactively by a customer or proactively by network monitoring. Each stage has clear system ownership and eTOM process mappings.

Trouble-to-Resolve: End-to-End Flow

  1. Detection / Reporting (NMS / Service Desk / CRM)
     A problem is detected via network alarm (proactive) or reported by the customer via phone, portal, or chat (reactive). Initial information is captured.

  2. Trouble Ticket Creation (Trouble Ticket System / CRM)
     A formal Trouble Ticket is created, linking the issue to the affected customer, service, and (if known) resource. Initial categorisation and priority are assigned.

  3. Diagnosis & Root Cause Analysis (Fault Management / SOM / ROM)
     Service impact analysis identifies which CFS instances are affected. Resource analysis identifies the root cause at the network level. May involve automated diagnostics or field dispatch.

  4. Resolution & Restoration (ROM / EMS / Field Service)
     The root cause is addressed: network element repaired/replaced, configuration corrected, or service rerouted. Service is restored to normal operation.

  5. Verification & Testing (Test Management / CRM)
     Automated and/or manual testing confirms the service is restored to agreed quality levels. Customer may be contacted to confirm resolution from their perspective.

  6. Closure & Reporting (Trouble Ticket System / Analytics)
     Trouble ticket is closed with resolution details. Metrics are captured for SLA reporting, trend analysis, and continuous improvement.

Stage 1: Detection and Reporting

The T2R flow begins when an issue is first identified. In the reactive path, this happens when a customer contacts the service desk. In the proactive path, this happens when network monitoring detects an alarm or performance degradation. The key challenge at this stage is capturing enough information to enable efficient diagnosis.

The customer contacts the service desk via phone, chat, email, or self-service portal. The agent identifies the customer, looks up their active services (via Product Inventory / Service Inventory), and captures the symptom description. An initial diagnostic may be run (e.g., line test for broadband).

  1. Customer identification (CRM lookup)
  2. Service identification (Product/Service Inventory query)
  3. Symptom capture and categorisation
  4. Initial automated diagnostic (if available)
  5. Known issue matching (check for existing network alarms)
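Step 5, known issue matching, can be sketched as a check of the customer's services against currently open alarms, so the agent can link the call to an existing incident instead of opening a duplicate ticket. The alarm shape below is illustrative.

```python
# Toy list of open, already-correlated alarms with their impacted services.
OPEN_ALARMS = [{"id": "alm-9", "impactedServices": ["cfs-internet-123"]}]

def match_known_issue(customer_services):
    """Return the id of an open alarm impacting one of the customer's services,
    or None if no known issue explains the reported symptom."""
    for alarm in OPEN_ALARMS:
        if set(alarm["impactedServices"]) & set(customer_services):
            return alarm["id"]
    return None
```

If a match is found, the new report is attached to the existing incident and the customer can be told the issue is already being worked.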

Stage 2: Trouble Ticket Management

The Trouble Ticket is the central tracking entity for the entire T2R process. It links the reported problem to the affected customer, services, and resources. It tracks the lifecycle from creation through diagnosis, resolution, and closure. In TM Forum terms, the Trouble Ticket is defined by TMF621 (Trouble Ticket API).

Trouble Ticket Key Attributes

Attribute | Description | Source
Ticket ID | Unique identifier for tracking | Trouble Ticket System
Severity / Priority | Impact and urgency classification | Initial assessment + SLA rules
Category | Type of issue (connectivity, performance, billing) | Agent/automation categorisation
Related Customer | The affected customer account | CRM
Related Service | The affected CFS instance(s) | Service Inventory
Related Resource | The affected resource(s), if known | Resource Inventory
Status | Lifecycle state (open, in progress, resolved, closed) | Trouble Ticket System
SLA Target | Resolution deadline based on severity and contract | SLA Management
TMF621 — Trouble Ticket API
TMF621 defines the standard API for creating, updating, querying, and managing trouble tickets. It supports lifecycle state transitions, notes/attachments, related entities (customer, service, resource), and integration with work order systems. TMF621 is the primary API for assurance process orchestration.
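A ticket-creation payload in the spirit of TMF621 might look like the sketch below. Field names follow the TMF621 data model loosely; exact attribute names, endpoints, and mandatory fields vary by API version and deployment, so treat this as illustrative rather than a conformant example.

```python
import json

def build_ticket(customer_id, service_id, symptom, severity="Major"):
    """Assemble a TMF621-style trouble ticket body (illustrative field names)."""
    return {
        "name": f"Trouble: {symptom}",
        "description": symptom,
        "severity": severity,
        "ticketType": "incident",
        "status": "acknowledged",
        "relatedEntity": [
            {"id": customer_id, "@referredType": "Customer"},
            {"id": service_id, "@referredType": "Service"},
        ],
    }

payload = json.dumps(build_ticket("cust-001", "cfs-internet-123", "No sync on broadband"))
# POST this body to the troubleTicket resource of your TMF621 deployment
```

The `relatedEntity` links are what later let diagnosis traverse from the ticket to Service and Resource Inventory.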

Trouble Ticket — Source of Record

Entity | System of Record | System of Engagement | System of Reference | Notes
Trouble Ticket | Trouble Ticket System | Service Desk / Self-Service | — | Central tracking entity for all T2R activity
Alarm / Event | Fault Management / NMS | NOC Dashboard | — | Raw alarms correlated into incidents
Service Impact | Service Inventory | Service Quality Dashboard | — | Which CFS instances are degraded or down
Work Order | Field Service / Workforce Mgmt | Field Technician App | — | Created when physical intervention is needed

Stage 3: Diagnosis & Root Cause Analysis

Diagnosis is the most technically complex stage of T2R. It requires traversing from the customer-visible symptom down through the service and resource layers to identify the root cause. This traversal relies heavily on the topology information stored in Service Inventory and Resource Inventory.

Diagnostic Flow

  1. Service Impact Analysis (Service Inventory / SOM)
     Identify which CFS instance(s) are affected by querying Service Inventory. Determine whether the issue is service-wide or customer-specific.

  2. Service-to-Resource Mapping (Service Inventory → Resource Inventory)
     Use the CFS-to-RFS-to-Resource topology to trace from affected services down to supporting resources.

  3. Resource Diagnosis (Resource Inventory / NMS / EMS)
     Check the status and performance of supporting resources. Look for active alarms, configuration drift, or capacity issues.

  4. Root Cause Identification (Fault Management / NOC)
     Correlate all findings to identify the single root cause (or multiple contributing factors). Determine the fix required.

Diagnosis Example: Broadband Degradation
A customer reports slow broadband. Service Impact Analysis shows the CFS "Internet Access" is degraded. Tracing to resources reveals two RFS instances: "Broadband Line" (OK) and "IP Routing" (degraded). Further resource diagnosis shows the aggregation router is at 95% CPU utilisation. Root cause: unexpected traffic spike on a shared aggregation link.
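The broadband example above can be sketched as a topology walk from the affected CFS down to its supporting resources, flagging any resource that breaches a health threshold. The CFS/RFS/resource maps and the CPU metric are toy data standing in for inventory and performance management queries.

```python
# Toy topology: CFS -> RFS -> resources, plus a CPU utilisation metric.
CFS_TO_RFS = {"cfs-internet": ["rfs-line", "rfs-routing"]}
RFS_TO_RESOURCES = {"rfs-line": ["dslam-7"], "rfs-routing": ["router-agg-01"]}
CPU_UTIL = {"dslam-7": 0.40, "router-agg-01": 0.95}  # from performance mgmt

def find_root_cause(cfs, cpu_threshold=0.90):
    """Walk CFS -> RFS -> resource and return resources over the CPU threshold."""
    suspects = []
    for rfs in CFS_TO_RFS.get(cfs, []):
        for res in RFS_TO_RESOURCES.get(rfs, []):
            if CPU_UTIL.get(res, 0.0) > cpu_threshold:
                suspects.append(res)
    return suspects
```

Running `find_root_cause("cfs-internet")` singles out the saturated aggregation router, mirroring the worked example.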

Stage 4: Resolution & Service Restoration

Once the root cause is identified, the resolution stage executes the fix. The nature of the resolution depends on the root cause — it may be a remote configuration change, a software patch, a hardware replacement, or a field visit. The goal is always to restore the affected service(s) to their agreed quality levels as quickly as possible.

For issues that can be resolved without physical intervention: configuration rollback, traffic rerouting, capacity rebalancing, or software restart. Remote resolution is fastest and preferred. Systems involved: ROM for resource reconfiguration, EMS for element-level commands, SDN Controller for network path changes.
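The remote-versus-field decision can be sketched as a simple dispatch on the diagnosed cause. The cause categories and action names here are purely illustrative; real systems drive ROM, the EMS, or an SDN controller through their own interfaces, and fall back to a TMF657 work order when hands-on work is needed.

```python
# Illustrative mapping from diagnosed cause to a remote remediation action.
REMOTE_ACTIONS = {
    "config_drift": "rollback_config",
    "congestion": "reroute_traffic",
    "software_hang": "restart_process",
}

def plan_resolution(cause):
    """Prefer a remote fix; otherwise dispatch a field technician (work order)."""
    action = REMOTE_ACTIONS.get(cause)
    if action:
        return {"mode": "remote", "action": action}
    return {"mode": "field", "action": "dispatch_technician"}
```

A physical fault like a fibre cut falls through to the field path, which is where workforce management takes over.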

Stages 5 and 6: Verification and Closure

After the fix is applied, verification confirms that the service has been restored. This may involve automated service tests (TMF653 Service Test Management), performance metric checks, and customer confirmation. Only after successful verification should the trouble ticket be moved to "resolved" status.

Closure adds the final resolution details, captures metrics (mean time to detect, mean time to resolve), and updates knowledge bases for future diagnosis. In many implementations, the ticket moves to "resolved" first, giving the customer a window to confirm or reopen, before automatically closing after a defined period.

  1. Apply the fix (remote or field)
  2. Run automated service test to verify restoration
  3. Update trouble ticket status to "resolved"
  4. Notify the customer of resolution
  5. Wait for customer confirmation window (e.g., 48 hours)
  6. Auto-close the ticket if no reopening
  7. Capture MTTD, MTTR, and resolution category for reporting
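The resolved-to-closed window in steps 3 to 6 can be sketched as a small state transition. Status values and field names here are illustrative conventions, not mandated by TMF621.

```python
from datetime import datetime, timedelta

CONFIRM_WINDOW = timedelta(hours=48)  # customer confirmation window

def next_status(ticket, now):
    """Auto-close a resolved ticket once the confirmation window elapses,
    or move it back to in-progress if the customer reopened it."""
    if ticket["status"] == "resolved":
        if ticket.get("reopened"):
            return "inProgress"
        if now - ticket["resolvedAt"] >= CONFIRM_WINDOW:
            return "closed"
    return ticket["status"]
```

A scheduler in the ticket system would evaluate this transition periodically for every resolved ticket.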

eTOM Level 2 Process Map for T2R

eTOM Level 2 Processes in Trouble-to-Resolve

eTOM L2 Process | eTOM Area | T2R Stage | Primary System
Problem Handling (1.2.1.4) | CRM / Assurance | Detection & Ticket Creation | Service Desk / CRM
Customer QoS/SLA Management (1.2.1.3) | CRM / Assurance | SLA Tracking | SLA Management
Service Problem Management (1.2.2.2) | SM&O / Assurance | Service Diagnosis | SOM / Fault Mgmt
Service Quality Management (1.2.2.1) | SM&O / Assurance | Service Monitoring | Service Quality Mgmt
Resource Trouble Management (1.2.3.2) | RM&O / Assurance | Resource Diagnosis & Fix | ROM / NMS / EMS
Resource Performance Management (1.2.3.1) | RM&O / Assurance | Resource Monitoring | NMS / Performance Mgmt
Resource Data Collection & Distribution (1.2.3.4) | RM&O / Assurance | Data Gathering | Mediation / NMS

TM Forum API Touchpoints for T2R

TMF Open APIs in the T2R Flow

TMF API | Name | T2R Usage | Key System
TMF621 | Trouble Ticket | Create, update, and track trouble tickets | Trouble Ticket System
TMF656 | Service Problem Management | Manage service-level problem records | Fault Management
TMF642 | Alarm Management | Ingest and manage network alarms | NMS / Fault Management
TMF657 | Work Order Management | Create and track field service work orders | Workforce Management
TMF653 | Service Test Management | Run automated service tests for verification | Test Management
TMF638 | Service Inventory | Query CFS topology for impact analysis | Service Inventory
TMF639 | Resource Inventory | Query resource topology for root cause analysis | Resource Inventory
TMF634 | Resource Catalog | Look up resource specifications for diagnosis | Resource Catalog

SLA Management and Key Metrics

T2R performance is measured by a set of well-defined metrics. These metrics drive SLA compliance, operational improvement, and regulatory reporting. Every T2R implementation must capture them.

Key T2R Metrics

Metric | Definition | Typical Target | Measurement
MTTD | Mean Time to Detect: time from fault occurrence to detection | < 5 minutes (proactive) | Alarm timestamp vs fault onset
MTTR | Mean Time to Resolve: time from detection to service restoration | < 4 hours (P1) | Ticket creation to resolution
First Contact Resolution | Percentage of issues resolved on first customer contact | > 70% | Tickets resolved without escalation
Repeat Fault Rate | Percentage of tickets reopened or recurring within 30 days | < 5% | Reopened tickets / total tickets
SLA Compliance | Percentage of tickets resolved within SLA target | > 95% | Tickets within SLA / total tickets
SLA Clocking
SLA clocks are typically paused ("stopped") when the ball is in the customer's court (e.g., awaiting customer access for field visit, awaiting customer information). This is called "clock stop" or "pending customer" status. Properly managing clock stops is essential for accurate SLA reporting.
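The clock-stop rule reduces to simple interval arithmetic: total elapsed time minus the sum of pending-customer intervals. The data shapes below are illustrative.

```python
from datetime import datetime, timedelta

def sla_elapsed(opened, resolved, clock_stops):
    """Resolution time on the SLA clock, with pending-customer intervals
    (each a (start, end) pair) excluded from the total."""
    paused = sum((end - start for start, end in clock_stops), timedelta())
    return (resolved - opened) - paused

elapsed = sla_elapsed(
    datetime(2024, 5, 1, 9, 0),
    datetime(2024, 5, 1, 17, 0),
    clock_stops=[(datetime(2024, 5, 1, 11, 0), datetime(2024, 5, 1, 13, 0))],
)
# 8h wall-clock minus 2h pending-customer leaves 6h on the SLA clock
```

SLA compliance is then judged against `elapsed`, not raw wall-clock time, which is why clock stops must be recorded accurately.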

The Critical Dependency on Inventory

The T2R process is only as good as the inventory data it relies on. If the Service Inventory does not accurately reflect which CFS instances a customer has, diagnosis will fail. If the Resource Inventory does not accurately reflect the network topology, root cause analysis will be impossible.

Inventory Accuracy Is Non-Negotiable
Many T2R failures trace back to inaccurate inventory. If service-to-resource mappings are wrong, the diagnostic flow will chase phantom issues. Invest in inventory reconciliation — the ability to automatically compare inventory records against actual network state — as a foundational capability.
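At its core, reconciliation is a set comparison between what inventory says exists and what network discovery actually finds. This sketch uses toy port identifiers; real reconciliation also compares attributes and relationships, not just existence.

```python
def reconcile(inventory, discovered):
    """Compare inventory records against discovered network state.
    'ghost' items exist only in inventory; 'unmanaged' only in the network."""
    return {
        "ghost": sorted(inventory - discovered),
        "unmanaged": sorted(discovered - inventory),
    }

report = reconcile({"port-1", "port-2"}, {"port-2", "port-3"})
```

Each discrepancy in `report` is a latent T2R failure: a ghost record sends diagnosis chasing equipment that is not there, and an unmanaged resource is invisible to impact analysis.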

You cannot assure what you cannot inventory. If you do not know what services a customer has and what resources support them, your assurance process is guesswork.

Common telco architecture principle

Trouble-to-Resolve — Key Points

  • T2R is the core assurance flow: Detection → Ticket → Diagnosis → Resolution → Verification → Closure
  • It traverses the eTOM Assurance vertical across CRM, SM&O, and RM&O layers
  • Proactive assurance (system-detected) is far superior to reactive (customer-reported) for customer experience
  • TMF621 (Trouble Ticket) is the central API for assurance process orchestration
  • Diagnosis requires traversing service and resource inventory topology — accurate inventory is a prerequisite
  • Resolution may involve remote fix, field dispatch (TMF657), or vendor escalation
  • Key metrics: MTTD, MTTR, First Contact Resolution, SLA Compliance
  • Proactive assurance requires alarm correlation, service impact analysis, and event-driven architecture