Zero-Downtime Architecture Engineered for Five-Nines Uptime Reliability.
SCTP and Diameter multihoming, Active/Active dual-node clustering, and hypervisor-based virtualization — eliminating every single point of failure from OCS to signaling gateway.
SCTP and Diameter Multihoming
The most common cause of an MVNO service outage is a control-plane link failure: the SS7 or Diameter connection to the host MNO drops, and subscribers lose access to services that require real-time authorization (voice calls, data sessions, SMS).
Cipher Telecom eliminates this failure mode through multihoming: every SCTP association and Diameter peer connection is bound to multiple local IP addresses residing on different physical network interfaces, each connected to a different upstream switch. If a cable, a NIC, or an upstream switch fails, the SCTP stack detects the path failure in under one second and shifts all traffic to the surviving paths, without tearing down the association and without interrupting any active user session or in-flight billing event.
How SCTP multihoming works
- Each SCTP endpoint advertises 2–4 local IP addresses at association setup.
- Each endpoint, including the remote peer (host MNO STP), monitors every path by sending HEARTBEAT chunks and waiting for their acknowledgements.
- If the primary path becomes unreachable, SCTP promotes the next available path without dropping the association.
- No TCP connection teardown, no BGP reconvergence wait, no session state loss.
- Active user sessions continue without interruption — the failover is transparent to the application layer.
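The failover behaviour described above can be sketched as a small path-state model. This is an illustration only, not Cipher Telecom's implementation or a real SCTP stack: the `Path` and `Association` classes, the `max_retrans` threshold of 2, and the IP addresses are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Path:
    address: str
    missed: int = 0       # consecutive unacknowledged HEARTBEATs
    active: bool = True


@dataclass
class Association:
    paths: list
    max_retrans: int = 2  # assumed threshold, not a value from the source
    primary: int = 0      # index of the path currently carrying traffic

    def heartbeat_result(self, address: str, acked: bool) -> None:
        """Record one HEARTBEAT outcome for the path with this address."""
        path = next(p for p in self.paths if p.address == address)
        if acked:
            path.missed = 0
            path.active = True
            return
        path.missed += 1
        if path.missed > self.max_retrans:
            path.active = False
            if self.paths[self.primary] is path:
                self._promote()

    def _promote(self) -> None:
        """Switch traffic to the first surviving path; the association stays up."""
        for index, candidate in enumerate(self.paths):
            if candidate.active:
                self.primary = index
                return
        raise ConnectionError("all paths failed; association is lost")

    @property
    def primary_address(self) -> str:
        return self.paths[self.primary].address


# Hypothetical addresses on two separate NICs.
assoc = Association([Path("10.0.1.10"), Path("10.0.2.10")])
for _ in range(3):
    assoc.heartbeat_result("10.0.1.10", acked=False)
print(assoc.primary_address)
```

In this sketch the association object survives the path change, which is the property that keeps the failover transparent to the application layer. How quickly a real stack reaches the failure threshold depends on its heartbeat interval and retransmission timers, which are not modelled here.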
Multihomed SCTP topology
Dual-Node Host Redundancy
Every network element in Cipher Telecom's stack — OCS, PCEF, SMSC, USSD gateway, signaling gateways — is deployed across two redundant virtualized server environments: Node A and Node B, resident in the same data centre but on separate physical hosts, separate power feeds, and separate network uplinks.
The cluster operates in Active/Active mode: both nodes handle live traffic simultaneously. Subscriber session state is synchronously replicated between nodes so that if Node A fails, Node B has a complete and current view of every active session — no sessions are dropped, no billing records are lost.
Cluster topology per network element
| NE | Node A role | Node B role | State sync | Failover time |
|---|---|---|---|---|
| OCS | Active: rating + authorization | Active: rating + authorization | In-memory replication, synchronous | <500 ms |
| PCEF | Active: policy enforcement | Active: policy enforcement | Session table sync | <1 s |
| SMSC | Active: MO/MT routing | Active: MO/MT routing | Queue replication | <500 ms |
| Signaling GW | Active: SCTP/M3UA | Active: SCTP/M3UA | Routing table sync | <1 s |
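To illustrate why synchronous replication means no session is lost on node failure, here is a minimal sketch in which a write is acknowledged only after every live node has applied it. The `Node` and `ActiveActiveCluster` classes and the session fields are hypothetical names for this example; the actual replication mechanism is not described in this document.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.sessions = {}

    def apply(self, session_id, state):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.sessions[session_id] = dict(state)


class ActiveActiveCluster:
    def __init__(self, node_a, node_b):
        self.nodes = [node_a, node_b]

    def write(self, session_id, state):
        """Apply the update on every live node before acknowledging it."""
        live = [node for node in self.nodes if node.up]
        if not live:
            raise RuntimeError("no live node in the cluster")
        for node in live:
            node.apply(session_id, state)
        return len(live)  # number of replicas holding the update

    def read(self, session_id):
        """Serve the session from the first live node."""
        for node in self.nodes:
            if node.up:
                return node.sessions.get(session_id)
        raise RuntimeError("no live node in the cluster")


node_a, node_b = Node("Node A"), Node("Node B")
cluster = ActiveActiveCluster(node_a, node_b)
replicas = cluster.write("session-1", {"msisdn": "hypothetical", "quota_mb": 512})

node_a.up = False                      # simulate the loss of Node A
survivor_view = cluster.read("session-1")
print(replicas, survivor_view)
```

Because the write returns only after both nodes hold the update, Node B already has the current session when Node A disappears. The sketch leaves out resynchronizing a node that rejoins the cluster, which a production system would also need.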
Virtualized Infrastructure Efficiency
Every Cipher Telecom network element runs as a Virtual Machine (VM) or container on a hypervisor-based infrastructure layer. This provides four operational advantages over bare-metal proprietary appliances:
- Elastic scaling — additional VM instances can be provisioned in minutes as subscriber load grows, without hardware procurement lead times.
- Snapshot-based backup — complete system state, including subscriber database, can be snapshotted and replicated off-site. Recovery from a catastrophic failure is a restore operation, not a rebuild.
- Resource isolation — each NE runs in its own isolated VM with dedicated vCPU and memory allocation; a resource spike in one element cannot starve another.
- On-premise or co-location flexibility — the same VM images run on customer-provided data-centre hardware or Cipher-managed co-lo infrastructure.
Security Architecture
What 99.999% Actually Means
| Availability | Downtime per year | Downtime per month | Typical architecture |
|---|---|---|---|
| 99% | 3.65 days | 7.3 hours | Single server, no redundancy |
| 99.9% | 8.77 hours | 43.8 minutes | Basic hot-standby |
| 99.99% | 52.6 minutes | 4.4 minutes | Active/Standby cluster |
| 99.999% | 5.26 minutes | 26.3 seconds | Active/Active + multihoming (Cipher Telecom) |
Five-nines is an architectural target, not a contractual guarantee covering every incident. Actual SLA terms are defined per engagement in the service schedule. Downtime figures assume a one-year measurement window.
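The downtime figures in the table follow from simple arithmetic: allowed downtime is the unavailable fraction multiplied by the length of the period. The sketch below reproduces them, assuming a 365.25-day year and a month taken as one twelfth of that year; both are assumptions chosen because they match the table's rounding, and `downtime_seconds` is a name introduced here.

```python
SECONDS_PER_DAY = 86_400


def downtime_seconds(availability_percent, period_days=365.25):
    """Allowed downtime in seconds for a given availability over a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


for availability in (99.0, 99.9, 99.99, 99.999):
    per_year = downtime_seconds(availability)
    per_month = per_year / 12
    print(f"{availability}%: {per_year / 60:.2f} min/year, "
          f"{per_month / 60:.2f} min/month")
```

Each additional nine divides the downtime budget by ten, which is why the step from 99.99% to 99.999% leaves only about 26 seconds per month for every failover and maintenance event combined.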
Frequently Asked Questions
Q.01 What is SCTP multihoming and how is it different from IP failover?
SCTP multihoming is a transport-layer feature where a single SCTP association is bound to multiple IP addresses on both endpoints simultaneously. When a path fails, the SCTP protocol itself detects the failure (via missed HEARTBEAT acknowledgements) and reroutes traffic to a surviving path — all within the same association, without tearing it down and re-establishing it. Traditional IP failover (e.g. VRRP) operates at Layer 3 and takes longer, often requiring sessions to be rebuilt. SCTP multihoming failover happens in under one second and is invisible to the applications (OCS, SMSC) using the association.
Q.02 What is a Single Point of Failure (SPOF) and how does Cipher eliminate them?
A Single Point of Failure is any component in a system whose failure causes the entire system to become unavailable. Common SPOFs in telecom deployments include a single server running the OCS, a single network interface on the signaling gateway, or a single upstream switch. Cipher Telecom eliminates SPOFs by deploying every network element in an Active/Active dual-node cluster, connecting each node via multiple physical network interfaces to separate upstream switches, and using SCTP multihoming for all signaling connections.
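The SPOF definition can be checked mechanically: remove one component at a time and test whether service can still reach the host MNO. The sketch below does this for two toy topologies. The component names and links are hypothetical and far simpler than a real deployment; `reachable` and `single_points_of_failure` are names introduced for this example.

```python
def reachable(links, source, target, removed):
    """Depth-first search from source to target, ignoring the removed component."""
    adjacency = {}
    for a, b in links:
        if removed in (a, b):
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def single_points_of_failure(links, source, target):
    """Components whose individual failure cuts source off from target."""
    components = {c for link in links for c in link} - {source, target}
    return sorted(c for c in components
                  if not reachable(links, source, target, removed=c))


# One server, one NIC, one switch: every component is a SPOF.
single = [("subscribers", "ocs"), ("ocs", "nic0"),
          ("nic0", "switch1"), ("switch1", "mno")]

# Two nodes, two NICs per node, two upstream switches.
redundant = [("subscribers", "node-a"), ("subscribers", "node-b"),
             ("node-a", "a-nic0"), ("node-a", "a-nic1"),
             ("node-b", "b-nic0"), ("node-b", "b-nic1"),
             ("a-nic0", "switch1"), ("a-nic1", "switch2"),
             ("b-nic0", "switch1"), ("b-nic1", "switch2"),
             ("switch1", "mno"), ("switch2", "mno")]

print(single_points_of_failure(single, "subscribers", "mno"))
print(single_points_of_failure(redundant, "subscribers", "mno"))
```

The check only covers single failures of the components listed; shared dependencies that are not modelled as components, such as a common power feed, would have to be added to the link list to be detected.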
Download the High Availability Whitepaper.
Architecture diagrams, failure scenario analysis and SLA mapping — ready for your infrastructure review board and compliance sign-off.