When major IT outages hit, it’s often the payment gateways that make the headlines.
Customers stranded at checkouts and bars, queues growing, businesses forced to turn away trade.
For merchants, those moments are more than an inconvenience – they’re a reminder that reliability isn’t an aspiration. It’s a responsibility.
Nick Fryer
In payments, resilience determines whether businesses keep taking money when the unexpected hits. Yet resilience doesn’t appear by accident.
It’s the product of architectural choices made earlier – decisions about cloud strategy, redundancy and observability.
Those choices decide whether a system bends or breaks under pressure.
Design for failure
Resilient systems assume failure is inevitable. Hardware will degrade, and networks will glitch. The goal isn’t to avoid failure entirely, but to absorb it gracefully – to keep transactions flowing even when components falter.
That starts with a cloud-native architecture, spread across multiple regions and, crucially, multiple cloud providers. Instead of treating the cloud as a single dependency, payment systems should view it as a set of interchangeable parts. When one data center degrades, workloads shift automatically to another that has capacity.
Recent Dojo research found that one in five (20%) hospitality leaders cite payment failures or downtime as a particular concern for their organization, and that payment system failures disrupt more than half (58%) of businesses every week.
With payment systems under that much pressure, and every failure translating into lost revenue, businesses need IT infrastructure in which the transaction still succeeds if one component fails – or even one cloud region.
The customer never notices, and the merchant keeps trading.
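To make the idea concrete, here is a minimal sketch of a transaction that survives the loss of one environment. It is an illustration, not a description of any real platform: the `Environment` type, the `EnvironmentDown` exception and the `idempotency_key` field are all assumed names, and each environment is modelled as a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class EnvironmentDown(Exception):
    """Raised by a (hypothetical) environment that cannot process the request."""


@dataclass
class Environment:
    name: str                          # illustrative label, e.g. "cloud-a/eu-west"
    authorize: Callable[[dict], str]   # returns an authorization ID or raises


def submit_payment(payment: dict, environments: Sequence[Environment]) -> Tuple[str, str]:
    """Try each environment in turn; return (environment name, authorization ID)."""
    errors = []
    for env in environments:
        try:
            return env.name, env.authorize(payment)
        except EnvironmentDown as exc:
            # Record the failure and move on to the next environment.
            errors.append(f"{env.name}: {exc}")
    raise RuntimeError("all environments failed: " + "; ".join(errors))
```

Retrying a payment elsewhere is only safe if a repeated request cannot charge the customer twice, which is why the sketch assumes every payment carries an idempotency key that all environments honour.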
Remove single points of failure: go active-active across clouds
Traditional “active-passive” setups – where a backup system lies dormant until something breaks – are too slow for real-time payments. The modern approach is active-active, where live traffic continually flows through multiple environments at once.
By distributing load across two or more clouds, a platform avoids reliance on any single provider. It’s a hedge against correlated risk – the kind that can take down entire supply chains when a shared dependency fails.
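A simplified sketch of active-active routing might look like the following. It assumes a plain round-robin policy and a health flag set by some external check; the class and environment names are hypothetical, and a production router would also weigh capacity and latency.

```python
import itertools
from typing import Dict, List


class ActiveActiveRouter:
    """Round-robin across every environment currently marked healthy."""

    def __init__(self, environments: List[str]) -> None:
        self._healthy: Dict[str, bool] = {name: True for name in environments}
        self._cycle = itertools.cycle(environments)
        self._count = len(environments)

    def mark(self, name: str, healthy: bool) -> None:
        """Update health, e.g. from a periodic health check (not shown)."""
        self._healthy[name] = healthy

    def next_environment(self) -> str:
        # Inspect each environment at most once per call.
        for _ in range(self._count):
            candidate = next(self._cycle)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy environment available")
```

Because every environment already carries live traffic, losing one only removes it from the rotation. Nothing has to be started cold, which is the delay an active-passive setup incurs.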
This is what underpins 99.99% uptime – not marketing spin, but engineering discipline. Redundancy only matters if it’s active, tested and observable. And provider diversity isn’t just about performance; it’s about isolating risk. Different clouds fail differently. That heterogeneity is a strength.
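It helps to put numbers on that figure. The sketch below converts an availability target into a yearly downtime budget, and estimates the availability of several environments when any one of them is enough to serve a payment. The second function assumes failures are fully independent, which is an idealisation: it is exactly the assumption that shared dependencies break, and that provider diversity tries to approximate.

```python
from typing import Sequence

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget implied by an availability target (0.0 to 1.0)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(availabilities: Sequence[float]) -> float:
    """Availability when any one environment suffices, assuming independent failures."""
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1.0 - availability  # probability that this one is down too
    return 1.0 - all_down
```

On these assumptions, 99.99% uptime leaves roughly 52.6 minutes of downtime a year, and two independent environments at 99.9% each would together reach about 99.9999%. Correlated failures pull the real figure below that ideal.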
The paradox of reliability is that it comes from embracing failure. You don’t achieve uptime by assuming perfection, but by assuming imperfection and designing around it.
Resilience to the edge
Infrastructure resilience means little if the terminal can’t talk to it. Payments happen at the edge – in cafés, restaurants and shops, often on unreliable networks. That’s why resilience must extend from the data center to the device.
Payment terminals should use multi-carrier 4G SIMs that automatically select the strongest network. If a merchant’s Wi-Fi drops, the terminal switches to mobile data. If one carrier goes down, another steps in.
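The selection logic can be sketched in a few lines. This is an assumed policy for illustration, not a description of any particular terminal's firmware: prefer Wi-Fi while it is up, otherwise pick the cellular carrier with the strongest signal. The `Link` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    name: str         # e.g. "wifi", "carrier-a" (illustrative labels)
    kind: str         # "wifi" or "cellular"
    up: bool          # whether the link currently has connectivity
    signal_dbm: int   # higher (less negative) means a stronger signal


def choose_link(links: List[Link]) -> Optional[Link]:
    """Prefer Wi-Fi when it is up; otherwise use the strongest live carrier."""
    usable = [link for link in links if link.up]
    wifi = [link for link in usable if link.kind == "wifi"]
    if wifi:
        return max(wifi, key=lambda link: link.signal_dbm)
    cellular = [link for link in usable if link.kind == "cellular"]
    if cellular:
        return max(cellular, key=lambda link: link.signal_dbm)
    return None  # no connectivity at all
```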
Equally important is end-to-end observability. We maintain visibility from the device through to the data center, monitoring for latency spikes or packet loss that might signal an issue. That allows our operations teams to reroute or rebalance before customers notice disruptions.
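As a rough illustration of what spotting a latency spike can mean in practice, the sketch below compares each new sample against the median of a recent window. The window size, the threshold factor and the class name are assumptions chosen for the example; real monitoring stacks use richer signals and tuned thresholds.

```python
from collections import deque
from statistics import median


class LatencyMonitor:
    """Flag samples that sit far above the recent median latency."""

    def __init__(self, window: int = 20, spike_factor: float = 3.0, min_samples: int = 5) -> None:
        self._samples = deque(maxlen=window)   # most recent latencies, in ms
        self._spike_factor = spike_factor
        self._min_samples = min_samples

    def observe(self, latency_ms: float) -> bool:
        """Record a sample; return True if it looks like a spike against the baseline."""
        is_spike = (
            len(self._samples) >= self._min_samples
            and latency_ms > self._spike_factor * median(self._samples)
        )
        self._samples.append(latency_ms)
        return is_spike
```

A flagged sample would then feed whatever rerouting or rebalancing the operations team has in place.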
It’s a reminder that resilience isn’t just a backend concern. For merchants, the edge is the experience. If the terminal works, commerce continues. If it doesn’t, reliability elsewhere is irrelevant.
Reliability as a competitive advantage
The best resilience strategies are invisible when they work. Customers don’t see multi-region replication or active-active routing. They just see payments going through, first time, every time.
Behind that simplicity is a cultural choice. Building for reliability means investing in redundancy that, if all goes well, will rarely need to be used. It means testing failure scenarios in production and empowering engineers to prioritize stability over novelty.
Ultimately, reliability is a matter of trust. When businesses choose a payments provider, they’re not just buying technology – they’re buying assurance that their revenue flow won’t stop. Outages will happen. The question is whether payments pause or proceed.
Resilience isn’t a final layer added to an existing stack. It’s the foundation everything else stands on. Build for failure, remove single points of weakness, extend resilience to the edge – and your systems will stay standing when others fall.
Because in payments, reliability isn’t just technical excellence. It’s business continuity.
This article was produced as part of TechRadarPro’s Expert Insights channel, where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

