At Spreedly, we’re laser-focused on helping our customers build flexible payment systems that can easily adapt to ever-changing business needs. Lately, we’ve been thinking more about revenue optimization, a goal we’ve seen become increasingly common across our customer base. Broadly speaking, this often means maintaining relationships with multiple payment service providers (PSPs) and routing each transaction to the payment gateway where it has the highest chance of succeeding.
Working with multiple PSPs and routing your transactions is a great way to begin optimizing your revenue, but it only addresses one half of building a payments system: working with the larger payments ecosystem and its backing financial institutions. There’s still the possibility of revenue loss when your own system isn’t robust enough to handle the many different failure scenarios it’s likely to encounter. Here are a few things to consider on the technical front as you continue building out your payments system.
If you write web-based software, then working in the context of a distributed system is nothing new. In fact, it’s so common that it’s easy to overlook the complexity of HTTP requests to external services and all of the things that can go wrong: network instability, your gateway being down, or your gateway timing out on transaction requests, just to name a few.
Unfortunately, what sets payments APIs apart from most external APIs is that when things go wrong, the consequences can be disastrous, ranging from accidentally maxing out a user’s credit card to losing thousands of dollars in revenue in minutes. Given these risks, it pays (quite literally) to design a system that is resilient to problems outside your control and gives each request the best chance of going through.
At Spreedly, we’ve fully embraced a distributed system, building a series of microservices around Kafka. You can read all about it on our engineering blog. We’ve found a couple of concepts to be particularly important when designing resilient systems, and it turns out many of those same principles carry over to designing payment systems.
A common problem when making HTTP requests is never receiving a response. In many cases you could simply retry the request to see if it succeeds on a subsequent attempt, but when charging a card it’s not that simple. A timeout doesn’t tell you whether the transaction failed: your gateway may have received the request and processed the transaction successfully, with only the response lost en route back to you. In that case, every retry could charge your customer again.
In cases like these, it’s crucial to use idempotency to ensure you’re not maxing out a customer’s credit card with retries. If you’re unfamiliar with idempotency, it means that making multiple identical requests has the same effect as making a single request. In the HTTP world, GET and PUT requests are defined as idempotent: you can repeat them as many times as you want and the server ends up in the same state as if you had made the request once. POST requests, which are typically what you use to create a charge, carry no such guarantee.
Unfortunately, this isn’t something you can implement on your own; you’ll need to lean on your payment gateway to provide it. In most cases, it’s an additional field or HTTP header that carries a unique key you send along with your transaction. As long as you send the same value as the idempotency key, you can make that purchase request as many times as you want without incurring multiple charges on your user’s credit card. Your gateway knows the actual state of a transaction regardless of whether or not you’ve heard back, and it can determine whether or not to continue with a charge.
Stripe and Adyen are two of the payment gateways that provide this feature.
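To make the retry behavior concrete, here is a minimal Python sketch. It does not use any real gateway’s API: FakeGateway, LossyConnection, and purchase_with_retries are hypothetical names, and the in-memory gateway only imitates what a gateway that honors idempotency keys is assumed to do. The point it illustrates is that the key is generated once per purchase and reused on every retry.

```python
import uuid


class FakeGateway:
    """Hypothetical in-memory stand-in for a gateway that honors idempotency keys."""

    def __init__(self):
        self._responses = {}  # idempotency key -> response already produced
        self.charges = []     # every charge actually applied to a card

    def purchase(self, idempotency_key, card_token, amount_cents):
        # A key the gateway has seen before returns the stored response
        # instead of charging the card a second time.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        charge = {
            "id": len(self.charges) + 1,
            "card": card_token,
            "amount_cents": amount_cents,
            "status": "succeeded",
        }
        self.charges.append(charge)
        self._responses[idempotency_key] = charge
        return charge


class LossyConnection:
    """Simulates responses lost en route: the gateway processes the request,
    but the caller sees a timeout for the first `responses_to_drop` calls."""

    def __init__(self, gateway, responses_to_drop):
        self.gateway = gateway
        self.responses_to_drop = responses_to_drop

    def purchase(self, idempotency_key, card_token, amount_cents):
        response = self.gateway.purchase(idempotency_key, card_token, amount_cents)
        if self.responses_to_drop > 0:
            self.responses_to_drop -= 1
            raise TimeoutError("no response from gateway")
        return response


def purchase_with_retries(connection, card_token, amount_cents, max_attempts=3):
    # Generate the key once per purchase and reuse it on every retry.
    idempotency_key = str(uuid.uuid4())
    for _ in range(max_attempts):
        try:
            return connection.purchase(idempotency_key, card_token, amount_cents)
        except TimeoutError:
            continue  # safe to retry: the same key cannot charge twice
    raise TimeoutError(f"no response after {max_attempts} attempts")
```

With a connection that drops the first two responses, purchase_with_retries still returns a successful result on the third attempt, and the gateway records exactly one charge. Generating a fresh key inside the retry loop would defeat the purpose, since the gateway would treat each attempt as a new purchase.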
A key aspect of a distributed system is that much of the work is done asynchronously. Quite a bit of work can happen as a side effect of processing an HTTP request without needing to happen within the request/response cycle. For example, as part of processing a new customer order you may want to send a confirmation email. That work can be placed in a queue and processed asynchronously, since it only needs to happen in the event of a new order. What would it look like if we applied this same style of flow to processing payments?
You might think that charging a card and knowing the result of the transaction is essential before granting access to your product, but it’s worth reevaluating this perspective. It’ll depend a bit on your product: if it’s access to a download, then you probably want the transaction to go through before allowing the download. For subscription-based digital goods, you could grant access prior to confirming a successful payment, placing the transaction in a queue to be processed sooner rather than later. Many transactions will likely succeed, meaning your end users will have a snappier, more pleasant experience buying your product. For the transactions that fail due to issues with a customer's account (for example, insufficient funds), you can revoke access to your product and require those users to enter a different payment method.
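As a rough illustration of that flow, the sketch below shows an order handler that grants provisional access and defers the charge. Everything in it is an assumption made for the example: the handler name, the in-process queue, the access dictionary, and the job fields are hypothetical, and a production system would more likely use a durable queue and a database.

```python
import queue
import uuid

# Hypothetical in-process stand-ins for a durable job queue and an access store.
payment_queue = queue.Queue()
access = {}  # user_id -> "provisional", "active", or "revoked"


def handle_subscription_order(user_id, payment_method_token, amount_cents):
    """Grant access right away and defer the charge to a background worker."""
    access[user_id] = "provisional"
    payment_queue.put({
        "user_id": user_id,
        "payment_method_token": payment_method_token,
        "amount_cents": amount_cents,
        # Created once here so that every later attempt reuses the same key.
        "idempotency_key": str(uuid.uuid4()),
    })
    # The response does not wait on the payment gateway at all.
    return {"status": "accepted", "access": access[user_id]}
```

A background worker would later take jobs off payment_queue, attempt the charge, and move the user to "active" or "revoked" depending on the outcome.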
Ultimately, the benefit of placing those transactions in a queue is that you can safely define retry semantics to address transient issues such as network instability or payment gateway unavailability. You shouldn’t expect your payment gateway to have 100% availability, and when it is unavailable, it’s perfectly acceptable to retry later to see if things have recovered. Because your transaction processing now happens outside of the standard request/response cycle, customers whose transactions would have succeeded anyway can use your product in the meantime, which provides a much better first experience.
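One way to picture those retry semantics is a worker that separates transient failures from definitive ones. The Python sketch below is an assumption-laden illustration rather than a recommended implementation: the exception classes, PaymentJob, and drain_payment_queue are hypothetical names, the charge callable stands in for a call to your gateway, and sending jobs that exhaust their attempts to manual review is a policy chosen only for the example.

```python
from collections import deque
from dataclasses import dataclass, field
import uuid


class GatewayUnavailable(Exception):
    """Transient failure (timeout, outage): safe to try again later."""


class CardDeclined(Exception):
    """Definitive failure tied to the customer's account: do not retry."""


@dataclass
class PaymentJob:
    user_id: str
    amount_cents: int
    # One key per job, so repeated attempts can never double charge.
    idempotency_key: str = field(default_factory=lambda: str(uuid.uuid4()))
    attempts: int = 0


def drain_payment_queue(jobs, charge, max_attempts=5):
    """Process queued jobs, re-queueing only those that failed transiently.

    `charge` is a caller-supplied function that sends the job to the gateway
    and raises GatewayUnavailable or CardDeclined on failure.
    """
    outcome = {"paid": [], "revoke_access": [], "needs_review": []}
    while jobs:
        job = jobs.popleft()
        job.attempts += 1
        try:
            charge(job)
        except GatewayUnavailable:
            if job.attempts < max_attempts:
                # A real worker would wait (with backoff) before the next try.
                jobs.append(job)
            else:
                outcome["needs_review"].append(job.user_id)
        except CardDeclined:
            outcome["revoke_access"].append(job.user_id)
        else:
            outcome["paid"].append(job.user_id)
    return outcome
```

The design choice worth noting is that only GatewayUnavailable puts a job back on the queue. A decline is a reliable answer, so retrying it would just repeat the same failure.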
Idempotency is great on its own, but when combined with a robust system of asynchronous retries, it lets you safely retry transactions until you receive a response that you can reliably act on. Through it all, you’ll be optimizing revenue by not being susceptible to transient issues, while also providing a seamless payments experience to your customers and making it as easy as possible for them to pay you for your product.