Designing a payment webhook

2019-08-19 21:15

1. Clarifying Requirements

  1. Webhook to callback the merchant once the payment succeeds.
  2. Analytics & metrics.
  3. High availability & Failure-resilience.
    1. Async design. We assume merchant servers are located across the world and may have very high latency, e.g. 15s.
    2. At-least-once delivery.
    3. Robust & predictable retry.
  4. Security: informing merchants whether a payment succeeded involves real money and real transactions, so security is always a concern.

2. Sketch out the high-level design

async design + retry + queuing + time-series DB + security

[Diagram. Components: Merchants over Internet, webhook gateway, Event Queue, payment state machine, user settings, Time-series DB, Dashboard. Edge labels: subscribe events; publish events; get webhook URI, secret, and settings.]

3. Features and Components

Webhook Gateway

  1. Subscribe to the event queue for payment success events published by a payment state machine or other services.
  2. Once it accepts an event, fetch the webhook URI, secret, and settings from the user settings service. Prepare the request based on those settings.
  3. Make an HTTP POST request to the external merchant’s endpoints with event payload and security headers.
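The three steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a definitive implementation: the `WebhookSettings` shape, the `timeout_seconds` setting, the hex encoding of the signature, and the function names are all assumptions made for the example.

```python
import hashlib
import hmac
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class WebhookSettings:
    """Hypothetical shape of what the user settings service returns."""
    uri: str
    secret: bytes
    timeout_seconds: float = 30.0  # assumed setting; merchants may be slow


def build_webhook_request(event: dict, settings: WebhookSettings) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for one event."""
    if not settings.uri.lower().startswith("https://"):
        raise ValueError("webhook URI must use HTTPS")
    # Serialize once and sign exactly the bytes that go on the wire.
    body = json.dumps(event, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(settings.secret, body, hashlib.sha256).hexdigest()
    return urllib.request.Request(
        settings.uri,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-webhook-signature": signature,
        },
    )


def deliver(request: urllib.request.Request, timeout_seconds: float) -> bool:
    """Send the request; return True only if the merchant answered 200."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return False
```

A `False` result from `deliver` would leave the event to be retried later, which keeps the gateway asynchronous with respect to slow merchants.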

API Definition

// POST https://example.com/webhook/
{
    "id": 1,
    "scheduled_for": "2017-01-31T20:50:02Z",
    "event": {
        "id": "24934862-d980-46cb-9402-43c81b0cdba6",
        "resource": "event",
        "type": "charge:created",
        "api_version": "2018-03-22",
        "created_at": "2017-01-31T20:49:02Z",
        "data": {
          "code": "66BEOV2A", // or the order ID the user needs to fulfill
          "name": "The Sovereign Individual",
          "description": "Mastering the Transition to the Information Age",
          "hosted_url": "https://commerce.coinbase.com/charges/66BEOV2A",
          "created_at": "2017-01-31T20:49:02Z",
          "expires_at": "2017-01-31T21:49:02Z",
          "metadata": {},
          "pricing_type": "CNY",
          "payments": [
            // ...
          ],
          "addresses": {
            // ...
          }
        }
    }
}

The merchant server should respond with a 200 HTTP status code to acknowledge receipt of a webhook.

If there is no acknowledgment of receipt, we will retry with exponential backoff for up to three days. The maximum retry interval is 1 hour.
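To make the retry policy concrete, here is a small sketch of a capped exponential backoff schedule. The three-day window and the one-hour cap come from the text above; the one-minute initial delay and the doubling factor are assumptions chosen only for illustration.

```python
from datetime import timedelta
from typing import Iterator


def retry_delays(
    base: timedelta = timedelta(minutes=1),       # assumed initial delay
    factor: float = 2.0,                          # assumed growth factor
    max_interval: timedelta = timedelta(hours=1),
    max_total: timedelta = timedelta(days=3),
) -> Iterator[timedelta]:
    """Yield the wait before each retry until the total window is used up."""
    elapsed = timedelta(0)
    delay = base
    while elapsed + delay <= max_total:
        yield delay
        elapsed += delay
        # Grow exponentially, but never wait longer than the cap.
        delay = min(delay * factor, max_interval)


if __name__ == "__main__":
    delays = list(retry_delays())
    print(len(delays), "retries; first:", delays[0], "last:", delays[-1])
```

Because the schedule is deterministic, a merchant can predict when the next attempt will arrive; adding random jitter would trade some of that predictability for smoother load.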

Security

  • All webhook URIs from user settings must use HTTPS.
  • All callback requests carry an x-webhook-signature header containing a SHA256 HMAC signature. Its value is HMAC(webhook secret, raw request payload). We generate the secret for the developer to use.

Background Knowledge: HMAC (hash-based message authentication code). A message authentication code is a short piece of information used to authenticate a message, that is, to confirm that the message came from the stated sender (its authenticity) and has not been changed in transit (its integrity). The receiver verifies both by recomputing the digest of the received message with the secret shared between the trusted parties and comparing it with the digest that was sent.
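On the merchant side, the check could look roughly like the sketch below. It assumes the signature is sent hex-encoded, which the text above does not specify; the function names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, raw_body: bytes) -> str:
    """HMAC-SHA256 over the raw request payload, hex-encoded (assumed encoding)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify(secret: bytes, raw_body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign(secret, raw_body).encode("ascii")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received_signature.encode("utf-8"))
```

The merchant should verify against the raw bytes of the request body, before any JSON parsing, since re-serializing the payload may change whitespace or key order and break the digest.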

Metrics

The webhook gateway service emits statuses into the time-series DB for metrics.

InfluxDB vs. Prometheus?

  • InfluxDB: Application pushes data to InfluxDB. It has a monolithic DB for metrics and indices.
  • Prometheus: The Prometheus server periodically pulls metric values from the running application. Its 1.x storage engine uses LevelDB for indices, but each metric is stored in its own file.

I will probably choose InfluxDB for easier maintenance of the monolithic data store.
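With the push model, the gateway could format each delivery attempt as an InfluxDB line-protocol record before writing it. The sketch below only builds the record string; the measurement name `webhook_delivery`, the tag and field names, and the function name are assumptions for illustration.

```python
def _escape_tag(value: str) -> str:
    """Escape the characters that are special inside tag keys and values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def delivery_point(merchant_id: str, status: str, latency_ms: int, timestamp_ns: int) -> str:
    """Format one delivery attempt as: measurement,tags fields timestamp."""
    tags = f"merchant={_escape_tag(merchant_id)},status={_escape_tag(status)}"
    # The trailing "i" marks the field values as integers in line protocol.
    fields = f"count=1i,latency_ms={latency_ms}i"
    return f"webhook_delivery,{tags} {fields} {timestamp_ns}"
```

Keeping `status` as a tag makes it cheap to group and count successes versus failures on the dashboard.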

Depending on how much further data aggregation we need, we can build a more advanced data pipeline. However, for just counting successes and failures, a simple time-series DB solves the problem.

© 2010-2018 Tian