Usage-based pricing with Timeplus and Paigo

How Streaming SQL powers our pay-as-you-go cloud offering

A billing system is critical for running a business, but getting it right takes a lot of work, especially if your company operates on a usage-based pricing model. Payment vendors like Stripe or Paddle usually support only a single usage dimension or limited aggregation options.

As a startup, we want to focus 100% on what we are good at: building a cutting-edge streaming platform. That’s why we chose a specialized billing platform, Paigo, to handle the metering and billing infrastructure.

Paigo is a feature-rich billing automation platform for usage-based pricing, meaning it generates bills from usage data. This makes it very convenient for setting up "pay as you go" plans, which is exactly what we want at Timeplus. So the question becomes: how do we send the usage data to Paigo? According to Paigo's documentation, it supports several ways of collecting usage data, including infrastructure-based, agent-based, and API-based methods.

The infrastructure-based and agent-based methods come in handy when you do not want to write much code. Paigo has already done the hard part of collecting data for these two methods: users just follow the manual to set them up, and data starts flowing within minutes.

However, these two methods require some administrative work, such as setting up AWS IAM roles, tagging resources, or running another agent. Also, customizing them or gaining more control involves a fair amount of configuration work.

At Timeplus, we wanted more control over what data are sent to the billing platform and how. We also already had a data collection system running in our infrastructure, and we didn't want to run another one just for billing data. This is where the API-based method helps.

Paigo provides a usage record ingestion HTTP API that allows users to send usage data to Paigo directly. This gives us exactly the flexibility we want. However, building a reliable solution to ingest billing metrics is easier said than done.

A reliable solution must have a strong data delivery guarantee, which means:

The application must properly handle errors, retries, checkpoints, back pressure, etc.
Infrastructure support is required to make sure the application will keep running and automatically recover after interruption.
The application must be monitored so that problems are detected as soon as they occur.
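To make these requirements concrete, here is a minimal Python sketch of just two of them, retries with exponential backoff and checkpointing, around a generic "send one record" function. It is a hypothetical illustration, not how Timeplus sinks are implemented: `deliver`, `TransientError`, and the in-memory `checkpoint` dict are made-up names, and back pressure, process supervision, and monitoring are left out.

```python
import time
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. HTTP 5xx, timeouts)."""


def deliver(records: List[dict],
            send: Callable[[dict], None],
            checkpoint: dict,
            max_attempts: int = 5,
            base_delay: float = 0.5,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Send records in order, resuming after the last checkpointed offset.

    Returns the offset of the next record to send. The checkpoint only
    advances after a record is acknowledged, so a crash can cause a resend
    but never a silent skip.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    offset = checkpoint.get("offset", 0)
    while offset < len(records):
        for attempt in range(1, max_attempts + 1):
            try:
                send(records[offset])
                break
            except TransientError:
                if attempt == max_attempts:
                    raise  # give up; the checkpoint still points at this record
                # Exponential backoff: 0.5 s, 1 s, 2 s, ...
                sleep(base_delay * 2 ** (attempt - 1))
        offset += 1
        checkpoint["offset"] = offset  # a real system would persist this durably
    return offset
```

Because the checkpoint moves only after an acknowledged send, this sketch gives at-least-once delivery: a record may be sent twice after a crash, so the receiving side has to tolerate duplicates.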

Each of these items takes a lot of effort to build, and getting them all done together is not a simple job. For a startup especially, pulling resources away from the core product for such a time-consuming task is undesirable.

Luckily, this is not a problem for Timeplus, because building such a streaming pipeline is exactly what Timeplus is built for.

Timeplus is a real-time streaming platform that not only lets people run real-time analysis on streaming data, but also supports sources and sinks to read data from and write data to other systems. More importantly, Timeplus provides a strong data delivery guarantee on sinks. All of this makes Timeplus, and Timeplus Cloud itself, a perfect platform for collecting billing metrics and sending them to Paigo.

For the data collection part, thanks to the data ingestion HTTP API, it is easy to use a data collection agent like Benthos, Fluentd, or Vector to send data to Timeplus. In fact, since the early days we have used Vector to collect all the logs and metrics from our Kubernetes clusters and send them to Timeplus, so we already have all the data we need. Gang, our CTO, wrote a blog post on how we do observability at Timeplus, including this data collection part; it is a good read if you are interested in the topic.

For sending data, this is where sinks help. Timeplus supports quite a few sink types, and the webhook sink is the one used to send data to HTTP endpoints. So all we need to do is set up a few webhook sinks, one for each metric we want to send. We want each metric to be isolated from the others, so that a problem with one does not affect the rest. The nice part is that, because Timeplus is SQL-based, creating a sink is mostly just writing a SQL query (plus some configuration for the HTTP endpoint).

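Here is the high-level overview: Vector collects metrics from our Kubernetes clusters into Timeplus, and a set of HTTP sinks, one per billing metric, continuously query those metrics and send the results to Paigo.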

Let’s talk about the details, starting with Infrastructure-as-Code.

The resources in this setup are defined in the Terraform language. Because we have developed the Terraform provider for Timeplus, you can use the code in this blog to create the same resources in your own Timeplus workspace, even without knowing how to use the Timeplus HTTP API. The syntax is also human-friendly, so the definitions are easy to read even if you have zero experience with Terraform.

Now let's move on to an example: I will show how one of the usage data sinks is implemented.

One of the metrics we want to use for billing is CPU usage.

Before demonstrating the sink, I will first show you the streams it uses.

We use three streams. First, an append-only stream stores the metrics collected from the Kubernetes cluster:

resource "timeplus_stream" "k8s_metrics" {
  name = "k8s_metrics"

  column {
    name = "name"
    type = "string"
  }

  column {
    name = "type"
    type = "string"
  }

  column {
    name = "value"
    type = "float64"
  }

  column {
    name = "tags"
    type = "map(string, string)"
  }
}

An example of a metric record looks like this:

name	type	value	tags	_tp_time
container_cpu_usage_seconds_total	counter	4988.516403318	{'cpu': 'total', 'namespace': 'tp-tenant-abcdefgh', 'container': 'proton'}	2023-09-11 15:52:51.980
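To make the record concrete, here is a small Python sketch (an illustration, not Timeplus code) that represents the sample row as a dictionary and applies the same filter and tenant extraction as the CPU sink query later in this post: keep only `container_cpu_usage_seconds_total` samples from the `proton` container, and strip the `tp-tenant-` prefix from the namespace to get the workspace ID. The function name `billable_tenant` is hypothetical.

```python
from typing import Optional

TENANT_PREFIX = "tp-tenant-"

# The sample metric record above, as a plain dictionary (_tp_time omitted).
SAMPLE_RECORD = {
    "name": "container_cpu_usage_seconds_total",
    "type": "counter",
    "value": 4988.516403318,
    "tags": {"cpu": "total", "namespace": "tp-tenant-abcdefgh", "container": "proton"},
}


def billable_tenant(record: dict) -> Optional[str]:
    """Return the workspace ID for a Proton CPU counter sample, else None.

    Mirrors the sink's WHERE clause and its
    replace_one(tags['namespace'], 'tp-tenant-', '') expression.
    """
    tags = record.get("tags", {})
    if record.get("name") != "container_cpu_usage_seconds_total":
        return None
    if tags.get("container") != "proton":
        return None
    # replace_one replaces only the first occurrence, like str.replace(old, new, 1)
    return tags.get("namespace", "").replace(TENANT_PREFIX, "", 1)


if __name__ == "__main__":
    print(billable_tenant(SAMPLE_RECORD))
```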

And we have two versioned streams:

resource "timeplus_stream" "paigo_dimensions" {
  name = "paigo_dimensions"

  description = "A versioned stream for storing the Paigo dimension IDs. This is used as a lookup table to find out which Paigo dimension ID should be used for a specific workspace and metric pair."

  mode = "versioned_kv"

  column {
    name        = "offering_name"
    type        = "string"
    primary_key = true
  }

  column {
    name        = "dimension_internal_name"
    type        = "string"
    primary_key = true
  }

  column {
    name = "dimension_id"
    type = "string"
  }
}

resource "timeplus_stream" "workspace_plans" {
  name = "workspace_plans"

  description = "A versioned stream for storing workspace subscription plan information. This is used as a lookup table to find out the Paigo customer ID and plan name for a workspace."

  mode = "versioned_kv"

  column {
    name        = "workspace_id"
    type        = "string"
    primary_key = true
  }

  column {
    name = "plan_name"
    type = "string"
  }

  column {
    name = "paigo_customer_id"
    type = "string"
  }
}

Since versioned streams always provide an up-to-date view of the data, they are a perfect choice for lookup tables. In our case, whenever a new workspace is created, a new record is automatically inserted into the workspace_plans stream, and whenever a workspace changes its subscription plan, an update record is automatically sent to the stream as well. As a result, `SELECT * FROM workspace_plans` keeps returning the latest information. (If you are not yet familiar with Timeplus queries: this is a streaming query, meaning it keeps running until it is explicitly stopped; see the Timeplus documentation for details.) The same goes for the paigo_dimensions stream: we can easily update it whenever we change the dimension configurations on the Paigo side. Here are examples of what the data look like in these streams:
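A versioned stream used as a lookup table behaves roughly like a map keyed on its primary key, where the latest write wins. The Python sketch below models only that behavior for an in-order list of changes; `latest_view` is a hypothetical helper, the plan upgrade is a made-up scenario, and the sketch ignores how Timeplus actually stores versions, orders updates, or handles deletes.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, str]


def latest_view(changes: Iterable[Row], key_columns: Tuple[str, ...]) -> Dict[Tuple[str, ...], Row]:
    """Fold an ordered sequence of inserts/updates into the latest row per primary key."""
    view: Dict[Tuple[str, ...], Row] = {}
    for row in changes:
        key = tuple(row[column] for column in key_columns)
        view[key] = row  # a later write for the same key replaces the earlier one
    return view


if __name__ == "__main__":
    changes = [  # customer IDs shortened for readability
        {"workspace_id": "abcdefgh", "plan_name": "Free Trial", "paigo_customer_id": "cccc"},
        {"workspace_id": "12345678", "plan_name": "Enterprise", "paigo_customer_id": "dddd"},
        # Hypothetical: workspace abcdefgh upgrades its plan, so an update record arrives.
        {"workspace_id": "abcdefgh", "plan_name": "Professional", "paigo_customer_id": "cccc"},
    ]
    for key, row in latest_view(changes, ("workspace_id",)).items():
        print(key, row["plan_name"])
```

For paigo_dimensions the key is the composite `(offering_name, dimension_internal_name)`, which the same helper handles by passing both column names.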

Stream: paigo_dimensions

offering_name	dimension_internal_name	dimension_id
Free Trial	cpu_usage	aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
Professional	cpu_usage	bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb

Stream: workspace_plans

workspace_id	plan_name	paigo_customer_id
abcdefgh	Free Trial	cccccccc-cccc-cccc-cccc-cccccccccccc
12345678	Enterprise	dddddddd-dddd-dddd-dddd-dddddddddddd

Now we are ready to take a look at the sink:

resource "timeplus_sink" "paigo" {
  name        = "Paigo/cpu_usage"
  description = "Send cpu usage metric to Paigo."
  type        = "http"
  query       = <<-SQL
WITH results AS
  (
    SELECT
      format_datetime(window_end, '%FT%H:%M:%SZ', 'UTC') AS timestamp,
      to_string((max(value) - earliest(value)) / 300) AS recordValue,
      replace_one(tags['namespace'], 'tp-tenant-', '') AS tenant
    FROM
      tumble(k8s_metrics, 5m)
    WHERE
      (name = 'container_cpu_usage_seconds_total') AND ((tags['container']) = 'proton')
    GROUP BY
      window_end, tags['namespace']
  ), dimensions AS
  (
    SELECT
      offering_name, dimension_id, dimension_internal_name
    FROM
      default.paigo_dimensions
    WHERE
      dimension_internal_name = 'cpu_usage'
  )
SELECT
  results.timestamp AS timestamp,
  results.recordValue AS recordValue,
  dimensions.dimension_id AS dimensionId,
  workspace_plans.paigo_customer_id AS customerId,
  dimensions.dimension_internal_name AS dimension_name
FROM
  results
INNER JOIN default.workspace_plans ON results.tenant = workspace_plans.workspace_id
INNER JOIN dimensions ON workspace_plans.plan_name = dimensions.offering_name
  SQL

  properties = jsonencode({
    content_type = "application/json"
    http_method  = "POST"
    oauth2 = {
      client_key    = var.paigo_client_id
      client_secret = var.paigo_client_secret
      enabled       = true
      scopes        = []
      token_url     = "https://auth.paigo.tech/oauth/token?audience=${var.paigo_audience}&grant_type=client_credentials"
    }
    payload_field  = "{{ . | toJSON }}"
    url            = "https://api.prod.paigo.tech/usage"
  })
}
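The recordValue expression is worth unpacking. container_cpu_usage_seconds_total is a cumulative counter of CPU seconds, so `max(value) - earliest(value)` within a window is approximately the CPU time consumed during that window, and dividing by 300 (the window length in seconds) turns it into the average number of CPU cores used. Below is a Python sketch of the same tumbling-window arithmetic on made-up samples. The epoch-aligned windows, the helper names, and the assumption that samples arrive in time order are all illustrative choices, and Python's `str()` may not format numbers exactly as Timeplus `to_string` does.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_end(ts: datetime) -> datetime:
    """End of the 5-minute tumbling window containing ts (windows aligned to the epoch)."""
    start = EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW
    return start + WINDOW


def cpu_usage(samples: List[Tuple[datetime, str, float]]) -> Dict[Tuple[str, str], str]:
    """samples: (event time, tenant, cumulative CPU seconds), assumed in time order.

    Returns {(window end as '%Y-%m-%dT%H:%M:%SZ', tenant): recordValue}.
    """
    groups: Dict[Tuple[datetime, str], List[float]] = defaultdict(list)
    for ts, tenant, value in samples:
        groups[(window_end(ts), tenant)].append(value)

    results = {}
    for (end, tenant), values in groups.items():
        earliest = values[0]  # first sample in the window, like earliest(value)
        usage = (max(values) - earliest) / WINDOW.total_seconds()  # average cores
        results[(end.strftime("%Y-%m-%dT%H:%M:%SZ"), tenant)] = str(usage)
    return results


if __name__ == "__main__":
    utc = timezone.utc
    samples = [
        (datetime(2023, 9, 11, 15, 50, 10, tzinfo=utc), "abcdefgh", 4900.0),
        (datetime(2023, 9, 11, 15, 52, 51, tzinfo=utc), "abcdefgh", 4960.0),
        (datetime(2023, 9, 11, 15, 54, 59, tzinfo=utc), "abcdefgh", 5050.0),
        (datetime(2023, 9, 11, 15, 55, 5, tzinfo=utc), "abcdefgh", 5052.0),
    ]
    for key, value in sorted(cpu_usage(samples).items()):
        print(key, value)
```

In this example, 150 CPU seconds over a 300-second window gives 0.5 cores. Because each window starts from its own earliest sample, the 2 CPU seconds between 15:54:59 and 15:55:05 are not attributed to either window; with frequent scrapes such gaps stay small.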

Yes, that is a lot of lines, but don't worry. Let's go through the key settings.

The properties field tells the sink how to send data to the external system, while the query field tells it which data to send. With a traditional database, you would have to implement something like running a query periodically to keep fetching the latest data. The query used in the sink, by contrast, is a streaming query, so data flows into the destination continuously. All we need to do is write the SQL.
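For contrast, here is a hypothetical Python sketch of the polling approach that the streaming query makes unnecessary. It uses the standard-library `sqlite3` module and a made-up `k8s_metrics` table with an auto-increment `id` column purely so that it runs; the point is the bookkeeping a poller needs: a high-water mark that must be tracked (and, in practice, persisted) between rounds, plus a loop that re-runs the query on a timer.

```python
import sqlite3
from typing import List, Tuple


def poll_new_rows(conn: sqlite3.Connection, last_id: int) -> Tuple[List[tuple], int]:
    """One polling round: fetch rows added since the high-water mark and advance it."""
    rows = conn.execute(
        "SELECT id, name, value FROM k8s_metrics WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    # Only move the mark forward if something new arrived.
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE k8s_metrics (id INTEGER PRIMARY KEY, name TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO k8s_metrics (name, value) VALUES (?, ?)",
        [("container_cpu_usage_seconds_total", 4900.0),
         ("container_cpu_usage_seconds_total", 4960.0)],
    )
    rows, mark = poll_new_rows(conn, 0)
    print(len(rows), mark)
    # A real poller would now persist `mark`, sleep, and repeat forever.
```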

Put simply, it works like this:

Metrics keep being ingested into the append-only stream k8s_metrics.
The query is automatically fed the latest metrics in real time.
The query then filters out the metrics it does not need and calculates the results over 5-minute tumbling windows.
Lastly, it joins the results with the two versioned streams to enrich them, so that the sink gets all the data it needs to send to Paigo.
Since the query has already prepared the data, the sink just needs to encode each row as JSON (the payload_field = "{{ . | toJSON }}" part) and send it out. Done!
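The last two steps can be modeled in a few lines of Python: look up each window result's workspace in workspace_plans, look up that plan's CPU dimension in paigo_dimensions, drop rows without a match (as an INNER JOIN does), and JSON-encode the row using the column names the sink query selects. This is a hypothetical sketch built on the sample rows of the two versioned streams; `enrich_and_encode` is a made-up name, and the exact JSON produced by `{{ . | toJSON }}` (key order, number formatting) may differ.

```python
import json
from typing import Dict, List

# Sample lookup data, mirroring the example rows of the two versioned streams.
WORKSPACE_PLANS = {
    "abcdefgh": {"plan_name": "Free Trial", "paigo_customer_id": "cccccccc-cccc-cccc-cccc-cccccccccccc"},
    "12345678": {"plan_name": "Enterprise", "paigo_customer_id": "dddddddd-dddd-dddd-dddd-dddddddddddd"},
}
# offering_name -> dimension_id, already filtered to dimension_internal_name = 'cpu_usage'
CPU_DIMENSIONS = {
    "Free Trial": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
    "Professional": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
}
# Hypothetical output of the windowed aggregation (the `results` CTE).
WINDOW_RESULTS = [
    {"timestamp": "2023-09-11T15:55:00Z", "recordValue": "0.5", "tenant": "abcdefgh"},
    {"timestamp": "2023-09-11T15:55:00Z", "recordValue": "1.25", "tenant": "12345678"},
]


def enrich_and_encode(results: List[dict],
                      workspace_plans: Dict[str, dict],
                      cpu_dimensions: Dict[str, str]) -> List[str]:
    """Inner-join window results with both lookup tables and JSON-encode each row."""
    payloads = []
    for row in results:
        plan = workspace_plans.get(row["tenant"])  # ON tenant = workspace_id
        if plan is None:
            continue  # INNER JOIN: no matching plan, no output row
        dimension_id = cpu_dimensions.get(plan["plan_name"])  # ON plan_name = offering_name
        if dimension_id is None:
            continue  # INNER JOIN: no matching dimension, no output row
        payloads.append(json.dumps({
            "timestamp": row["timestamp"],
            "recordValue": row["recordValue"],
            "dimensionId": dimension_id,
            "customerId": plan["paigo_customer_id"],
            "dimension_name": "cpu_usage",  # fixed by the dimensions CTE's filter
        }))
    return payloads


if __name__ == "__main__":
    for payload in enrich_and_encode(WINDOW_RESULTS, WORKSPACE_PLANS, CPU_DIMENSIONS):
        print(payload)
```

With these sample rows, workspace 12345678 produces no payload, because paigo_dimensions has no row for the Enterprise plan; that is exactly how the INNER JOIN in the sink query behaves.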

Let’s review the high-level design again:

As you can imagine, to monitor these sinks we can simply build more sinks (Timeplus also supports alerts, which are like higher-level sinks with more features specific to alerting)! This is what we have done too. We have sinks that compare the number of records the billing sinks actually sent to Paigo with the expected number, and if they don't match, they send a message to our Slack channel, so that we are notified as soon as something goes wrong. I won't go into more detail here, as that would make this post far too long. Let us know if you are interested in the topic, and we can definitely write about it.
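The post does not describe how these monitoring sinks are built, so the following is only a hypothetical Python sketch of the comparison they perform: given the expected and actual number of usage records per window, produce an alert message for every window where the counts differ. The function name and message format are made up, and where the counts come from and how the message reaches Slack are left out.

```python
from typing import Dict, List


def delivery_alerts(expected: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Compare per-window record counts and describe every mismatch.

    expected/actual map a window identifier (e.g. window end time) to a count.
    A window missing from one side counts as zero records there.
    """
    alerts = []
    for window in sorted(set(expected) | set(actual)):
        want, got = expected.get(window, 0), actual.get(window, 0)
        if want != got:
            alerts.append(f"{window}: expected {want} usage records, sink sent {got}")
    return alerts


if __name__ == "__main__":
    expected = {"2023-09-11T15:55:00Z": 3, "2023-09-11T16:00:00Z": 3}
    actual = {"2023-09-11T15:55:00Z": 3, "2023-09-11T16:00:00Z": 2}
    for message in delivery_alerts(expected, actual):
        print(message)  # a real setup would post this to a chat channel instead
```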

All in all, we are happy with this “dogfooding” solution. We leveraged a few "superpowers" of Timeplus: the streaming ingest API, versioned streams, stream-to-stream JOINs, and HTTP sinks. The system has been up and running for a few months and has proven very reliable and flexible. With Timeplus, it’s simple to add, remove, or update billing usage data. The monitoring sinks give us confidence that we can fix any issue as early as possible, which is a huge plus for our customers.

Try Timeplus yourself at https://demo.timeplus.cloud or sign up for a free account at https://timeplus.com.
