
Proton 3.0: Up to 7x Performance Gains in Pipeline Processing

  • Writer: Gang Tao
  • 6 min read


"Timeplus is the missing piece between operational systems and telemetry systems." - a Proton user

Timeplus Proton is a single-binary stream processing engine for real-time analytics, streaming ETL, and AI/ML feature pipelines, written in pure C++. No JVM, no ZooKeeper, no dependencies.


In Timeplus Proton 3.0, our most significant performance release to date, we've achieved 7x improvement in changelog streaming, 4.8x faster aggregations, and 1.8x improvement in raw ingestion.


Let’s take a closer look at these numbers and how we got here. It's easy to optimize for synthetic workloads that look impressive on paper, but those workloads don't reflect how real systems actually behave: single-threaded microbenchmarks, artificial data distributions, and perfectly cached queries can all produce spectacular results.


These benchmarks are built on real-world data patterns from Proton user deployments. Our tests target specific streaming workloads we know matter in production, ensuring our performance improvements deliver actual value rather than just impressive numbers.


High-frequency/throughput ingestion


Data ingestion is the foundation of any streaming pipeline—if you can't get data into your system efficiently, nothing else matters. Without fast, reliable ingestion, even the most sophisticated downstream processing becomes irrelevant.

Crypto and financial market analysis represents one of Proton's most demanding use cases. For this benchmark, we ingested a synthetic stream of BTC, ETH, and NVDA quotes into an append-only stream—matching the volume and velocity characteristics of real-time market data feeds where milliseconds matter.

Here is the test SQL:

CREATE STREAM perf_btc_quotes
(
  `ts_bucket`     datetime64,
  `instrument_id` string,
  `venue_id`      string,
  `price`         float64,
  `size`          float64
)
ORDER BY (instrument_id, ts_bucket, venue_id);

INSERT INTO perf_btc_quotes (ts_bucket, instrument_id, venue_id, price, size)
SELECT
  to_datetime64('2024-01-01 00:00:00', 3) + to_int64(number % 86400)                         AS ts_bucket,
  ['BTC-USD', 'ETH-USD', 'NVDA'][(number % 3) + 1]                                           AS instrument_id,
  ['binance', 'coinbase', 'okx', 'bybit', 'kraken'][(number % 5) + 1]                        AS venue_id,
  (30000 + ((number % 1000) / 10.0)) + ((number % 5) * 5.0)                                  AS price,
  0.01 + ((number % 1000) / 10000.0)                                                         AS size
FROM numbers_mt(100000000);

Results:


  • Old (1.6.17-rc): 11.821 s, 100.01M rows, 800.08 MB (≈ 8.46M rows/s, 67.68 MB/s)

  • New (3.0.7): 6.502 s, 100.00M rows, 800.00 MB (≈ 15.38M rows/s, 123.03 MB/s)



Refer to the benchmark code here



Stateful CDC aggregations


Maintaining real-time views of database changes is fundamental to modern data architectures, but keeping aggregated state synchronized with upstream changes is computationally expensive.


Changelog streams represent one of the most challenging patterns in stream processing. Unlike append-only streams where you simply accumulate new data, changelog streams contain updates and deletes that require maintaining accurate stateful aggregations as records constantly change.


Our test benchmark continuously computes account balances from a transfer stream operating at 6 million events per second across 40 million possible accounts. Each transfer creates two balance updates—a debit and a credit—making this a high-cardinality, stateful aggregation that stress-tests how efficiently Proton can maintain synchronized state at scale.


The test pipeline is composed of the following queries:

1. Create the Changelog Stream (Data Store)

CREATE STREAM default.changelog_transfers (...)
SETTINGS mode = 'changelog';

Creates a special "changelog" stream to store transfer records. Changelog mode means it can handle inserts, updates, and deletes - perfect for tracking state changes.
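
The column list is abbreviated above. As a minimal sketch only, assuming the from_id, to_id, and value fields described in step 2 (the exact schema is in the linked benchmark code), the definition could look like:

-- Illustrative schema only; not the exact benchmark definition.
CREATE STREAM default.changelog_transfers
(
  `from_id` uint64,   -- sender account
  `to_id`   uint64,   -- receiver account
  `value`   float64   -- transfer amount
)
SETTINGS mode = 'changelog';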

2. Create the Random Data Generator

CREATE RANDOM STREAM rand_transfers (...)

Sets up a synthetic data generator that creates fake transfers between 40 million possible accounts. Each transfer has:

  • A sender account (from_id)

  • A receiver account (to_id)

  • A transfer amount (value, between 1 and 100,000)
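
A hedged sketch of what such a generator could look like (the column names follow the list above; the default expressions are illustrative stand-ins, not the exact benchmark code):

-- Illustrative generator; the rand() expressions are placeholders for the real distributions.
CREATE RANDOM STREAM rand_transfers
(
  `from_id` uint64  DEFAULT rand() % 40000000,       -- sender account
  `to_id`   uint64  DEFAULT rand() % 40000000,       -- receiver account
  `value`   float64 DEFAULT 1 + (rand() % 100000)    -- transfer amount, 1 to 100,000
);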

3. Connect Generator to Storage via Materialized View

CREATE MATERIALIZED VIEW mv INTO default.changelog_transfers AS
SELECT ... FROM rand_transfers
SETTINGS eps = 6000000;

This materialized view acts as a pipeline, pushing data from the random generator into the changelog stream at a target rate of 6 million events per second - this is the high-pressure workload that stresses the system.
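
Spelled out with the illustrative columns from the sketches above, the full view could look roughly like this:

CREATE MATERIALIZED VIEW mv INTO default.changelog_transfers AS
SELECT from_id, to_id, value
FROM rand_transfers
SETTINGS eps = 6000000;  -- target generation rate: 6M events/s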

4. Calculate Real-Time Balances

SELECT account_id, sum(total) AS balance FROM (...)
GROUP BY account_id
EMIT STREAM ON UPDATE;

This is the aggregation query being benchmarked. It:

  • Takes each transfer and affects two accounts (sender loses money, receiver gains money)

  • Uses array_join to split each transfer into two balance updates

  • Aggregates all transactions per account to calculate running balances

  • Uses EMIT STREAM ON UPDATE to continuously output updated balances in real-time
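
Putting these pieces together, a sketch of the benchmarked aggregation could look like the following. It assumes the illustrative columns from the sketches above and follows the array_join pattern and EMIT clause described in this list; the exact query is in the linked benchmark code.

-- Each transfer becomes two balance updates: a debit for the sender, a credit for the receiver.
SELECT
  account_id,
  sum(total) AS balance
FROM
(
  SELECT
    array_join([(from_id, -value), (to_id, value)]) AS change,  -- (account, signed amount) pairs
    change.1 AS account_id,
    change.2 AS total
  FROM default.changelog_transfers
)
GROUP BY account_id
EMIT STREAM ON UPDATE;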


Results (steady‑state throughput):

  • Old: ≈ 0.37M rows/s, 30.22 MB/s (2.60M rows in 7.093 s)

  • New: ≈ 2.87M rows/s, 236.87 MB/s (22.60M rows in 7.866 s) — ~7× higher row rate



Refer to the benchmark code here 


Custom computational transformations (JavaScript UDFs)


Proton is built on database technology with SQL as its primary interface. While SQL is powerful for declarative data transformations, it has inherent limitations when implementing complex, stateful business logic: the kind of processing that Flink developers would handle with custom Java operators.


To bridge this gap, we embedded a JavaScript engine that allows developers to write user-defined functions (UDFs) for sophisticated data processing. This isn't just about extensibility—the embedded JavaScript engine delivers genuine performance, handling complex transformations without becoming a bottleneck in your pipeline.


We have a JavaScript UDF that simulates real-world telemetry pipeline processing by performing CPU-intensive enrichment on nginx access logs. It takes raw log fields (HTTP method, path, status code, response size, user agent, and malicious request indicator) and returns enriched JSON with:


Classification logic:

  • Status categorization: Groups HTTP status codes into families (2xx, 3xx, 4xx, 5xx)

  • User agent classification: Identifies traffic sources (monitoring tools like Prometheus/Grafana/Datadog, bots, browsers, curl, etc.)

  • Path normalization: Replaces long hex/numeric segments with :id to group similar routes


Anomaly detection (CPU-intensive):

  • Computes a synthetic "anomaly score" using hash functions, trigonometry (sin/cos), and square roots over 64 iterations

  • Adjusts scoring based on error rates, response sizes, and malicious request patterns

  • Designed to stress the JavaScript V8 engine with realistic computational work


This UDF represents the kind of complex parsing, classification, and scoring logic commonly found in observability pipelines—tasks that are awkward or impossible to express in pure SQL but critical for enriching raw telemetry data before routing it to downstream systems like Splunk, Elastic, or S3.
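
The actual UDF ships with the linked benchmark code. As a hedged illustration of the mechanism only, a much simplified enrichment function could be registered in Proton roughly like this; the function name, signature, and scoring loop here are ours, not the benchmark's:

-- Illustrative only: a trimmed-down enrichment UDF, not the benchmark's exact logic.
CREATE OR REPLACE FUNCTION enrich_log(status uint16, body_bytes uint64)
RETURNS string
LANGUAGE JAVASCRIPT AS $$
  function enrich_log(statuses, sizes) {
    // Proton passes column batches as arrays and expects an array of results back.
    const out = [];
    for (let i = 0; i < statuses.length; i++) {
      const family = Math.floor(statuses[i] / 100) + 'xx';   // 2xx / 3xx / 4xx / 5xx
      // Toy stand-in for the CPU-heavy anomaly score (the real UDF runs 64 iterations
      // of hashing, trigonometry, and square roots).
      let score = 0;
      for (let k = 0; k < 64; k++) {
        score += Math.abs(Math.sin(sizes[i] + k) * Math.cos(statuses[i] + k)) / Math.sqrt(k + 1);
      }
      out.push(JSON.stringify({ status_family: family, anomaly_score: score }));
    }
    return out;
  }
$$;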


Observed throughput (manually cancelled after warm‑up):

  • Old: ≈ 126K rows/s, 14.4 MB/s (1.93M rows in 15.262 s, 1 CPU)

  • New: ≈ 204K rows/s, 23.3 MB/s (3.48M rows in 17.012 s, 1 CPU)



Refer to the benchmark code here 


High-cardinality analytics (single-key aggregations)


This test targets Proton's "single-key reduce" aggregation - the engine's ability to collapse billions of rows into a single aggregated result. It processes 10 billion rows (80 GB) with a GROUP BY that reduces everything to one key, achieving sub-200ms execution time.


The test is implemented in one simple query:

SELECT
  dim.symbol,
  t2.instrument_key AS has_trades_flag
FROM
(
  -- Tiny dimension: map numeric key → symbol
  SELECT
    1         AS instrument_key,
    'BTC-USD' AS symbol
) AS dim
LEFT JOIN
(
  -- Huge fact side: 10B synthetic rows collapsed into a single key
  -- (micro-benchmark for the single-key GROUP BY reduce path)
  SELECT
    1 AS instrument_key
  FROM numbers_mt(10000000000)
  GROUP BY instrument_key
) AS t2
USING (instrument_key);

This query has two parts joined together:

Left Side (dim) - The "Dimension Table":

SELECT 1 AS instrument_key, 'BTC-USD' AS symbol

  • Creates a tiny table with just 1 row

  • Maps the key 1 to the symbol 'BTC-USD'


Right Side (t2) - The "Fact Table":

SELECT 1 AS instrument_key
FROM numbers_mt(10000000000)  -- Generate 10 billion rows
GROUP BY instrument_key

  • Generates 10 billion rows (all with the value 1)

  • Then GROUP BY instrument_key collapses all 10 billion rows into a single row with key 1


And the Join:

LEFT JOIN ... USING (instrument_key)

  • Joins these two streams on `instrument_key` (which equals `1` in both)

  • Final result: 1 row with `symbol = 'BTC-USD'` and `has_trades_flag = 1`


This tests the streaming engine's "single-key reduce" capability - how efficiently it can:

  1. Generate 10 billion rows in memory

  2. Aggregate all those rows using GROUP BY

  3. Reduce them down to a single result

This benchmark tests the critical ability to compute global aggregates (total counts, sums, averages) across billions of rows - essential for real-time dashboards showing metrics like "total API requests right now," "overall system error rate," or "portfolio-wide trading volume."
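
As a concrete illustration (a sketch reusing the perf_btc_quotes stream from the ingestion benchmark above, not part of the benchmark itself), a global streaming aggregate of this kind looks like:

-- Continuously updated, portfolio-wide totals over the quotes stream (illustrative).
SELECT
  count() AS total_quotes,
  sum(price * size) AS notional_volume
FROM perf_btc_quotes
EMIT PERIODIC 1s;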


Results:

  • Old: 0.892 s, 10.0B rows, 80.0 GB (≈ 11.21B rows/s, 89.66 GB/s)

  • New: 0.184 s, 10.0B rows, 80.0 GB (≈ 54.35B rows/s, 434.78 GB/s)



Note: here is the test setup used for all of these benchmarks.

  • Environment (GCP)

  • Machine type: c4d-standard-16 (16 vCPUs, 62 GB RAM)

  • CPU platform: AMD Turin

  • Architecture: x86_64

  • Disk: 200 GB Hyperdisk Balanced (4,200 provisioned IOPS, 440 MB/s throughput)

  • Versions: Proton 1.6.17-rc (old) vs 3.0.7 (new)

  • Both versions were run using: docker run -it --rm --name proton timeplus/proton:<version>



What This Means for Your Streaming Pipelines


Proton 3.0 represents a fundamental leap forward in streaming SQL performance. These aren't just numbers on a chart—they translate directly into real-world capabilities that were previously impossible or impractical:

  • Raw ingestion throughput nearly doubled (1.8x), letting you handle financial market feeds, IoT sensor data, and clickstream analytics with half the infrastructure.

  • Changelog processing saw a 7x improvement, making real-time materialized views and CDC pipelines dramatically more efficient. You can now maintain synchronized aggregations across millions of database changes per second without breaking a sweat.

  • JavaScript UDFs got 1.6x faster, proving that extensibility doesn't mean sacrificing performance. Complex enrichment logic that would bottleneck traditional systems now runs at production scale.

  • Analytics aggregations accelerated by 4.8x, enabling sub-second queries across tens of billions of rows—the foundation for real-time dashboards that actually reflect what's happening right now.


Real-time data processing shouldn't require a PhD in data processing systems. Proton 3.0 delivers enterprise-grade performance in a single binary you can run anywhere. Try it now: https://github.com/timeplus-io/proton



 
 