Updated: Oct 31
With yesterday’s announcement of the 3-nanometer M3 chips from Apple, we at Timeplus are spending our Halloween thinking about “scary fast.” Our bailiwick is fast software though, not hardware - so we wanted to show you how you can build your own “scary fast” stream processing with Timeplus on a commodity machine.
Here's a preview showing how you can view millions of events per sec:
To start off, Timeplus is based on the open-source project Proton: a unified analytics system handling both streaming and historical data processing. As a “Halloween treat,” you can now install Proton with Homebrew, plus we also describe some “Halloween tricks” for showing performance with synthetic data.
brew tap timeplus-io/timeplus
brew install proton
From Terminal, run the command-line program to start a local session:

proton local
Run SQL Queries
Let’s get started by creating a special stream to generate random data:
CREATE RANDOM STREAM livedata(
  EventTime datetime64(9, 'UTC') default now(),
  Key string default 'Key_'||to_string(rand()%4),
  Val float default rand()%1000/10
)
SETTINGS eps=300e6;
This will create a stream called livedata that has a typical Time-Key-Value schema. In the capital markets, this could be a symbol-price pair. In IIoT, this might be a sensor’s tag or point identifier and a measured value (like temperature, pressure, etc.). The eps setting specifies that the system will produce 300 million events each second.
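To make the schema concrete, here is a rough sketch in ordinary Python (not the Proton engine) of what the stream’s default expressions produce: keys drawn from Key_0 through Key_3, and values between 0.0 and 99.9. The random_event helper is ours, purely for illustration:

```python
import random
from datetime import datetime, timezone

def random_event():
    """Mimic livedata's default expressions:
    Key = 'Key_' || to_string(rand() % 4), Val = rand() % 1000 / 10."""
    return {
        "EventTime": datetime.now(timezone.utc),
        # rand() yields a 32-bit unsigned integer (as in ClickHouse)
        "Key": f"Key_{random.getrandbits(32) % 4}",   # one of 4 distinct keys
        "Val": (random.getrandbits(32) % 1000) / 10,  # a float in 0.0 .. 99.9
    }

sample = [random_event() for _ in range(1000)]
print(sorted({e["Key"] for e in sample}))  # keys are always among Key_0..Key_3
```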
When you query the stream in Proton, you will see the first 10,000 rows, then a progress bar as the query is continuously processed:
SELECT * FROM livedata;
An M2 Max MacBook Pro handles about 89-90 million EPS (while the system is also generating the livedata stream):
↑Progress: 28.89 billion rows, 982.24 GB (88.86 million rows/s., 3.02 GB/s.)
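The row and byte figures in that progress line are consistent with each other: a quick back-of-the-envelope check (plain Python, using the numbers above) shows roughly 34 bytes per row either way you compute it:

```python
# Cumulative figures from the progress line
total_bytes = 982.24e9
total_rows = 28.89e9
bytes_per_row = total_bytes / total_rows
print(round(bytes_per_row))  # roughly 34 bytes per row

# Rate figures give the same answer
rate_bytes = 3.02e9   # 3.02 GB/s
rate_rows = 88.86e6   # 88.86 million rows/s
print(round(rate_bytes / rate_rows))  # also roughly 34 bytes per row
```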
Feel free to play around with the eps setting. To change it, first run this command to drop the stream:
DROP STREAM livedata;
On the M2 Max, the maximum appears to be around 300 million EPS.
It's noteworthy that you work only with streaming SQL syntax: you and your team can leverage what you already know about SQL. (You don't have to learn a specialized language, or go out and find folks who have.)
Imagine you had two streams that you needed to join - maybe one provides a larger context (for example, network infrastructure telemetry joined with business app instrumentation), or two or more streams simply arrive via different methods.
Let's create the two streams now:
CREATE RANDOM STREAM live_A(
  aEventTime datetime64(9, 'UTC') default now(),
  aKey string default 'Key_'||to_string(rand()%4),
  aVal float default rand()%1000/10
)
SETTINGS eps=100e6;
CREATE RANDOM STREAM live_B(
  bEventTime datetime64(9, 'UTC') default now(),
  bKey string default 'Key_'||to_string(rand()%4),
  bVal float default rand()%1000/10
)
SETTINGS eps=100e6;
Then join the two streams on matching keys, where the timestamps line up:
SELECT live_A.*, live_B.*
FROM live_A
ASOF JOIN live_B
ON live_A.aKey = live_B.bKey
AND live_A.aEventTime < live_B.bEventTime;
We see about 30 million EPS:
↑Progress: 416.00 million rows, 14.14 GB (31.07 million rows/s., 1.06 GB/s.)
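An ASOF join pairs each left-side row with at most one right-side row: the keys must match exactly, and among rows satisfying the time condition, the closest timestamp wins. Here is a minimal sketch of that matching logic in plain Python (the asof_join helper is ours, not a Proton API; for the aEventTime < bEventTime condition above, "closest" means the earliest right-side event after the left-side one):

```python
from bisect import bisect_right
from collections import defaultdict

def asof_join(left, right):
    """For each (key, t, val) row on the left, find the matching-key right
    row with the smallest timestamp strictly greater than t. Left rows
    without a match are dropped."""
    by_key = defaultdict(list)
    for key, t, val in right:
        by_key[key].append((t, val))
    for key in by_key:
        by_key[key].sort()                      # sort right rows by time
    out = []
    for key, t, val in left:
        times = by_key.get(key, [])
        i = bisect_right([bt for bt, _ in times], t)  # first bt > t
        if i < len(times):
            bt, bval = times[i]
            out.append((key, t, val, bt, bval))
    return out

left = [("Key_1", 1, 10.0), ("Key_1", 5, 12.0), ("Key_2", 2, 7.5)]
right = [("Key_1", 2, 99.0), ("Key_1", 4, 98.0), ("Key_2", 1, 50.0)]
print(asof_join(left, right))
# ("Key_1", t=1) matches the right row at t=2; ("Key_1", t=5) has no later
# Key_1 row, and ("Key_2", t=2) has no later Key_2 row, so both are dropped.
```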
This is a common pattern for trading in the capital markets: you join quotes and trades data to get a better view into the price discovery process. (More on that in a later blog.)
The synthetic origin of these data streams makes it easy to test out different use-case patterns. A few examples:
In capital markets, you might have streams of data tied to the prices of shares of company equity; there can be on the order of thousands of companies.
In options or derivatives trading, you will have multiple contracts for any asset, since contracts expire at different dates; the magnitude increases to tens of thousands.
In IIoT, such as high-tech manufacturing for semiconductors, you may have hundreds of thousands of sensors (tag-value style).
In networking, you may have a very large number of entities you want to group by (cardinality on the order of millions).
To quickly validate the performance of these different use cases, you can simply change the settings of the random stream. For 1 million possible keys, recreate the stream and rerun the aggregation:
CREATE RANDOM STREAM livedata(
  EventTime datetime64(9, 'UTC') default now(),
  Key string default 'Key_'||to_string(rand() % 1e6),
  Val float default rand()%1000/10
)
SETTINGS eps=10e6;
SELECT Key, avg(Val) AS AvgVal
FROM livedata
GROUP BY Key
SETTINGS group_by_two_level_threshold_bytes=1e9;
This query computes the average value per key: events arrive 10 million times per second (eps=10e6), grouped across 1 million possible keys (each key concatenates 'Key_' with a random number below 1e6).
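Conceptually, a streaming GROUP BY with avg() keeps one small aggregate state per key (a running sum and count) rather than buffering events. A sketch of that incremental state in plain Python (the RunningAvg class is ours, for illustration only):

```python
from collections import defaultdict

class RunningAvg:
    """Per-key aggregate state for avg(): a running sum and count,
    updated one event at a time."""
    def __init__(self):
        self.states = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]

    def update(self, key, val):
        s = self.states[key]
        s[0] += val
        s[1] += 1

    def result(self):
        # Emit the current average for every key seen so far
        return {k: s / n for k, (s, n) in self.states.items()}

agg = RunningAvg()
for key, val in [("Key_1", 10.0), ("Key_1", 20.0), ("Key_2", 5.0)]:
    agg.update(key, val)
print(agg.result())  # {'Key_1': 15.0, 'Key_2': 5.0}
```

With 1 million distinct keys, the working set is 1 million such states, which is why the group_by_two_level_threshold_bytes setting (controlling when the engine switches to a two-level hash table for high-cardinality aggregation) matters here.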
Our goal with Proton has always been to provide a very intuitive and powerful engine for unified analytics. We are thrilled that Proton delivers such amazing EPS figures and excited to see how developers use it.
To try out Proton for yourself and see the performance on your data, brew install as above and have fun. To see a more complete demo, check out the Docker Compose image that includes Redpanda, and the Owl-Shop e-commerce example, plus many more. See our Get Started guide on GitHub here, and show your love by giving us a star. Happy Halloween, and happy streaming!