Stream Processing Showdown
Stream processors share a common theme: timely insights. But it's not enough for them to be aware of and react to data streams; they should also be easy to set up, use, and maintain.
ksqlDB and Proton are two engines for processing Kafka data in real time, and they differ in design, efficiency, and open source license.
So, which one wins?
Two Open-Source Streaming Engines
ksqlDB (previously KSQL) is a database for Apache Kafka, a popular distributed streaming platform, and is licensed under the Confluent Community License.
Proton is a fast and lightweight alternative to Apache Flink, powered by ClickHouse. It's also the core engine of Timeplus, a cloud-native streaming analytics platform.
ksqlDB and Proton share many similar features, including support for:
Stateful stream processing
Dual query mode: unbounded and bounded*
Reading data from Kafka and writing results back to it
* Long-running unbounded push-based queries, and bounded pull-based queries
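To make the two query modes concrete, here is a minimal Python sketch. It is not ksqlDB or Proton code: the function names and the `(key, value)` event shape are hypothetical. A bounded (pull) query scans the data that exists now and returns one answer; an unbounded (push) query keeps running state and emits an updated answer for every new event, for as long as events keep arriving.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for illustration: (key, value).
Event = Tuple[str, int]


def pull_query(events: Iterable[Event]) -> Dict[str, int]:
    """Bounded query: scan the events that exist now, return one result, stop."""
    totals: Dict[str, int] = {}
    for key, value in events:
        totals[key] = totals.get(key, 0) + value
    return totals


def push_query(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Unbounded query: keep per-key state and emit an updated row per event.

    If `events` never ends, neither does this generator.
    """
    totals: Dict[str, int] = {}  # the query's state, kept between events
    for key, value in events:
        totals[key] = totals.get(key, 0) + value
        yield key, totals[key]


if __name__ == "__main__":
    snapshot = [("a", 1), ("b", 2), ("a", 3)]
    print(pull_query(snapshot))        # {'a': 4, 'b': 2}
    print(list(push_query(snapshot)))  # [('a', 1), ('b', 2), ('a', 4)]

    # An endless source: observe only the first three pushed updates.
    endless = zip(itertools.cycle("ab"), itertools.repeat(1))
    print(list(itertools.islice(push_query(endless), 3)))  # [('a', 1), ('b', 1), ('a', 2)]
```

The running `totals` dictionary is what makes both queries stateful; the difference is only whether the query finishes with one answer or keeps emitting updates.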
How does ksqlDB compare with Proton?
ksqlDB offers a SQL interface, integration with Kafka, stateful processing, scalability, and great security features. However, it has limitations, including deep coupling with Kafka, heavy resource consumption, and a design that is not specifically aimed at analytics.
Beyond these shared features, Proton offers additional benefits compared to ksqlDB.
Let's look at five reasons why developers are choosing Proton as an alternative to ksqlDB.
Is it ready for prime time?
Not true open source
ksqlDB is licensed under the Confluent Community License (CCL), a source-available license that is not OSI-approved open source and that comes with restrictions. For instance, it prohibits using ksqlDB to offer a software-as-a-service or similar online service that competes with Confluent.
Is it flexible?
Deep coupling with Kafka
ksqlDB is tightly coupled with Kafka at the deployment level. Each ksqlDB server is bound to a single Kafka cluster, and ksqlDB uses Kafka as storage for much of its internal state. There is no way to process streams from different clusters unless you route the data from those clusters into the same Kafka cluster.
Additionally, running ksqlDB affects the Kafka cluster itself, because ksqlDB creates additional internal topics that generate extra reads and writes.
High flexibility consuming Kafka data
Proton supports Kafka external streams for both reading and writing, unlike other stream processing systems where Kafka is offered only as a source or a sink. Proton treats a Kafka topic as a stream without persisting the data itself. Users can still create a materialized view when the data needs to be persisted, so they choose whether to store it.
Because there is no direct coupling between Proton and Kafka, users can query data from any Kafka cluster.
Is it efficient?
Heavy resource consumption
Every SQL query run on ksqlDB is a Kafka Streams application, which creates its own worker threads, adding overhead to every query.
ksqlDB uses Kafka topics to store state changelogs and RocksDB to materialize these changelogs into tables, which means state consumes resources in both places.
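The changelog-to-table pattern can be sketched in a few lines of Python. This is a simplified model, not ksqlDB internals: a plain `dict` stands in for the RocksDB store, `materialize` is a hypothetical name, and a `None` value marks a deletion, mirroring the tombstone convention of compacted Kafka topics. It shows why the state is paid for twice: the updates live in the changelog topic, the current view lives in the local store, and rebuilding that store after a restart means replaying the changelog.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value) record in a changelog; a value of None is a tombstone (delete).
ChangelogRecord = Tuple[str, Optional[int]]


def materialize(changelog: Iterable[ChangelogRecord]) -> Dict[str, int]:
    """Replay a changelog from the beginning into a key-value table.

    Later records for a key overwrite earlier ones; tombstones remove the key.
    """
    table: Dict[str, int] = {}  # stand-in for the local RocksDB store
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


if __name__ == "__main__":
    # Stand-in for an internal changelog topic: every update is appended.
    changelog: List[ChangelogRecord] = [
        ("user-1", 10),
        ("user-2", 5),
        ("user-1", 12),
        ("user-2", None),  # user-2 deleted
    ]
    # After a restart, the local table is rebuilt by replaying the log.
    table = materialize(changelog)
    print(table)  # {'user-1': 12}
    print(len(changelog), "log records ->", len(table), "table row")
```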
Lightweight and efficient
Proton is lightweight: it is written in C++ and built on top of ClickHouse, which is known for its outstanding performance. By leveraging SIMD, a specially designed internal data format, and other optimization techniques, Proton can process over 1 million records per second on a commodity computer.
Is it for analytic workloads?
Not designed for analytics
ksqlDB supports stream processing on top of Kafka, but the RocksDB key-value store it uses for table storage is not well suited to quickly scanning huge amounts of data while skipping irrelevant data.
Purpose-designed for analytics
Proton supports both unbounded streaming queries (over an append-only log) and bounded historical queries (over a column store based on ClickHouse). When joining streaming data with historical data, Proton can quickly scan huge amounts of historical data.
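Why a column store can scan quickly while skipping irrelevant data can be illustrated with min/max block skipping. The Python below is a simplified sketch in the spirit of ClickHouse's sparse primary index and `minmax` data-skipping indexes, not Proton's actual implementation; `Block`, `build_blocks`, and `count_in_range` are hypothetical names. Each block of a column records its minimum and maximum, so a range filter can skip any block that cannot contain a match without reading its values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A fixed-size chunk of one column plus its min/max metadata."""
    values: List[int]
    lo: int
    hi: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into blocks, recording each block's min and max."""
    blocks: List[Block] = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_in_range(blocks: List[Block], low: int, high: int) -> Tuple[int, int]:
    """Count values in [low, high]; also report how many blocks were read."""
    count = 0
    blocks_read = 0
    for block in blocks:
        if block.hi < low or block.lo > high:
            continue  # metadata proves no match: skip without reading values
        blocks_read += 1
        count += sum(1 for v in block.values if low <= v <= high)
    return count, blocks_read


if __name__ == "__main__":
    # For example, a column of event timestamps that arrive roughly in order.
    timestamps = list(range(1_000))
    blocks = build_blocks(timestamps, block_size=100)
    print(count_in_range(blocks, 250, 349))  # (100, 2): 8 of 10 blocks skipped
```

Skipping pays off most when the data is sorted or clustered by the filtered column, which is why ClickHouse tables are ordered by a sorting key. A store that keeps each row as one serialized value has no such per-column metadata, so filtering on a field inside the value means reading every row.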
Does it support User Defined Functions?
Feature | ksqlDB | Proton
License | Confluent Community License (CCL) |
Implementation | Java (on Kafka Streams) | C++ (on ClickHouse)
Stateful stream processing | Yes | Yes
Bounded, pull-based query | Yes | Yes
Unbounded, continuous push-based query | Yes | Yes
Materialized view and table concept | Yes (on top of RocksDB) | Yes (on top of ClickHouse)
Kafka integration | Yes (source and sink) | Yes (external stream)
Support for other messaging systems | Kafka only (dependency) | Pulsar, Kinesis, and more (supports but not dependent)
Cluster and HA | | No (supported by Timeplus Cloud or Timeplus Platform)
User-defined functions | |
Security | Yes, role-based access control | Yes, based on ClickHouse