top of page

Easy Streaming SQL with Timeplus and Aiven for Apache Kafka

Timeplus 🩷s Aiven. We share the same passion to make stream processing easy to learn, easy to build, and easy to maintain, providing data engineers with the confidence and ease to forecast monthly expenses without any surprises from variable costs.


Today, I am happy to announce that Timeplus has added special features for Aiven. Our updated Timeplus Demos (https://demo.timeplus.cloud) are now using Aiven for Apache Kafka as the default messaging platform.


 

If you’re new to Aiven for Apache Kafka (https://aiven.io/kafka), it is a fully managed Kafka-as-a-service with key benefits:

  • Automatic updates and upgrades. 

  • Super-transparent pricing. No networking costs.

  • 99.99% uptime. 100% human support.

  • Scale up or scale down as you need.


What does Aiven look like in action? Let me show you how we set up the Kafka service on Aiven to light up our public demo server.



You read it right. It’s as low as $265 a month, with 3-node high availability setup. If we reach a constantly high workload, we can upgrade to a higher plan without any service disruption. I can even stop the Kafka service any time and restart it later, with no fees charged during the hours when the service is stopped. If you have tried other managed Kafka services in the cloud, you will appreciate Aiven’s transparency and predictability. Pay-as-you-go and multi-dimensional charge models from other services may bring many surprises when you receive the bill.


 

So, what feature did Timeplus build for Aiven specifically? Isn’t Aiven for Apache Kafka just a Kafka-as-a-Service that works in the same way as Apache Kafka? Yes, Aiven for Apache Kafka speaks in Kafka API, as do a few other vendors. 


Here’s what makes Aiven special:

  1. All traffic to Aiven services is always protected by TLS. It ensures that third parties can't eavesdrop or modify the data while in transit between Aiven services and the clients accessing them.

  2. Every Aiven project has its own private Certificate Authority (CA), used to sign certificates that are used internally by Aiven’s services to communicate between different cluster nodes and to Aiven management systems.


No matter if you enable SASL authentication or not, the private CA needs to be specified while setting up the connection. In other managed Kafka cloud services, the TLS is commonly signed by a public CA, so you don’t need to provide the certificate content.


In the recent release of Timeplus Proton (the open-source streaming SQL engine), we introduced a few ways to handle the private CA:


Option 1: Specify the path to the CA certificate

CREATE EXTERNAL STREAM ext_stream(raw string)
SETTINGS type='kafka', 
         brokers='name.a.aivencloud.com:28864',
         topic='topic',
         security_protocol='SASL_SSL', 
         sasl_mechanism='SCRAM-SHA-256',
         username='avnadmin', 
         password='..',
         ssl_ca_cert_file='/kafka.cert'

Option 2: Skip the certificate verification

CREATE EXTERNAL STREAM ext_stream(raw string)
SETTINGS type='kafka', 
         brokers='name.a.aivencloud.com:28864',
         topic='topic',
         security_protocol='SASL_SSL', 
         sasl_mechanism='SCRAM-SHA-256',
         username='avnadmin', 
         password='..',
         skip_ssl_cert_check=true

Option 3: Specify the content of CA certificate in the SQL

CREATE EXTERNAL STREAM ext_stream(raw string)
SETTINGS type='kafka', 
         brokers='name.a.aivencloud.com:28864',
         topic='topic',
         security_protocol='SASL_SSL', 
         sasl_mechanism='SCRAM-SHA-256',
         username='avnadmin', 
         password='..',
         ssl_ca_pem='-----BEGIN CERTIFICATE----\nMIIEQTCCAqmgAwIBAgIU..ph0szPew==\n-----END CERTIFICATE-----'

The 3rd option is particularly helpful for connecting to Aiven services from Timeplus Cloud. You don’t want to skip the TLS verification, but there is no way to upload the CA certificate to the multi-tenant Timeplus Cloud. 


When you deploy Timeplus locally with Docker or Kubernetes, via Timeplus Enterprise or Timeplus Proton, this option can also help you simplify the configuration, without mounting the file to the container environment.


Ease-of-use is always a primary focus of Timeplus. In addition to SQL syntax, these options are also available via web console, taking just a couple of clicks. 


Next, let me walk you through how to run streaming SQL on Timeplus with data in Aiven for Apache Kafka.


 

A brief tour of Timeplus Cloud and Aiven


Step 1: Start your free 14-day trial of Timeplus Cloud at https://us.timeplus.cloud and create your workspace in seconds. Click “Data Ingestion” in the left navigation menu and choose Apache Kafka.




Step 2: This will start a wizard to set up a connection between Timeplus and your choice of Kafka service, by creating an external stream. Here, I entered the broker address for my Aiven for Apache Kafka service and credentials for the read-only account. (Yes, Aiven for Apache Kafka provides an easy way to setup users and ACL.) By default, “Enable TLS” and “Validate TLS Certificate” are already enabled – you’ll just also enable “Add CA Certificate”.



Copy the content of the CA certificate from the Aiven console and paste it into the Timeplus console.





Step 3: Click Next. Timeplus will connect to the Kafka service and list all topics. In this example, I’ll choose the topic github_events.




Step 4: Timeplus will load a sample message from the selected topic and suggest the stream schema for you.




Step 5: Specify a name for the external stream and an optional description.



The underlying SQL for this external stream looks something like this:

CREATE EXTERNAL STREAM inline(actor string,
          created_at string,
          id string,
          payload string,
          repo string,
          type string
         )
SETTINGS type='kafka', 
         brokers='kafka-public-read-timeplus.a.aivencloud.com:28864',
         topic='github_events',
         security_protocol='SASL_SSL', 
         sasl_mechanism='SCRAM-SHA-256',
         username='readonly', 
         password='..',
         data_format='JSONEachRow',
         ssl_ca_pem='-----BEGIN CERTIFICATE-----\nMIIEQT..ph0szPew==\n-----END CERTIFICATE-----'

At the end of the wizard, you will be navigated to the list page of external streams.



Step 6: You can click the magnifying glass icon to explore the data.



This will generate a basic SQL, which you can run to get live results immediately:

 SELECT * FROM github_events


Unlike traditional SQL, the streaming SQL in Timeplus is long-running, unless you cancel it. With this SQL Console, you can easily apply different SQL transformations or aggregation logic and see results in streaming mode, and also easily recognize data patterns or trends with the live table header.



Step 7: Let’s implement a simple stream processing use case: you’d like to check what are the top 5 most active GitHub repositories in each hour. This can be done via the following streaming SQL:

SELECT window_start,repo,count() as event_num 
FROM tumble(github_events,1h) 
group by window_start, repo order by event_num desc limit 5 by window_start 
settings seek_to='-6h'

This streaming SQL runs a tumble window aggregation for each hour, to get the number of events for each repository in each 1-hour window, ordering results by the number of events from highest to lowest, and only getting the top 5 in each window. settings seek_to=’-6h’ will travel back to 6 hours ago.


Once you run the SQL, it scans about 50K events in the Kafka topic and shows the top 5 hot repositories in each hour.



Let’s visualize this data by turning it into a column chart in Timeplus. Since it’s a streaming SQL, every hour this chart will be automatically updated.



You can also send results back to the Kafka topics. This will be a streaming ETL pipeline:

  • An external stream to read from a Kafka topic

  • A materialized view to run the streaming SQL continuously in the background

  • The materialized view sends the SQL results to the other Kafka topic, via the other external stream


The Data Lineage page in Timeplus makes it easy for you to see the relationship between these resources:



You can also use the Aiven console to review the generated messages:



 

This is not the only way to process Kafka data in Aiven. Compared to other solutions, such as Apache Flink or ksqlDB, Timeplus provides an intuitive UI and powerful streaming SQL for you to set up the connection, explore live data, and run any streaming SQL, without the overhead of compiling code or submitting jobs.


With Timeplus and Aiven, it’s easier than ever to get started with stream processing. Try it yourself: sign up for a free trial of Aiven and Timeplus Cloud


78 views

Recent Posts

See All

Comments


Commenting has been turned off.
bottom of page