Introducing Federated Search for Confluent Cloud and Apache Kafka

Jove Zhong
Jul 5, 2022
Updated: Mar 22, 2024

Timeplus exists to help you extract more value from your streaming data. Today we are happy to announce a new feature, External Stream, in the latest Timeplus Cloud. It lets you query data in Confluent Cloud or self-managed Apache Kafka without moving that data into Timeplus. This is also known as federated search.

In this scenario, data stays with a single data vendor (Confluent in this case) instead of being copied to a different infrastructure. This reduces data silos and data management overhead for end users.

If your organization is actively using Confluent Cloud, Confluent Platform, Apache Kafka or Redpanda, you can use Timeplus to quickly run SQL queries on top of your existing streaming topics.

First, create a new stream in the Timeplus UI and choose to create an external stream:

Choose a name for the stream and set the Kafka broker’s URL, authentication and topic name:

Click the ‘Create’ button:
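
For readers who prefer SQL to the UI, the same inputs (stream name, broker address and topic) can be sketched as a DDL statement. This is a minimal sketch, assuming the CREATE EXTERNAL STREAM form of Timeplus SQL. The stream name ext_stream, the broker address and the topic name orders are placeholders, and authentication settings are left out, so check the current Timeplus documentation before relying on the exact setting names.

```sql
-- Sketch only: ext_stream, the broker address and the topic are placeholders.
-- The single string column `raw` receives the entire Kafka message.
CREATE EXTERNAL STREAM ext_stream (raw string)
SETTINGS type='kafka',
         brokers='broker.example.com:9092',
         topic='orders'
```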

Click the search button on the right side to run the federated search:

This generates a SQL query for you to explore the stream:
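
A minimal sketch of what such an exploratory query can look like, assuming the external stream was named ext_stream (a placeholder, not necessarily the name used in the UI walkthrough):

```sql
-- Streaming query: keeps running and returns new messages
-- as they arrive in the Kafka topic.
SELECT * FROM ext_stream
```

Because the query reads directly from the topic, results appear as soon as new messages are produced.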

You can run more complex streaming queries, such as this tumble window aggregation:
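
As a hedged illustration of a tumble window aggregation, the sketch below counts messages per one-minute window. It assumes an external stream named ext_stream (a placeholder) and uses now() as the event time, since external streams carry no event time of their own. The alias events is also illustrative.

```sql
-- Count messages in consecutive, non-overlapping one-minute windows.
-- now() assigns the query-time clock as the event time of each message.
SELECT window_start, count() AS events
FROM tumble(ext_stream, now(), 1m)
GROUP BY window_start
```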

Benefits of External Stream

The benefits of using external streams for federated search are:

No data to move: With external streams, data stays with a single data vendor.
No data to copy: There is a single source of truth for streaming events, which substantially reduces storage cost and governance overhead.
No time to wait: Timeplus customers can query streaming topics in Confluent Cloud or Apache Kafka right away, without waiting for data ingestion. Time-to-value at speed!

Additional Considerations for External Streams

There are some additional factors to consider when using Timeplus to query external streams:

Authentication is either None or SASL Plain. SASL SCRAM-SHA-256 and SCRAM-SHA-512 are not supported yet.
Data must be in JSON or TEXT format. Avro and schema registry services are not supported yet. The entire message is put in a single string column named raw.
Since the raw data is not stored in Timeplus, we cannot attach an event time or index time to each event at ingestion time. You can specify the event time with an expression in the query, such as tumble(ext_stream,now(),1m) or tumble(ext_stream,to_time(raw:order_time),1m).
Unlike normal streams, external streams have no historical store. Hence you cannot run table(my_ext_stream) or use settings query_mode='table'. To access data that was produced before you created the external stream, you can use settings seek_to='earliest' or set seek_to to a specific timestamp in the past.
There is no retention policy for external streams in Timeplus. You need to configure the retention policy on Kafka/Confluent/Redpanda. If the data is no longer available in the external system, it cannot be searched in Timeplus either.
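
The event-time and seek_to points above can be combined in one query. The following is a sketch under stated assumptions: the stream is named ext_stream, each message is a JSON document with an order_time field (the field used in the expression above), and the alias orders is illustrative.

```sql
-- Count orders per one-minute window, using the order_time field inside
-- the JSON payload as the event time instead of the query-time clock.
-- seek_to='earliest' starts reading from the oldest message that Kafka
-- still retains, rather than only from newly arriving messages.
SELECT window_start, count() AS orders
FROM tumble(ext_stream, to_time(raw:order_time), 1m)
GROUP BY window_start
SETTINGS seek_to='earliest'
```

How far back this query can reach depends entirely on the retention configured on the Kafka side, since Timeplus keeps no copy of the data.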

Join our Timeplus community or sign up for our Timeplus Cloud to learn more. We look forward to helping you solve your streaming analytics challenges.
