Unlock Unlimited Extensibility: Python Table Functions in Timeplus
- Gang Tao
Timeplus is known for its powerful streaming SQL capabilities, but what happens when SQL alone isn't enough? Sometimes you need to integrate with external systems that don't have native connectors, apply custom transformation logic, or leverage the vast Python ecosystem for specialized data processing. That's exactly why we built Python Table Functions — a new feature that lets you embed Python code directly into your streaming SQL pipelines.
What Are Table Functions?
Table functions are a well-established concept in database systems. Unlike scalar functions that return a single value, or aggregate functions that reduce multiple rows into one, table functions return a set of rows that can be queried like a regular table.
Think of a table function as a virtual table generator. You call it in the FROM clause of a SQL query, and it produces rows on demand. This pattern is powerful because it bridges the gap between procedural code and declarative SQL — you get the flexibility of programming logic with the composability of SQL queries.
Traditional databases like PostgreSQL support table functions through stored procedures, but those are typically limited to the database's native procedural language. Timeplus takes this concept further by enabling Python — the most popular language for data engineering and data science.
Introducing Timeplus Python Table Functions
Timeplus Python Table Functions allow you to define custom data sources, sinks, and transformations using Python code, then use them seamlessly within streaming SQL queries. The feature supports three primary operations:
Read: Generate or fetch data from any source Python can access
Write: Send data to any destination Python can reach
Transform: Apply row-by-row or batch transformations to streaming data. The transformation can be stateful.
The syntax integrates naturally with Timeplus's external stream concept. Your Python code lives inside the $$ ... $$ block, and you specify which functions handle reading, writing, or transformation through the SETTINGS clause.
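As a rough skeleton of that shape, a Python-backed external stream looks something like the following. The `type = 'python'` setting name and the exact placement of the `AS $$ ... $$` block are illustrative assumptions here; check the Timeplus documentation for the authoritative DDL.

```sql
-- Sketch only: setting names and DDL keywords are illustrative.
CREATE EXTERNAL STREAM example_stream (
  value string
)
SETTINGS
  type = 'python',
  read_function_name = 'read',
  write_function_name = 'write'
AS $$
def read():
    yield ['hello']   # generator => streaming mode

def write(values):
    pass              # one list per column, vectorized
$$
```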
Execution Model and Function Signatures
The Python execution model in Timeplus is designed to be straightforward, but adhering to the engine's calling conventions is essential:
Reading Data (read_function_name)
Arguments: None.
Return Type:
A Python list for batch processing.
A synchronous generator/iterator for streaming data.
Row Format: Each row produced must conform to the external stream schema. For single-column schemas, a list of scalars is also acceptable.
Streaming: The engine automatically detects streaming if the result is a generator; otherwise, it defaults to batch mode.
Unsupported: Asynchronous coroutines or async generators are not supported.
Default: If read_function_name is omitted, the engine uses the external stream's name.
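These conventions can be sketched in plain Python. The schema below, a hypothetical (device string, temperature float64) pair, and both function names are placeholders for illustration:

```python
import random
import time

def read():
    """Streaming read: a no-argument synchronous generator.

    Each yielded row must match the external stream's schema,
    here a hypothetical (device string, temperature float64) pair.
    """
    while True:
        yield [f"device-{random.randint(1, 3)}", 20.0 + random.random() * 5]
        time.sleep(1)  # pace the source; not required by the engine

def read_batch():
    """Batch read: returning a plain list keeps the engine in batch mode."""
    return [["device-1", 21.5], ["device-2", 22.0]]
```

Because `read()` returns a generator, the engine would treat it as a streaming source; `read_batch()` returns a list and would run once as a batch.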
Writing Data (write_function_name)
Calling Convention: This function is called in a vectorized manner.
Arguments: One Python list per output column, representing the data for the current chunk.
Row Iteration: For multi-column streams, you can iterate over rows using zip(col1, col2, ...).
Default: If write_function_name is omitted, it defaults to the read_function_name.
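Under these conventions, a write function for a two-column stream receives one list per column. The in-memory `sink` list below stands in for a real external system and is purely illustrative:

```python
# Illustrative sink target; a real write function would push to an
# external system instead of an in-memory list.
sink = []

def write(devices, temperatures):
    """Vectorized write: called with one Python list per output column.

    `devices` and `temperatures` hold the current chunk's values;
    zip() recovers row-by-row iteration when the sink needs rows.
    """
    for device, temperature in zip(devices, temperatures):
        sink.append({"device": device, "temperature": temperature})
```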
Primary Use Cases
1. Custom Data Source Integration
Need to pull data from a system without a native Timeplus connector? Python Table Functions let you write a simple read function that yields rows. Here's an example connecting to Kafka using the kafka-python library:
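A sketch of what that stream definition could look like follows. `KafkaConsumer` is kafka-python's real consumer class; the stream name, topic, broker address, and the surrounding DDL keywords are hypothetical placeholders:

```sql
CREATE EXTERNAL STREAM kafka_events (
  raw string
)
SETTINGS
  type = 'python',               -- assumed setting name
  read_function_name = 'read'
AS $$
from kafka import KafkaConsumer  # pip package: kafka-python

def read():
    # Hypothetical local broker and topic.
    consumer = KafkaConsumer(
        'events',
        bootstrap_servers='localhost:9092',
        value_deserializer=lambda v: v.decode('utf-8'),
    )
    for message in consumer:     # blocks, yielding rows as they arrive
        yield [message.value]
$$
```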
Now you can query this stream like any other:
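For a hypothetical Python-backed stream named `kafka_events` with a single `raw` string column, that is ordinary streaming SQL:

```sql
SELECT raw
FROM kafka_events
WHERE raw LIKE '%error%'
```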
2. Custom Data Sinks
The same stream definition supports writing. You can push query results to any system Python can communicate with:
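Sketching the write side under the same assumptions (the stream and column names are hypothetical), once the stream's `$$ ... $$` block also defines a vectorized write function, such as one that pushes each row out with kafka-python's real `KafkaProducer` class, writing becomes a plain `INSERT`:

```sql
-- Routes each result row through the stream's Python write function.
INSERT INTO kafka_events
SELECT raw
FROM some_other_stream
WHERE raw != ''
```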
This flexibility means you can route streaming data to proprietary APIs, legacy systems, or any destination that lacks a native connector.
3. Streaming Transformations
For data transformation, you can define a Python function that processes columns and returns computed results. Here's a simple example that sums pairs of values:
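A minimal sketch of such a transform (the function name `add` is a placeholder): it receives one Python list per input column and returns the computed output column:

```python
def add(left, right):
    """Vectorized transform: one Python list per input column.

    Returns the pairwise sums as the output column.
    """
    return [a + b for a, b in zip(left, right)]
```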
Then apply it to your streaming data using the python_table() function:
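The call shape below is an assumption based on the description in this post; `pairs` is a hypothetical stream with integer columns `a` and `b`, and the transform function is assumed to be defined in its `$$ ... $$` block. Check the documentation for the exact `python_table()` argument form:

```sql
SELECT *
FROM python_table(pairs, a, b)
```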
This pattern opens the door to sophisticated transformations: ML model inference, custom parsing logic, data enrichment from external APIs, and more.
Benefits
Python Table Functions unlock several key capabilities for Timeplus users:
Virtually Unlimited Connectivity. Python's ecosystem includes libraries for virtually every data system, API, and protocol. If Python can connect to it, Timeplus can now stream data to and from it.
Reuse Existing Code. Many organizations have battle-tested Python code for data processing. Python Table Functions let you reuse that code directly within streaming SQL pipelines, avoiding costly rewrites.
AI Integration. Data scientists work in Python. With Python Table Functions, you can apply trained models, feature engineering logic, or statistical functions directly in your streaming queries without moving data to a separate system.
Rapid Prototyping. When you need to integrate a new data source or apply custom logic, you can iterate quickly in Python without waiting for native connector development.
Unified Query Interface. Despite the Python code running underneath, your queries remain SQL. This means your existing dashboards, applications, and workflows continue to work — they're just powered by more flexible data sources.
Native Performance. Because the Timeplus C++ core engine embeds the CPython interpreter, Python table function calls happen in the same process. You get raw in-process Python performance, without the IPC overhead of bridged implementations that run Python in a separate process.
Getting Started
To start using Python Table Functions:
Install any dependencies using `SYSTEM INSTALL PYTHON PACKAGE`
Define your external stream and its column schema using `CREATE EXTERNAL STREAM …`
Write your Python code inside the $$ ... $$ block, implementing read, write, or transform functions as needed
Configure the SETTINGS clause to point to your function names
Use the stream in your SQL queries like any other data source
For transformations, wrap your function call with python_table() and pass the input stream and columns as arguments.
Real World Example: Streaming Bluesky Posts in Real-Time
Let's look at a practical example that demonstrates the power of Python Table Functions. Bluesky is a decentralized social network built on the AT Protocol. It exposes a public firehose called Jetstream — a WebSocket feed that streams every post, like, and follow happening across the network in real time.
With traditional approaches, you'd need to build a separate ingestion service, manage message queues, and wire everything together. With Python Table Functions, you can connect directly to Jetstream and query the social firehose with SQL — in just a few lines of code:
You can create the connection by running the following SQL:
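The sketch below uses the real synchronous `create_connection` API from the `websocket-client` package (a synchronous client, since async generators are not supported by the engine). The stream name, the `type = 'python'` setting, and the Jetstream instance hostname are assumptions; public Jetstream endpoints may change over time:

```sql
CREATE EXTERNAL STREAM bluesky (
  raw string
)
SETTINGS
  type = 'python',              -- assumed setting name
  read_function_name = 'read'
AS $$
from websocket import create_connection  # pip package: websocket-client

def read():
    # One of Bluesky's public Jetstream instances; filter to posts only.
    ws = create_connection(
        'wss://jetstream2.us-east.bsky.network/subscribe'
        '?wanted_collections=app.bsky.feed.post'
    )
    while True:
        yield [ws.recv()]       # one JSON event per row
$$
```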
Now you can run streaming SQL on the entire Bluesky network:
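For instance, against a hypothetical single-column `bluesky` stream of raw Jetstream JSON, a tumbling-window aggregation counts post events per second across the whole network:

```sql
SELECT window_start, count() AS events
FROM tumble(bluesky, 1s)
GROUP BY window_start
```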
No Kafka or custom ingestion pipeline. All you need is Python, SQL, and real-time social data.
Summary
Python Table Functions represent a significant expansion of what's possible with Timeplus. By combining the expressiveness of Python with the power of streaming SQL, you can build real-time data pipelines that connect to any system, apply any transformation, and deliver results to any destination — all while maintaining the simplicity and composability of SQL queries.
We're excited to see what you build with this capability. Share your use cases and feedback with us — your input shapes how we evolve this feature.
Ready to try Python Table Functions? Get started with Timeplus and explore our documentation for more examples and best practices.