
Powering Up Real-Time Feature Pipelines for AI/ML

By Gang Tao

Previously, I wrote a blog on the potential of real-time machine learning feature platforms using Timeplus. Now, our customers have started using Timeplus to build their ML feature pipelines. In particular, a gaming company is now building smart systems that can make decisions instantly as players interact with their games.


Today, I'm sharing one such customer showcase. Their platform handles thousands of player actions every second and makes decisions in sub-second time - faster than you can blink. This speed matters because in gaming, waiting even a few minutes to respond to player behavior can mean losing customers or missing fraud before it causes damage.




The Reality of Modern Gaming ML


Modern gaming platforms like Fortnite process 92 million events per minute. In this environment, traditional batch-processed ML features arrive too late to matter. When a player shows signs of frustration, exhibits potential fraud patterns, or demonstrates churn risk, the window for intervention is measured in minutes, not hours or days.




Consider these time-sensitive scenarios that our gaming customer faces daily:

  • Fraud Detection: Coordinated bot attacks can drain virtual economies within hours

  • Churn Prevention: Players often decide to quit within 24-48 hours of a negative experience

  • Live Operations: Tournament integrity requires real-time anti-cheat decisions

  • Personalization: Dynamic difficulty adjustment must respond to current performance, not yesterday's


This is where Timeplus's real-time data processing capabilities shine. Unlike traditional feature stores that rely on batch ETL pipelines, Timeplus enables truly real-time feature engineering through low-latency, high-throughput streaming SQL and materialized views.



Real-Time Machine Learning Systems: Critical Implementation Challenges


Deploying production-scale real-time machine learning systems involves complex technical challenges that often become the primary bottleneck for organizations.


The most significant challenges are:



Complexity of feature engineering and pipeline architecture


Real-time feature engineering is the most technically complex aspect of streaming ML systems, requiring organizations to maintain consistency between offline training and online serving while processing high-velocity data streams. This "train-predict inconsistency" - where features computed at serving time diverge from those used in training - is one of the biggest challenges in real-time ML.


Data preprocessing at scale creates multiple bottlenecks that force architectural compromises. Late-arriving data disrupts time-windowed features, requiring sophisticated watermark management and state handling that can cause memory issues due to data skew. LinkedIn research demonstrates that feature staleness of just one hour degrades job recommendation model performance by 3.5%, illustrating how critical feature freshness is.



Feature freshness


Feature freshness is the time gap between when new data becomes available and when it can be used by a model for prediction. This represents one of the most complex engineering challenges in real-time ML systems.
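As a rough illustration, you can observe freshness from inside Timeplus by comparing each event's timestamp against the wall clock as rows flow through. This is a minimal sketch, assuming a player_actions stream and Timeplus's built-in _tp_time event-time column:

-- Minimal freshness probe (illustrative): the gap between event time and
-- query time approximates how stale each arriving feature input is
SELECT
    user_id,
    date_diff('second', _tp_time, now()) AS freshness_seconds
FROM player_actions;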


The challenges include: 


  • Transforming Live Data into Actionable Features

    The first major blocker is collecting and transforming real-time events into ML-ready features:

    • Data source complexity: Accessing diverse streaming sources from in-app activity to third-party fraud signals

    • Processing unpredictability: Handling data that arrives at irregular intervals and volumes

    • Pipeline fragmentation: Teams often end up with "patchworks" of stream processing systems that are difficult to monitor and prone to data loss, duplication, and skew


  • Reducing End-to-End Latency

    Even with streaming data processing, serving delays can render real-time data useless:

    • Data ingestion delays: Time required to move data from source systems into processing pipelines

    • Processing bottlenecks: Complex transformations and feature calculations add latency

    • State management complexity: Time-window aggregations must handle late-arriving data

    • Infrastructure limitations: Legacy data warehouse-based architectures designed for offline analytics can't meet real-time requirements



Integration of streaming and batch processing systems


Point-in-time correctness is one of the most technically complex aspects of real-time feature platform implementation, requiring sophisticated temporal join algorithms that ensure training datasets don't contain future information.


Airbnb's Chronon framework solves these challenges through unified feature definitions that power both offline training and online inference. This approach automatically generates batch and streaming pipelines from single feature specifications, integrating Kafka, Spark Streaming, and Hive with automated workflow management. This single-source-of-truth pattern brings many benefits, but to me such a system seems quite complex to maintain and operate, since multiple systems have to run together.



Why Timeplus Excels at Machine Learning Feature Platforms


1. Stream-Native Processing and Storage Convergence


Traditional ML feature platforms suffer from the "Lambda Architecture" complexity - maintaining separate batch and streaming pipelines that often produce inconsistent results. Timeplus solves this with its unified query/storage architecture that provides:

  • Single source of truth: Both real-time and historical features come from the same unified system

  • Automatic backfill: Streaming queries seamlessly access historical data when needed

  • No pipeline drift: Eliminates the common problem where batch and streaming logic diverge over time




Timeplus uses a dual-layer storage design optimized for ML feature platforms. 

  • The NativeLog serves as a write-ahead log providing immediate feature availability and strict temporal ordering, enabling ML models to access features within milliseconds of ingestion rather than waiting for batch cycles. 

  • The Historical Storage layer offers two encoding modes: columnar encoding for high-volume aggregations and time-series features requiring fast analytical scans, and row encoding for user profiles and features needing frequent updates with UPSERT operations. 


This architecture eliminates the traditional trade-off between feature freshness and retention depth, delivering both real-time availability and historical feature processing without performance degradation.
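To make the row-encoded mode concrete, here is a minimal sketch of a key-value style stream for user profiles. It assumes Timeplus's versioned_kv stream mode; the stream and column names are illustrative rather than taken from the customer's schema:

-- Hypothetical row-encoded stream for frequently updated user profiles;
-- with versioned_kv mode, a later insert with the same primary key
-- effectively upserts the row
CREATE STREAM user_profiles (
    user_id string,
    vip_tier string,
    lifetime_spend_usd float64
)
PRIMARY KEY (user_id)
SETTINGS mode = 'versioned_kv';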



2. Rich Feature Processing Capabilities


Timeplus provides a rich set of capabilities that solve a variety of feature engineering problems using streaming SQL.


Complex Temporal Feature Engineering

For real-time feature processing, where new data arrives continuously, the natural approach is to group data into time-based windows. Real-time ML requires sophisticated time-based features that traditional systems struggle with: these patterns demand complex windowing logic and state management in most systems, while Timeplus makes them natural and performant through native windowing capabilities.


Key windowing patterns that Timeplus handles natively (a minimal SQL sketch follows this list):

  • Tumble windows: Non-overlapping fixed intervals perfect for periodic metrics like "hourly revenue" or "daily active users"

  • Hopping windows: Overlapping windows ideal for sliding aggregations like "rolling 10-minute fraud scores updated every 2 minutes"

  • Session windows: Dynamic windows based on user activity gaps, essential for behavioral analysis like "spending patterns within gaming sessions"
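As a minimal sketch of the first two patterns, assuming simplified player_actions and transactions streams (column names are illustrative):

-- Tumble: non-overlapping 5-minute activity counts per user
SELECT window_start, user_id, count(*) AS events_5m
FROM tumble(player_actions, 5m)
GROUP BY window_start, user_id;

-- Hop: rolling 10-minute spend, re-emitted every 2 minutes
SELECT window_start, user_id, sum(amount_usd) AS spend_10m
FROM hop(transactions, 2m, 10m)
GROUP BY window_start, user_id;

Session windows follow the same shape, with the window bounded by a gap in user activity instead of a fixed size.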





Cross-Stream Feature Correlation

Real-world ML requires joining features across multiple event streams. Gaming fraud detection, for example, needs to correlate spending patterns with gameplay behavior. Timeplus is built on top of database technology, so users can use joins to correlate data from different streams or tables.



-- Cross-stream fraud detection
CREATE MATERIALIZED VIEW behavioral_fraud_signals AS
SELECT
    t.user_id,
    coalesce(sum(t.amount_usd), 0) as total_spending,
    coalesce(count(p.match_id), 0) as concurrent_matches,
    -- High spending with zero gameplay = fraud signal
    case when total_spending > 50 AND concurrent_matches = 0 
         then 1 else 0 end as spending_without_play_flag
FROM hop(transactions, 1h, 1d) t
LEFT JOIN hop(player_actions, 1h, 1d) p
    ON t.user_id = p.user_id 
    AND t.window_start = p.window_start
GROUP BY t.user_id, t.window_start, t.window_end;

Timeplus handles these joins efficiently in streaming mode, whereas traditional systems often require expensive batch jobs or complicated stream processing frameworks.



Historical Context Integration

ML models need both real-time and historical context. Timeplus supports this through seamless historical backfill integration that eliminates the traditional cold start problem and Lambda architecture complexity.


When deploying a new real-time ML feature platform, teams face the "cold start" dilemma: models need historical context to make accurate predictions, but real-time systems typically only have access to recent streaming data. Traditional solutions force organizations into complex architectures that maintain separate batch and streaming pipelines.


-- Backfill historical data from S3
INSERT INTO player_actions
SELECT * FROM external_s3_historical_data;

-- Lifetime features automatically incorporate historical context
CREATE MATERIALIZED VIEW lifetime_stats AS
SELECT
    user_id,
    count_distinct(match_id) as total_games_ever,
    earliest(timestamp) as first_game_date,
    latest(timestamp) as last_game_date
FROM player_actions
GROUP BY user_id;

This unified approach eliminates the complexity of maintaining separate hot and cold paths common in Lambda architectures.



3. High-Performance Vectorized Stream Processing for Feature Freshness


Timeplus’s core stream processing engine, Proton, is built with modern vectorization using Single Instruction, Multiple Data (SIMD) operations, which allows it to process multiple data points simultaneously rather than one at a time. This vectorized approach, combined with just-in-time (JIT) compilation, enables exceptional performance:

  • 90 million events per second (EPS) throughput capability

  • 4 millisecond end-to-end latency

  • High cardinality aggregation with 1 million unique keys

  • Sub-millisecond query response for billions of rows


With this level of performance, feature freshness is no longer the bottleneck.



A Real Production Case Study: CyberRealms Online


Let me walk you through a production implementation for "CyberRealms Online",  a battle royale game with 10M+ daily active users. Their architecture demonstrates why stream-native feature engineering is transformative:



In this use case, the basic feature pipeline looks like this:

  • The different game applications and servers send data to Kafka in real time

  • Timeplus external streams read the real-time data from Kafka topics; there are multiple topics representing different data streams

  • For each feature, a materialized view is created - a long-running processor that continuously processes incoming data and transforms it into features as soon as the data enters the system

  • A unified view is created leveraging an ASOF join across all the different features, providing point-in-time correctness (see the sketch after this list). The ML models are trained on this view using historical data, and real-time inference runs against the real-time features on the same view, which effectively avoids "train-predict inconsistency" issues

  • To keep a history of the source data, a materialized view is created that writes the data into external storage such as AWS S3; users can backfill this data from S3 into a Timeplus stream later
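As a minimal sketch of that unified view, assuming the feature views built later in this post (the view name is illustrative; an ASOF join picks, for each left-side row, the most recent matching right-side row at or before its window start):

-- Hypothetical unified feature view combining two feature sets
CREATE VIEW game.v_unified_features AS
SELECT
    f5.user_id,
    f5.events_5m,
    eng.total_revenue_1d
FROM game.player_features_5m AS f5
ASOF LEFT JOIN game.mv_engagement_features_1d AS eng
    ON f5.user_id = eng.user_id
    AND f5.window_start >= eng.window_start;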



Get Data In

In this case, there are four different data streams on four different Kafka topics:

  • Player actions

  • User transactions

  • User social interactions

  • Performance metrics


All the data is in JSON format. Here is a sample record from the player actions stream:

{
  "user_id": "usr_123456789",
  "session_id": "sess_abc123",
  "timestamp": "2025-09-05T14:30:15.123Z",
  "event_type": "match_end",
  "game_mode": "battle_royale",
  "match_id": "match_789",
  "event_data": {
    "placement": 3,
    "kills": 7,
    "damage_dealt": 1450,
    "survival_time": 1200,
    "result": "win",
    "items_used": ["med_kit", "shield_potion"],
    "location_final": {"x": 245.7, "y": 389.2}
  },
  "device_info": {
    "platform": "mobile_ios",
    "device_model": "iPhone_14_Pro",
    "os_version": "iOS_16.5",
    "app_version": "2.4.1"
  }
}

To get data into Timeplus, create an external stream. The user can use Timeplus’s SQL to directly query these streams without moving data:

-- Create external streams for Kafka topics
CREATE EXTERNAL STREAM player_actions (
    user_id string,
    session_id string,
    timestamp datetime64(3, 'UTC'),
    event_type string,
    game_mode string,
    match_id string,
    event_data string,
    device_info string
) SETTINGS 
    type='kafka',
    brokers='kafka-cluster:9092',
    topic='player_actions',
    data_format='JSONEachRow';
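Once the external stream exists, it can be queried like any other stream; for example, a simple illustrative filter:

-- Query the Kafka-backed external stream directly, without moving data
SELECT user_id, event_type, timestamp
FROM player_actions
WHERE event_type = 'match_end';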


Feature Processing Pipeline

After getting data into Timeplus using the external stream*, we can build the feature processing pipeline using Timeplus queries and materialized views.


* Technically, the data still lives in the Kafka topic, but logically, since all of it can be queried by Timeplus, we can think of the data as being in Timeplus.


In our case, there are different types of features we are going to build:


Time-Window Features

Time window features represent one of the most popular feature types in real-time ML, capturing aggregated metrics within specific temporal boundaries. These features encode critical behavioral patterns—such as "transaction count in the last hour," "average session duration over 30 minutes," or "click-through rate during business hours."


-- 5-minute player activity features
CREATE MATERIALIZED VIEW game.player_features_5m AS
SELECT 
    user_id,
    window_start,
    window_end,
    count(*) as events_5m,
    count() FILTER(WHERE event_type = 'match_start') as matches_started_5m,
    count() FILTER(WHERE event_type = 'match_end') as matches_completed_5m,
    avg(json_extract_float(event_data, 'kills')) as avg_kills_5m,
    max(json_extract_float(event_data, 'damage_dealt')) as max_damage_5m,
    count_distinct(match_id) as unique_matches_5m
FROM tumble(game.player_actions, 5m)
WHERE _tp_time > earliest_ts()
GROUP BY user_id, window_start, window_end;

With the above materialized view, we get the following features for each user:

  • Total events in the last 5 minutes

  • Matches started in the last 5 minutes

  • Matches completed in the last 5 minutes

  • Average kills in the last 5 minutes

  • Maximum damage dealt in the last 5 minutes

  • Unique matches played in the last 5 minutes


By joining the user transaction stream and the user social interaction stream, we can build even more complex features, like this one.


-- 1-day monetization and engagement features
CREATE MATERIALIZED VIEW game.mv_engagement_features_1d AS
SELECT 
    t.user_id,
    t.window_start,
    t.window_end,
    -- Transaction features
    coalesce(sum(t.amount_usd), 0) as total_revenue_1d,
    coalesce(count(t.transaction_id), 0) as transaction_count_1d,
    -- Social features from join
    coalesce(count(s.interaction_type), 0) as social_interactions_1d,
    coalesce(count_distinct(s.target_user_id), 0) as unique_friends_1d
FROM hop(game.transactions, 1h, 1d) t
LEFT JOIN hop(game.social_events, 1h, 1d) s 
    ON t.user_id = s.user_id 
    AND t.window_start = s.window_start
    AND date_diff_within(2m, t.window_start, s.window_start)
GROUP BY t.user_id, t.window_start, t.window_end
settings seek_to = 'earliest'

This feature provides information about what happened over the past day (24h) for each user, sliding every hour:

  • Total spending

  • Total transaction count

  • Total social interactions the user has made

  • Number of distinct users they interacted with


Timeplus is designed to process hot data with high performance. One challenge in this query is that as the time window grows, the state that must be maintained grows as well, consuming significant resources such as memory. One technique to mitigate this is to limit the time range that must be maintained for the stream-to-stream join - here, the function date_diff_within(2m, t.window_start, s.window_start). Since we only want to join rows from the same time window, date_diff_within lets Timeplus discard past windows that will never be joined, saving a great deal of memory in state management.


These real-time or near-real-time features are processed as soon as new data arrives and can be used to build real-time recommendations, since they are good behavioral features that capture what the user has done recently.



Cumulative Features

Cumulative features aggregate data from the very first occurrence of an entity up to the current time. Unlike windowed features that operate on recent time periods, lifetime features maintain running totals across an entity's entire history - such as "total customer purchases since account creation" or "cumulative user login count since registration."


Computing cumulative features in streaming systems typically requires expensive full-table scans or complex state management to maintain running aggregates as new events arrive. This becomes computationally prohibitive as data volumes and entity lifespans grow.


Timeplus addresses this through global aggregation capabilities that continuously process and maintain lifetime metrics from a designated starting point. The system automatically handles:

  • Incremental state updates: New events trigger efficient updates to existing lifetime aggregates without full recomputation

  • Historical continuity: Aggregations begin from specified timestamps and maintain accuracy across system restarts

  • Scalable state management: Efficiently stores and retrieves lifetime metrics even for high-cardinality entities


The following feature is an example of a global aggregation that calculates:

  • Total games a user has played

  • The first game a user played

  • The last game a user played


-- total first, and last game played 
CREATE MATERIALIZED VIEW game.mv_user_game_stats_feature AS
SELECT
  user_id, 
  count_distinct(match_id) AS total_game_played,
  earliest(match_id) AS first_game_played,
  latest(match_id) AS last_game_played
FROM
  game.player_actions
WHERE
  event_type = 'match_start' and _tp_time > earliest_ts()
GROUP BY
  user_id
settings seek_to = 'earliest';




A more powerful capability lies in Timeplus’s streaming SQL joins, which enable sophisticated feature enrichment by correlating data across multiple real-time streams. This approach creates contextually rich features that provide ML models with deeper behavioral insights than single-stream aggregations.


CREATE MATERIALIZED VIEW game.mv_user_latency_by_event_feature AS
SELECT
  pa.user_id, 
  avg_if(pm.device_stats:network_latency_ms::float, pa.event_type = 'player_elimination') AS avg_latency_during_elim, 
  avg_if(pm.device_stats:network_latency_ms::float, pa.event_type = 'item_pickup') AS avg_latency_during_pickup,
  avg(pm.device_stats:network_latency_ms::float) AS avg_all
FROM
  game.player_actions AS pa
INNER JOIN game.performance_metrics AS pm ON (pa.user_id = pm.user_id) AND (pa.session_id = pm.session_id)
AND date_diff_within(2m)
GROUP BY
  pa.user_id
SETTINGS
  seek_to = 'earliest';


In this feature, the INNER JOIN connects the player_actions and performance_metrics streams, enabling analysis of how network performance correlates with specific game events.


Conditional aggregations with avg_if() create event-specific features:

  • avg_latency_during_elim: Network latency specifically during player elimination events

  • avg_latency_during_pickup: Network latency during item pickup actions

  • avg_all: Baseline average latency across all events


Similarly, we leverage the temporal join constraint date_diff_within(2m) to improve join performance.



Event-Based Features

Event-based features capture patterns across sequences of individual events or records, enabling ML models to understand temporal behaviors and trends that simple aggregations can't. Unlike time-window features that aggregate within fixed periods, record-based features focus on the most recent N occurrences of specific events, preserving order and enabling sequence analysis.


The following feature materialized view calculates a user's total survival time over their last 7 games. The query leverages global aggregation, but uses the group_array_last function to collect the last N events into an array and then run the calculation over it.


-- total survival time in last 7 games
CREATE MATERIALIZED VIEW game.mv_total_survival_last_7_game AS
SELECT 
    user_id, 
    array_sum(x->x, group_array_last(event_data:survival_time::int, 7)) as total_survival_time_in_last_7_games
FROM game.player_actions
WHERE event_type = 'match_end'
GROUP BY user_id;

The main computation logic includes:

  • Sequential Aggregation: group_array_last(event_data:survival_time::int, 7) maintains a sliding array of the most recent 7 survival times per user, automatically discarding older records as new games complete.

  • Array Processing: array_sum(x->x, ...) applies a lambda function to sum all values in the survival time array, creating a rolling total across recent game performance.


This metric helps ML models assess player skill progression and engagement—players with consistently high survival times may be improving or highly engaged.


Similarly, the following feature materialized view creates a binary streak-detection flag - whether a player lost all of their last 5 games - which is ideal for ML models predicting churn risk or triggering interventions.


-- 5 lost in a row last 5 games
CREATE MATERIALIZED VIEW game.mv_5_lost_in_a_row AS
SELECT 
    user_id, 
    group_array_last(event_data:result, 5) as last_five_game_result,
    case 
        when length(last_five_game_result) = 5 
             and array_count(x -> x = 'lost', last_five_game_result) = 5 
        then true 
        else false 
    end as all_five_lost
FROM game.player_actions
WHERE event_type = 'match_end'
GROUP BY user_id;


Timeplus supports another approach to record-based feature calculation. The lag/lags functions provide access to previous events relative to the current record, offering a different computational model for sequence analysis. The following query leverages lags with PARTITION BY to return the last 5 game results.


SELECT
  user_id, event_data:result, lags(event_data:result, 1, 5) OVER (PARTITION BY user_id)
FROM
  game.player_actions
WHERE
  event_type = 'match_end'

Window Partitioning: OVER (PARTITION BY user_id) ensures that the lags function operates independently for each user, preventing cross-user data contamination in the sequence analysis.


Multi-Record Lag: lags(event_data:result, 1, 5) retrieves the previous 5 game results relative to the current event.
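As a hedged variant, the same "lost 5 in a row" flag from the earlier materialized view could also be derived from the lags() output; the aliases here are illustrative:

-- Derive the streak flag from the lags() output via a subquery
SELECT
    user_id,
    last_5_results,
    array_count(x -> x = 'lost', last_5_results) = 5 AS all_five_lost
FROM (
    SELECT
        user_id,
        lags(event_data:result, 1, 5) OVER (PARTITION BY user_id) AS last_5_results
    FROM game.player_actions
    WHERE event_type = 'match_end'
);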



Historical Data Backfill


Historical data backfill during cold startup is an important bootstrapping operation for real-time ML systems. When launching a new streaming infrastructure or restarting after maintenance, the system needs immediate access to historical context to avoid the "cold start" problem where features and models lack sufficient data for accurate predictions.  


In our case, player action stream data is synchronized into an AWS S3 bucket using a materialized view that continuously moves stream data into S3. After a system restart, users can easily backfill all historical data from the S3 bucket.


-- external S3 table
CREATE EXTERNAL TABLE game.ex_s3_player_actions
 (
  user_id string,
  session_id string,
  timestamp string,
  event_type enum8('match_start' = 1, 'item_pickup' = 2, 'player_elimination' = 3, 'match_end' = 4),
  game_mode enum8('battle_royale' = 1, 'team_deathmatch' = 2, 'capture_the_flag' = 3),
  match_id string,
  event_data string,
  device_info string,
  _tp_time datetime64(3, 'UTC')
) SETTINGS 
    type = 's3',
    endpoint = 'https://storage.googleapis.com/timeplus-demo',
    data_format = 'JSONEachRow', write_to = 'game/actions.json',
    s3_min_upload_file_size = 1024,
    s3_max_upload_idle_seconds = 60;

-- back up existing data by continuously copying the stream into S3
CREATE MATERIALIZED VIEW game.mv_backup_player_actions 
INTO game.ex_s3_player_actions
AS
SELECT * FROM game.player_actions;

-- backfill historical data into the stream
INSERT INTO game.player_actions
SELECT * FROM game.ex_s3_player_actions;


Key Wins


By migrating to Timeplus's unified streaming platform, the customer has achieved transformational results across their ML operations:

  • Unified Architecture: Replaced fragmented batch ETL systems with a single streaming platform, eliminating the complexity of maintaining separate training and serving pipelines that constantly diverged and caused model performance issues.

  • Real-Time Decision Making: Transformed from reactive daily analytics to proactive real-time ML, enabling fraud detection, churn prevention, and personalization that responds to player behavior as it happens rather than hours later.

  • Developer Velocity: Empowered data scientists to build sophisticated streaming features using SQL instead of complex distributed systems engineering, dramatically accelerating feature development from months to days.


CyberRealms has now moved from managing infrastructure complexity to focusing on what matters—building ML capabilities that directly impact player experience and business outcomes.



Looking Forward: The Real-Time Future of Machine Learning


The CyberRealms case study I shared today represents a broader shift toward real-time-first ML architectures. As real-time decision making becomes critical across industries, the limitations of traditional feature engineering become increasingly apparent.


“Real-time machine learning is largely an infrastructure problem.” - Chip Huyen


I couldn't agree more with Chip’s statement that real-time ML is really an infrastructure problem. Timeplus's innovative design simplifies data infrastructure usage and operation, enabling organizations to build ML systems that match the speed of their business needs. Whether you're detecting fraud in milliseconds, preventing churn in real time, or personalizing experiences dynamically, real-time feature engineering powered by stream processing is becoming your most powerful tool.


See it in action: check out a video demo of this customer showcase.


Ready to build your own real-time ML feature platform? Try Timeplus and see how streaming SQL can transform your machine learning workflows.


