The Day We Broke the Memory Bank
- Gang Tao

When High-Cardinality Stream Joins Consumed All My Memory
When Fast Isn't Fast Enough
The story begins like this: It's a typical trading day at a securities firm. Thousands of trades are flying through the system every second. Buy orders, sell orders, position updates, stock prices ticking up and down like a caffeinated heartbeat. Somewhere in the depths of the data center, your real-time profit calculation system is sweating bullets, trying to keep up.
You're joining three real-time data streams and one massive historical table, all to answer one simple question: "How much money are we making right now?"
Simple question. But it's not so simple to answer.

The Cast of Characters
Let's meet the data sources we'll use to answer that simple question:
The Execution Stream (12,000 events per second)
Think of this as the play-by-play announcer of your trading floor. Every time a trade executes, it shouts: "Account A123 just bought 500 shares of Security XYZ at $105.50!"
The Position Stream (12,000 events per second)
This is your inventory tracker. It knows exactly how many shares of which security each account is holding at any given moment.
The Stock Stream (3,000 events per second)
The pulse of the market itself: real-time price ticks for every security you're trading, last price, opening price and all.
The Pre-Value Table (5 million rows, sitting there like a patient librarian)
This is yesterday's homework. It remembers what every security position was worth when the market closed yesterday. You need it as the baseline for figuring out today's profit.

The Million-Dollar Math Problem
To answer the question about how much money we are making right now, Timeplus uses a Materialized View — a continuously running process that automatically updates results whenever new data arrives. It works much like an Apache Flink job, constantly processing incoming events to provide up-to-date answers in real time.
Here's the SQL query that's supposed to make all this magic happen:
create materialized view profit_mv as
select
    sum_if(px * qty, side = '1') as buy_amount,
    sum_if(px * qty, side = '2') as sell_amount,
    sum(fee) as deal_fee,
    latest(HoldingQty * f.LastPx) as cur_value,
    latest(pre_value) as pre_value,
    latest(event_ts) as event_ts,
    (cur_value + sell_amount - buy_amount - deal_fee - pre_value) as profit,
    now64(3) as ts,
    SecurityId, SecurityAccount
from
    (... three streams joined together ...)
group by SecurityId, SecurityAccount

Looks innocent enough, right? Just joining some streams and tables, doing some math, grouping by security and account. What could go wrong?
Hey, We Have a Memory Problem
Here's where things get interesting (and by interesting, I mean "oh no, we're going to need a bigger server").
Let's do some back-of-the-napkin math:
3,000 securities
5 million accounts in the system
Maximum possible unique groups: 15 billion combinations
In a real scenario, not all accounts are active every day. In our case, only about 1 million accounts actually trade. But even with just 1 million active accounts, the hash tables used for these joins can still grow very large and consume significant memory.
Here's what gets cached in memory: the join state for every (SecurityId, SecurityAccount) combination the query has seen, including the latest position, the latest price, and the pre-day value.
The cruel irony? 99.93% of this cached data is never actually used during a trading day.
It’s like loading an entire library into memory when you only need to read a couple of pages.
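To put that claim in code, here's a tiny Python sketch of the sizing. It uses only the numbers above plus two explicitly assumed figures (average positions per active account and bytes of join state per cached entry); neither is a measured Timeplus value.

# Back-of-the-napkin sizing for the join state. The positions-per-account
# and bytes-per-entry figures are assumptions for illustration only,
# not measured Timeplus numbers.

securities = 3_000
accounts = 5_000_000           # all accounts in the system
active_accounts = 1_000_000    # accounts that actually trade today

max_groups = securities * accounts
print(f"Maximum possible (security, account) groups: {max_groups:,}")   # 15,000,000,000

# Assume each active account touches ~10 securities, and each cached join
# entry (key, latest position, latest price, pre-day value, plus hash-table
# overhead) costs ~300 bytes.
positions_per_account = 10     # assumption
bytes_per_entry = 300          # assumption

cached_entries = active_accounts * positions_per_account
print(f"Cached entries for active accounts alone: {cached_entries:,}")   # 10,000,000
print(f"Approximate join-state size: {cached_entries * bytes_per_entry / 1e9:.1f} GB")
# Already around 3 GB, before counting anything for the 4 million inactive accounts.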
The Memory Monster Awakens
We ran this setup for 10 minutes. Just 10 minutes. Here's what happened:
Case 1: Pure Memory Join
Started at: 439 MB
Ended at: 6.17 GB
Memory usage (internal metric): 3.24 GB
Imagine your computer starting a short run with a light backpack — and by the end, it’s carrying a pile of bricks. That’s not a run anymore; it’s a slow crawl.

Enter the Hero: Hybrid Hash Join
We had a brilliant idea.
"What if we don't keep everything in memory? What if we only keep the hot data in memory and kick the cold data to disk?"
The concept is beautifully simple: A hash table with a dual personality.

The Memory Part (The VIP Lounge)
Hot data lives here. Frequently accessed account-security combinations get the fast treatment. When a trade comes in and needs to join with position data, if it's in memory, bam – instant join. No waiting.
The Disk Part (The Storage Room)
Cold data stays out here. That account that traded once three months ago and might trade again someday? It lives on disk. When it's needed (rarely), we fetch it from disk. Sure, it's slower than memory, but who cares when it happens once in a blue moon?
The Bouncer: LRU Algorithm
When memory fills up, an LRU (Least Recently Used) algorithm decides who gets kicked from the VIP lounge to the storage room. It's like musical chairs, but with data.

Refer to this video for how LRU works: Least Recently Used (LRU) Explanation
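To make the idea concrete, here's a minimal Python sketch of the concept, not Timeplus's actual implementation: hot entries live in an in-memory LRU map, and evicted entries spill to an on-disk store (a plain shelve database here; the class name, file path, and memory budget are all made up for illustration).

import shelve
from collections import OrderedDict

class HybridHashMap:
    """Toy illustration of a hybrid hash join table:
    hot keys stay in an in-memory LRU map, cold keys spill to disk."""

    def __init__(self, max_memory_entries, disk_path="cold_state.db"):
        self.max_memory_entries = max_memory_entries
        self.hot = OrderedDict()              # the "VIP lounge"
        self.cold = shelve.open(disk_path)    # the "storage room" (keys must be strings)

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)             # mark as most recently used
        self._evict_if_needed()

    def get(self, key):
        if key in self.hot:                   # fast path: memory hit
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:                  # slow path: fetch from disk
            value = self.cold[key]
            del self.cold[key]
            self.put(key, value)              # promote back into memory
            return value
        return None

    def _evict_if_needed(self):
        # The "bouncer": LRU eviction from memory to disk.
        while len(self.hot) > self.max_memory_entries:
            cold_key, cold_value = self.hot.popitem(last=False)  # least recently used
            self.cold[cold_key] = cold_value

    def close(self):
        self.cold.close()

# Usage: key is an "account:security" string, value is the cached join state.
state = HybridHashMap(max_memory_entries=100_000)
state.put("A123:XYZ", {"HoldingQty": 500, "LastPx": 105.50, "pre_value": 51_000.0})
print(state.get("A123:XYZ"))
state.close()

In the real engine the memory budget, on-disk format, and promotion policy are of course more involved; the sketch only shows the shape of the idea: memory hits are instant, cold keys cost a disk read, and the LRU bouncer decides who moves.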
The Showdown: Memory vs. Hybrid
We ran the same 10-minute test with Hybrid Hash Join enabled.
Case 2: Hybrid Hash Join
Started at: 423 MB
Ended at: 1.63 GB
Memory usage (internal metric): 1.10 GB
Let's compare: 6.17 GB versus 1.63 GB after the same 10-minute run.
The hybrid approach used about 3.8 times LESS memory.
Same data. Same workload. Same 10 minutes. But one approach treated memory like it was trying to store the entire Library of Congress, while the other was more like, "Let's keep a bestseller shelf and put the rest in the archives."
Why This Works
The beautiful thing about financial trading is that it follows the Pareto Principle. A tiny fraction of accounts generate the vast majority of trades. Those are your day traders, your high-frequency trading accounts, your active portfolios.
The other accounts? They're buy-and-hold investors, dormant accounts, or positions that haven't moved in weeks. They need to be available for joining, but they don't need to be in memory consuming precious RAM.
Hybrid Hash Join recognizes this reality. It says:
"If you're active, you stay in memory for quick access"
"If you're not active, I'll keep you on disk and fetch you if needed"
The LRU eviction policy ensures that hot data stays hot and cold data stays on ice.
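You can convince yourself of this with a toy simulation: generate lookups whose keys follow a skewed, 1/rank distribution and count how often a small LRU cache answers from memory. The key count, cache size, and weighting below are arbitrary assumptions, not numbers from the trading workload.

import random
from collections import OrderedDict

random.seed(42)

NUM_KEYS = 1_000_000     # distinct (account, security) keys
CACHE_SIZE = 50_000      # memory budget: only 5% of the keys fit
NUM_LOOKUPS = 500_000

# Skewed access: key i is looked up with weight proportional to 1 / (i + 1),
# so a small set of "day trader" keys gets most of the traffic.
weights = [1.0 / (i + 1) for i in range(NUM_KEYS)]
accesses = random.choices(range(NUM_KEYS), weights=weights, k=NUM_LOOKUPS)

cache, hits = OrderedDict(), 0
for key in accesses:
    if key in cache:
        hits += 1
        cache.move_to_end(key)          # keep hot keys hot
    else:
        cache[key] = True
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)   # evict the least recently used key

print(f"LRU hit rate with only 5% of keys in memory: {hits / NUM_LOOKUPS:.1%}")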

What We Learned
Not all data is created equal. Hot data deserves VIP treatment. Cold data deserves efficient storage.
The 80/20 rule is real. In our case, it's more like the 0.07/99.93 rule. A tiny fraction of data gets all the action.
Memory is a finite resource. Treat it like prime real estate, not a landfill.
Hybrid approaches beat pure approaches. It's not about being all-memory or all-disk. It's about being smart about which is which.
Try It Yourself
Want to experiment with Hybrid Hash Join in your own streaming joins? Try running the SQL here: https://gist.github.com/gangtao/5bc9d26481268f7eb1376e05b1fd3a2e
The pattern is simple:
Identify joins with high cardinality reference tables
Determine whether your access pattern is skewed (hot keys vs. cold keys); see the sketch after this list for a quick way to check
Configure a reasonable memory limit for the join buffer
Enable hybrid hash and let the LRU algorithm do its thing
Monitor memory usage and performance
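For step 2, one low-tech check is to sample the join keys from your stream for a few minutes and measure how concentrated the traffic is. Here's a Python sketch with made-up example keys; the 1% cutoff is just a convention, not a Timeplus setting.

from collections import Counter

def access_skew(keys, top_fraction=0.01):
    """Return the share of total accesses covered by the top `top_fraction`
    of distinct keys. Values close to 1.0 mean a hybrid (hot/cold) join
    with a small memory budget should work well."""
    counts = Counter(keys)
    top_n = max(1, int(len(counts) * top_fraction))
    top_total = sum(c for _, c in counts.most_common(top_n))
    return top_total / len(keys)

# Example with made-up keys: a few very active accounts and a long tail.
sample = ["A123:XYZ"] * 900 + ["B456:XYZ"] * 600 + ["C789:ABC"] * 300 \
         + [f"ACCT{i}:SEC{i % 50}" for i in range(200)]
print(f"Top 1% of keys cover {access_skew(sample):.0%} of accesses")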

Summary
All streaming processors face the same fundamental problem:
Streaming joins require maintaining state for both sides of the join
High-cardinality data (millions of unique keys) means huge state sizes
The traditional approach of keeping everything in memory eventually exhausts it
The high-cardinality join memory problem isn't unique to Timeplus. Apache Flink also uses hybrid hash joins that spill to disk (RocksDB) when memory fills, Materialize shares indexed state across multiple queries (but still requires keeping full datasets in memory), and RisingWave stores state in cloud object storage (S3/GCS) with LRU caching for hot data. What makes Timeplus different is its purpose-built optimization for the Pareto Principle, where a tiny fraction of data generates the vast majority of activity: keep the hot data in memory and the cold data on disk for dramatic memory savings.
You might be surprised how much memory you can save while maintaining performance. And your infrastructure team will definitely appreciate the lower AWS bills.
Remember: The best optimization is the one that's invisible to users but visible to your memory monitor.
Happy joining!


