Real-Time Hacker News Monitoring with Timeplus Scheduled Tasks
- Gang Tao

Hacker News sees over 10 million monthly visitors and serves as a leading indicator for technology trends. When a new framework goes viral, a startup gets acquired, or a security vulnerability surfaces, it often hits HN first. For developer-focused companies, monitoring HN in real time provides valuable signals:
- Developer Relations & Marketing: Track mentions of your product, spot emerging discussions, and engage with your community when it matters most
- Competitive Intelligence: Monitor competitor launches, feature announcements, and community sentiment as they happen
- Trend Detection: Identify rising technologies, tools, and topics before they go mainstream
- Recruiting: Find active contributors discussing relevant technologies and engage them early
But here's the challenge: HN's API is pull-based, meaning you need to poll it regularly to catch new content. Building a reliable ingestion pipeline typically requires setting up cron jobs, handling retries, and managing state—all before you can even start analyzing the data.
What if you could do all of this with just SQL?
In this blog, I'll show you how to use Timeplus Scheduled Tasks to build a complete HN monitoring pipeline, from data ingestion to real-time analysis, without leaving your SQL console.
What is a Timeplus Task?
A Timeplus Scheduled Task runs a SQL query periodically and persists the results to a target stream. Think of it as a cron job, but native to your streaming database.
Tasks complement Materialized Views nicely: while MVs run continuously on streaming data, Tasks run on a schedule for batch-style operations.
Common Use Cases
Some common use cases where you can leverage tasks:
- Pull data from external APIs — Fetch data from REST APIs using Python UDFs and load it into streams
- Periodic aggregations — Compute hourly/daily rollups and store them for fast querying
- System monitoring — Collect cluster health metrics at regular intervals
- Data synchronization — Keep external systems in sync with your Timeplus data
Demo: Real-Time Hacker News Ingestion
Let's build a data pipeline that pulls new Hacker News posts every 10 seconds and enables real-time analysis.
Step 1: Create a Python UDF to Fetch HN Posts
First, we need a function that calls the Hacker News API (https://hacker-news.firebaseio.com). This Python UDF fetches posts after a given ID with retry logic for reliability:
Step 2: Create the Target Stream
We create a stream to store the HN posts with a 7-day retention policy:
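A sketch of the stream DDL. The column set (including renaming HN's `by` field to `author`) and the retention clause are assumptions; check the Timeplus documentation for the exact retention syntax.

```sql
-- Illustrative DDL; columns and retention clause are assumptions.
CREATE STREAM IF NOT EXISTS hn.hn_post
(
  id     int64,
  type   string,  -- 'story', 'comment', 'job', ...
  author string,  -- the HN 'by' field
  title  string,
  url    string,
  time   int64    -- Unix timestamp from the API
)
TTL to_datetime(_tp_time) + INTERVAL 7 DAY
```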
Step 3: Create the Scheduled Task
Now the magic: a task that runs every 10 seconds, checks the latest ingested post ID, and fetches new posts:
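A sketch of such a task, assuming a `fetch_hn_posts(after_id, limit)` UDF that returns an array of JSON strings; the `SCHEDULE`/`INTO` clauses and the JSON helper functions shown are illustrative, so verify the exact syntax against the Timeplus Task documentation.

```sql
-- Illustrative; verify syntax against the Timeplus Task docs.
CREATE TASK hn.pull_hn_posts
SCHEDULE 10s
INTO hn.hn_post
AS
WITH (
  SELECT coalesce(max(id), 0) FROM table(hn.hn_post)
) AS last_id
SELECT
  json_extract_int(raw, 'id')       AS id,
  json_extract_string(raw, 'type')  AS type,
  json_extract_string(raw, 'by')    AS author,
  json_extract_string(raw, 'title') AS title,
  json_extract_string(raw, 'url')   AS url,
  json_extract_int(raw, 'time')     AS time
FROM (SELECT array_join(fetch_hn_posts(last_id, 20)) AS raw)
```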
How it works:
1. Query the existing stream to find the highest post ID we've already ingested
2. Call the Python UDF to fetch up to 20 new posts after that ID
3. Flatten the returned array and insert each post as a row into hn.hn_post
Step 4: Run Real-Time Analysis
With data flowing in, you can now run analytics. Here are two examples:
Most active users in the last 24 hours:
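A bounded-query sketch, assuming the stream stores the HN `by` field in a column named `author` and using the `_tp_time` event-time column:

```sql
SELECT author, count() AS posts
FROM table(hn.hn_post)
WHERE _tp_time > now() - INTERVAL 24 HOUR
GROUP BY author
ORDER BY posts DESC
LIMIT 10
```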
Post type distribution in the last hour:
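Similarly, a sketch grouping by the assumed `type` column over the last hour:

```sql
SELECT type, count() AS cnt
FROM table(hn.hn_post)
WHERE _tp_time > now() - INTERVAL 1 HOUR
GROUP BY type
ORDER BY cnt DESC
```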
You can also create streaming queries (without table()) to get continuous, real-time updates as new posts arrive.
Summary
Timeplus Scheduled Tasks provide a simple yet powerful way to build data pipelines that pull from external sources on a schedule. Combined with Python UDFs, you can integrate virtually any API into your streaming analytics workflow.
In this demo, we built a complete Hacker News monitoring pipeline with just a few SQL statements:
- A Python UDF to fetch data from the HN API
- A stream to store the posts
- A scheduled task to pull new data every 10 seconds
- Analytical queries to extract insights
To learn more, check out the Timeplus Task documentation and explore other use cases like alerting, data synchronization, and system monitoring.
Try it yourself: Download Timeplus and start building real-time data pipelines today.