Analyze Diverse Data Inputs in Real-time with Crul
Updated: Feb 24
In my last blog, I showed how you can use Timeplus to easily build a real-time social listening app via Twitter’s real-time API. However, there are some scenarios where critical data isn’t immediately available via an API or a message bus. Our friends at Crul see this as an opportunity to extract information from other sources, like RSS feeds and web pages, and join it with existing data to create a complete data set. In today’s blog, we’ll show how you can leverage Timeplus and Crul to easily integrate disparate data sources into your streaming platform.
We are excited to announce the integration of Timeplus in Crul to solve these messy data problems for you.
Crul enables you to transform the open, SaaS, dark and API web into a dataset. Crul provides a query language (70+ commands) specifically designed for crawling, shaping and enriching web data. With the AWS AMI, built-in vault, scheduler and 30+ sinks, Crul can forward cleansed web data into enterprise data pipelines, such as Timeplus. This gives you access to a wider range of real-time data and powers more sophisticated use cases for your streaming data analytics:
Fintech: Derive trading insights on market movements by leveraging diverse news data streams;
Security: Receive the latest security alerts (such as CVEs) and inform engineering teams if related components have potential vulnerabilities. Or check TLS certificate expiration times for various internal and external services and remind the administrators to renew the certificates;
Online shopping: Monitor price changes on competitor websites to dynamically adjust your pricing strategy accordingly.
Hot Topic: A Sample Use Case
Summertime on the West Coast often means fire season, which can be scary for any families or businesses impacted by forest fires. We thought this non-confidential use case could help illustrate the usefulness of the integration between Timeplus and Crul. Let’s imagine you are running a community for small or medium-sized businesses in California. The temperatures are rising, and wildfires could break out at any moment. https://www.fire.ca.gov/ provides reliable information about active fire incidents in California.
They provide a JSON API to list such incidents: https://www.fire.ca.gov/umbraco/api/IncidentApi/List?inactive=true
However, this API doesn’t provide all the information we need, especially the important “Evacuation Orders”.
Looking at the public-facing CAL FIRE website, we can see that evacuation orders are published on each incident’s page.
In this scenario, Crul can help you complete your data set with minimal effort.
For this particular case, we can start from the JSON API of www.fire.ca.gov:
Using Crul, we can retrieve the data for each fire incident, extract the `Url` field, then use this value to open a headless browser and load the incident details page. We can then join the API and web content to create an enriched dataset that includes fire evacuation orders. Finally, we can send it to Timeplus in real-time.
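To make the join step concrete, here is a minimal sketch in plain Python (not Crul's query language) of the same idea: take each record from the incident list, follow its `Url` field, and merge whatever is found on the details page back into the record. The `enrich_incidents` and `fake_fetch_details` names, the `EvacuationOrders` key, the `Name` field and the sample URL are all hypothetical and chosen for illustration. Only the `Url` field comes from the API described above. The page fetcher is passed in as a function, so the example runs without any network access.

```python
from typing import Callable, Dict, Iterable, List


def enrich_incidents(
    incidents: Iterable[Dict],
    fetch_details: Callable[[str], Dict],
) -> List[Dict]:
    """Join each API record with the fields scraped from its details page."""
    enriched = []
    for incident in incidents:
        url = incident.get("Url")
        if not url:
            # No details page to follow: keep the API record unchanged.
            enriched.append(dict(incident))
            continue
        details = fetch_details(url)
        # On a key clash, the scraped value overrides the API value.
        enriched.append({**incident, **details})
    return enriched


def fake_fetch_details(url: str) -> Dict:
    """Hypothetical stand-in for a headless-browser scrape of one page."""
    return {"EvacuationOrders": f"(text scraped from {url})"}


# Made-up sample records; a real run would use the JSON API response.
sample = [
    {"Name": "Example Fire", "Url": "https://example.invalid/incidents/example-fire"},
    {"Name": "No Link Fire"},
]
rows = enrich_incidents(sample, fake_fetch_details)
for row in rows:
    print(row)
```

In a real pipeline the stub would be replaced by an actual page fetch and parse, which is the part Crul handles with its headless browser.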
Here is a Crul query that crawls, shapes and transforms the web data to enrich the original API response:
Out of the box, Crul includes a Timeplus store that sends data to Timeplus via REST API:
Once the data arrives in Timeplus, we can easily use SQL to filter/transform/analyze data:
There is also a beta version of a Grafana plugin for Timeplus. With it, you can run streaming queries in the Grafana UI and visualize the data with many built-in charts:
At Timeplus, we believe in working with technology leaders to help developers build real-time applications much faster and easier than ever before. We are excited to work with Crul to deliver on this vision. Join the Timeplus community or sign up for our private beta to learn more. We look forward to hearing your feedback!