Migration in Minutes: From Splunk Add-Ons to Timeplus SQL
- Gang Tao

Modernizing 1,000+ Pipelines with Timeplus AI Migration Agent
A Splunk Technology Add-on (TA) is a type of Splunk app designed specifically for data input and normalization: essentially, it teaches Splunk how to understand and parse data from a specific technology or vendor.

A TA typically contains four key configuration files:
- props.conf — Defines how raw data is broken into events, how timestamps are handled, and which field extractions apply to each sourcetype.
- transforms.conf — Contains the actual regex patterns (in PCRE syntax) that extract fields from raw log data, using named capture groups like (?P<fieldname>pattern).
- eventtypes.conf — Classifies events into categories based on search strings, enabling grouping of related events (e.g., "authentication failure", "network connection").
- tags.conf — Applies CIM (Common Information Model) tags to events, enabling cross-vendor correlation in apps like Enterprise Security.
Splunk Technology Add-ons represent domain knowledge encoded in regex patterns and configuration files, knowledge that becomes trapped when organizations need to modernize. That is why Splunk calls these configurations knowledge objects.

Splunkbase is Splunk’s official app marketplace where users and vendors publish and download apps and Technology Add-ons that extend Splunk’s capabilities. By providing trusted, supported, and widely adopted integrations, Splunkbase forms the foundation of the Splunk ecosystem and enables faster, more scalable, and more maintainable Splunk deployments.
- Splunkbase hosts roughly 2,400 apps and add-ons.
- Splunkbase items have been downloaded around 8 million times in total.
- Roughly 40,000 daily active users interact with Splunkbase apps and add-ons.
This is a very active community built around enterprise data knowledge. But how can we share this knowledge with users of other data platforms? In today's blog, I am going to share how we can leverage AI agents to translate this complex knowledge into streaming SQL, preserving the parsing logic while enabling real-time analytics at a fraction of the cost.
The challenge is substantial: a typical enterprise runs dozens of TAs containing hundreds of field extractions across props.conf, transforms.conf, eventtypes.conf, and tags.conf files. Manual migration means rewriting regex patterns, recreating CIM normalization logic, and validating against production data—a process that can consume months of engineering time. This is precisely the kind of tedious, pattern-heavy translation work where AI agents excel.
How AI agents tackle configuration translation

Large language models excel at exactly this kind of semantic translation. Unlike rule-based transpilers that can only handle syntactic transformations, LLMs understand the intent behind a regex pattern and can restructure it for a different engine while preserving behavior.
Research from multiple code migration projects reveals that multi-agent architectures significantly outperform single-agent approaches for complex translation tasks. A single agent attempting to parse configuration files, translate regex patterns, and validate output simultaneously tends to lose context and produce inconsistent results. Distributing these responsibilities across specialized agents—each focused on one aspect of the translation—produces more reliable output.
An effective architecture for TA migration uses four coordinated agents:
The Parser Agent reads Splunk configuration files and builds a structured representation of the extraction pipeline. It identifies dependencies between stanzas, resolves references between props.conf and transforms.conf, and creates a directed graph showing how fields flow from raw data through intermediate extractions to final CIM-normalized output. This agent stores its analysis in a format that other agents can query—essentially creating a searchable index of the TA's logic.
The Translator Agent handles the actual regex conversion. It uses chain-of-thought prompting to decompose complex PCRE patterns into components, translates each component to the target syntax, and reconstructs them appropriately. For patterns using unsupported features like lookahead, it proposes alternative implementations—perhaps using multiple extraction passes or SQL CASE expressions. Critically, this agent generates multiple translation candidates for complex patterns, allowing downstream validation to select the best option.
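As an illustration of the kind of rewrite this agent proposes, consider a lookahead pattern. The example below is a hypothetical sketch (the pattern, stream, and field names are invented), assuming extract() returns the first capture group, as in ClickHouse-style SQL dialects: the lookahead context moves into the pattern but stays outside the capture group.

```sql
-- Hypothetical Splunk transforms.conf pattern using lookahead, which
-- RE2-style regex engines do not support:
--   REGEX = (?P<user>\w+)(?=@corp\.example\.com)
-- Translation: keep the context in the pattern, capture only the group.
SELECT
  raw,
  extract(raw, '(\\w+)@corp\\.example\\.com') AS username
FROM auth_logs;
```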
The Validator Agent tests translated patterns against sample data. It runs both the original Splunk extraction and the proposed SQL extraction against representative log samples, comparing extracted field values. Any discrepancies trigger a feedback loop to the Translator Agent with specific failing examples. This data-driven validation catches semantic errors that syntactic checks would miss.
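A sketch of what such a check can look like in SQL, under assumptions: a samples stream holding raw events alongside the field values the original Splunk extraction produced (the stream, its expected_src_ip column, and the pattern are all hypothetical).

```sql
-- Run the translated extraction over labeled samples and surface every
-- row where it disagrees with the original Splunk output. table()
-- queries the stream as bounded historical data.
SELECT
  raw,
  expected_src_ip,
  extract(raw, 'src=(\\d+\\.\\d+\\.\\d+\\.\\d+)') AS translated_src_ip
FROM table(samples)
WHERE translated_src_ip != expected_src_ip;
```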
The Optimizer Agent refines validated translations for performance. It combines related extractions that can share parsing work, suggests appropriate view versus materialized view placement, and generates indexes for fields used in common query patterns.
Mapping Splunk concepts to streaming SQL
The conceptual mapping between Splunk's configuration-driven approach and SQL-based streaming platforms like Timeplus is more direct than it first appears.
Search-time field extraction (EXTRACT- and REPORT- in props.conf) maps to SQL views. A view is a logical definition that executes at query time—exactly like Splunk's search-time extraction. Creating a view with regex functions produces the same effect as a REPORT- stanza:
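Here is a minimal sketch (the web_logs stream and the regex patterns are illustrative, not taken from a specific TA):

```sql
-- Search-time extraction: the regexes run each time the view is
-- queried, mirroring a REPORT- stanza in props.conf.
CREATE VIEW web_access AS
SELECT
  raw,
  extract(raw, '^(\\S+) ') AS client_ip,
  extract(raw, '"(?:GET|POST|PUT|DELETE) (\\S+)') AS uri_path,
  extract(raw, '" (\\d{3}) ') AS status
FROM web_logs;
```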
Index-time field extraction (TRANSFORMS- in props.conf) maps to materialized views. These run continuously in the background, persisting extracted fields to storage. The tradeoff mirrors Splunk exactly: materialized views increase storage and processing overhead but dramatically accelerate queries on the extracted fields.
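Continuing the sketch above with the same illustrative stream, the extraction becomes index-time by materializing it:

```sql
-- Index-time extraction: the regexes run once per event on arrival and
-- the extracted fields are persisted, mirroring TRANSFORMS- in props.conf.
CREATE MATERIALIZED VIEW web_access_indexed AS
SELECT
  _tp_time,
  raw,
  extract(raw, '^(\\S+) ') AS client_ip,
  extract(raw, '" (\\d{3}) ') AS status
FROM web_logs;
```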
For complex log formats, grok functions provide familiar parsing patterns. Timeplus supports standard grok patterns (%{IP}, %{TIMESTAMP_ISO8601}, %{COMBINEDAPACHELOG}) that practitioners may recognize from Logstash:
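For example (a sketch assuming grok() returns a map of named captures, per the Timeplus documentation; the stream name is illustrative):

```sql
-- grok() parses the raw event with named patterns; individual fields
-- come out of the resulting map.
SELECT
  grok(raw, '%{IP:client} %{WORD:method} %{URIPATHPARAM:request}') AS fields,
  fields['client'] AS client_ip,
  fields['method'] AS http_method
FROM web_logs;
```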
CIM normalization translates to SQL CASE expressions, lookup joins, and computed columns. The standardization logic that lives across FIELDALIAS, EVAL, and lookup configurations consolidates into explicit SQL transformations—arguably more readable than scattered configuration directives.
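A sketch of that consolidation, under assumptions: the streams and patterns are illustrative, and host_lookup stands in for a Splunk CSV lookup loaded as a table or versioned stream.

```sql
CREATE VIEW authentication AS
SELECT
  l.raw,
  extract(l.raw, 'user=(\\S+)') AS user_name,        -- FIELDALIAS equivalent
  CASE
    WHEN l.raw LIKE '%Accepted%' THEN 'success'      -- EVAL equivalent
    WHEN l.raw LIKE '%Failed%'   THEN 'failure'
    ELSE 'unknown'
  END AS action,
  d.site AS src_site                                 -- LOOKUP equivalent
FROM auth_logs AS l
LEFT JOIN host_lookup AS d
  ON extract(l.raw, 'host=(\\S+)') = d.host;
```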
Here is a summary of the concept mappings between Splunk knowledge objects and Timeplus SQL, with a short example after the table covering the last two rows:
| Concept | Splunk | Timeplus |
| --- | --- | --- |
| Field extraction | Regex in transforms.conf | SQL regex/grok functions |
| Calculated fields | EVAL expressions in props.conf | SQL expressions |
| Lookups | CSV files + LOOKUP directive in props.conf | JOIN with table or versioned stream |
| Event types | Search-based classification in eventtypes.conf | WHERE clauses or CASE statements |
| Tags | Defined in tags.conf | Additional columns in views |
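A sketch of the last two rows in practice (the patterns, tag values, and the security_logs stream are all illustrative):

```sql
CREATE VIEW classified_events AS
SELECT
  raw,
  CASE
    WHEN raw LIKE '%authentication failure%' THEN 'authentication_failure'
    WHEN raw LIKE '%Built % connection%'     THEN 'network_connection'
    ELSE 'other'
  END AS eventtype,          -- eventtypes.conf equivalent
  CASE eventtype
    WHEN 'authentication_failure' THEN ['authentication', 'failure']
    WHEN 'network_connection'     THEN ['network', 'communicate']
    ELSE ['untagged']
  END AS tags                -- tags.conf equivalent, carried as a column
FROM security_logs;
```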
Architectural workflow for AI-assisted migration
A practical migration workflow proceeds in four phases, each leveraging AI agents while maintaining human oversight for critical decisions.
Phase 1: Discovery and analysis
This phase begins with the Parser Agent inventorying all TAs in the deployment. It produces a structured manifest showing sourcetypes, extraction counts, CIM field coverage, and dependency relationships. This manifest helps prioritize migration order, typically starting with high-volume, simpler TAs before tackling complex security-focused add-ons.
Phase 2: Translation
This phase runs the Translator Agent against each TA's configuration files. For straightforward patterns (simple named captures without lookahead/lookbehind), translation is typically 95%+ accurate on the first pass. Complex patterns may require multiple iterations with the Validator Agent. The system flags low-confidence translations for human review rather than silently producing incorrect output.
Phase 3: Validation
This phase exercises the Validator Agent with production log samples. It often reveals edge cases that weren't apparent from the regex alone: unusual log formats, multi-line events, or character encoding issues. The feedback loop between Validator and Translator continues until extraction accuracy meets defined thresholds (typically 99%+ field-level accuracy for production deployment).
Phase 4: Optimization
This phase applies the Optimizer Agent to tune for the streaming platform's characteristics. This includes decisions about view placement (search-time vs. index-time equivalent), window function configuration for time-based aggregations, and sink configuration for routing processed data to appropriate destinations.
Throughout this workflow, AI agents handle the mechanical translation work while humans make architectural decisions and review flagged edge cases. This division of labor makes migration tractable for organizations with hundreds of TAs—a project that might otherwise require months of dedicated engineering time.
Here is an example of a migrated Cisco ASA data processing pipeline.
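The sketch below condenses what such a pipeline can look like; the cisco_asa_raw stream name and the exact patterns are illustrative, not the actual migrated TA (ASA message 302013 reports a built connection):

```sql
-- Extract CIM-style network traffic fields from ASA
-- "Built ... connection" messages and persist them continuously.
CREATE MATERIALIZED VIEW cisco_asa_traffic AS
SELECT
  _tp_time,
  extract(raw, 'ASA-\\d-(\\d+)') AS message_id,
  extract(raw, 'Built (inbound|outbound)') AS direction,
  extract(raw, '(TCP|UDP) connection') AS transport,
  extract(raw, 'for [a-z]+:(\\d+\\.\\d+\\.\\d+\\.\\d+)') AS src_ip,
  extract(raw, 'to [a-z]+:(\\d+\\.\\d+\\.\\d+\\.\\d+)') AS dest_ip,
  'communicate' AS tag        -- CIM Network Traffic tag as a column
FROM cisco_asa_raw
WHERE match(raw, 'ASA-\\d-302013');
```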

The migration decision landscape
Organizations evaluating Splunk alternatives face a crowded market. Elastic offers an Express Migration program with credits and professional services. Datadog and New Relic provide cloud-native observability with simpler pricing. Cribl has carved out a niche as middleware for gradual migrations, routing data to multiple destinations simultaneously.
Streaming SQL platforms like Timeplus represent a different architectural choice: rather than searching indexed historical data, they process data continuously as it arrives. This enables sub-second alerting latency versus the minutes-to-hours delay of scheduled Splunk searches, with published benchmarks claiming 90 million events per second throughput and 4ms end-to-end latency.
The migration economics are compelling. Organizations commonly report 50-80% cost reduction moving away from Splunk's per-GB pricing, though this must be weighed against operational overhead for self-hosted solutions and the substantial investment in existing Splunk expertise. The Cisco acquisition has accelerated these calculations for many organizations, raising concerns about vendor lock-in and future pricing changes.
Making the transition manageable
The practical path forward combines AI-assisted automation with careful validation. Start with a single, well-understood TA—perhaps one you've debugged extensively and know intimately. Run the multi-agent translation pipeline, validate against real data, and compare query results between Splunk and the streaming platform.
Success with one TA builds both confidence and tooling refinements. The Parser Agent becomes better at handling your organization's configuration patterns. The Translator Agent's prompts get tuned for your specific regex styles. The Validator Agent accumulates sample data that represents your actual log formats.
After a few TAs, what started as a careful manual process becomes largely automated, with human review focused only on genuinely novel patterns. The institutional knowledge encoded in years of Splunk configuration doesn't have to remain trapped: AI agents can extract it, translate it, and carry it forward into whatever architecture comes next.
Ready to try Timeplus? Download a 30-day free trial, risk-free. See installation options here: timeplus.com/download.