Optimizing Splunk at Scale with Timeplus Data Pipeline
- Gang Tao

Cut costs, boost performance, and catch early signals via native S2S protocol.
According to Gartner, 40% of all log telemetry will flow through pipeline products by 2027, up from less than 20% in 2024.
This is why Timeplus now supports the Splunk S2S protocol, making us a streaming-first alternative to traditional observability pipelines. We enable enterprises to intercept Splunk data for real-time processing, intelligent routing, and significant cost reduction.
In today’s blog, I will introduce this new feature and show how it can help you build a more efficient telemetry pipeline.
Splunk S2S Protocol: The Key to Intercepting Forwarder Data
The Splunk-to-Splunk (S2S) protocol is Splunk's proprietary TCP-based transmission protocol used by Universal Forwarders and Heavy Forwarders to send data to indexers. The protocol has evolved through several versions.
Protocol levels, each adding capabilities on top of the last:
- Maximum network traffic over the S2S connection.
- Network traffic optimization over the S2S connection.
- Additional network traffic optimization over the S2S connection.
- Metric support.
- Ack support for rawless metric events.
- Flag for potential duplicate events.
- Flag for cloned metric events, so that cloned events are exempted from license usage.
- SSL certificate requests.
Check out this reference from Splunk's community forum.

The S2S protocol transmits “cooked” data—events enriched with metadata such as source, sourcetype, host, index, and timestamp—enabling intelligent routing decisions without deep packet inspection. Universal Forwarders send unparsed data in 64-kilobyte blocks, while Heavy Forwarders can transmit fully parsed events. For implementation, the protocol supports native compression and TLS/SSL encryption, performs automatic protocol negotiation (with most third-party implementations supporting v3 and v4), encodes events as length-prefixed key-value pairs (_raw, _time, index, host, source, sourcetype), and includes acknowledgment support to provide “at least once” delivery guarantees.
Cribl, Confluent (via a Kafka connector), and Edge Delta all provide support for this protocol.
The rising cost of Splunk at enterprise scale
Splunk has become extremely expensive at enterprise scale, often reaching unsustainable cost levels. Its pricing models—ingest-based, workload-based, and entity-based—can quickly drive up expenses as data volumes grow. What may start as a manageable license fee escalates rapidly: tens of thousands of dollars per year at small volumes can turn into millions annually for large deployments processing hundreds of gigabytes or terabytes of data per day.
The real burden goes beyond licensing. Total cost of ownership typically increases by 30–50% once infrastructure, implementation, training, and dedicated administrators are included, and premium add-ons like Enterprise Security can double the bill. Budgeting has also become harder with newer consumption-based pricing, leading many organizations to underestimate costs and face overruns—especially during traffic spikes. Compounding the problem, studies show that over 90% of log data delivers little analytical value, yet companies still pay full indexing costs for it as log volumes continue to explode.
Intercepting Splunk data with an observability pipeline
To avoid such high costs, customers can deploy telemetry pipeline products, which sit between Splunk forwarders and indexers and speak the native S2S protocol. Forwarders require only a configuration change in outputs.conf—no agent replacement needed. The pipeline then applies transformations and routes data to appropriate destinations.
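For example, repointing a Universal Forwarder at the pipeline is a small change in outputs.conf (the hostname and group name below are placeholders):

```ini
# outputs.conf on the Universal Forwarder: send data to the pipeline
# instead of directly to the indexers; "pipeline" and "pipeline-host"
# are placeholder names
[tcpout]
defaultGroup = pipeline

[tcpout:pipeline]
server = pipeline-host:9997
```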

With this pipeline, customers can:
Reduce Splunk Indexing Costs: Filter noise, sample high-volume sources, and pre-aggregate logs into compact metrics—all before data reaches your Splunk indexers. Why pay premium indexing costs for debug logs or repetitive health checks when you can process them in-flight?
Route Data to the Right Destination at the Right Cost: Not all data belongs in Splunk. Send security-critical events to your SIEM, archive compliance logs to S3 at 100x lower cost, and forward operational metrics to your monitoring stack—all from a single pipeline with intelligent, content-aware routing.
Provide Real-time Alert and Signal, Before Indexing: Why wait for data to be indexed before detecting anomalies? With streaming SQL, you can trigger alerts, detect patterns, and enrich events in milliseconds—turning your pipeline from a passive conduit into an active intelligence layer.
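To illustrate that last point, here is a minimal streaming-SQL sketch that emits a row whenever error volume spikes inside a one-minute window. It assumes a stream of raw forwarder events like the one created in the walkthrough below, and the threshold is illustrative:

```sql
-- a minimal sketch, assuming a stream named splunk_s2s_events with a
-- _raw column (created in the walkthrough below); the threshold is
-- illustrative
SELECT window_start, count() AS error_count
FROM tumble(splunk_s2s_events, 1m)
WHERE _raw LIKE '%ERROR%'
GROUP BY window_start
HAVING error_count > 100;
```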
With the new S2S protocol support, Timeplus enables customers to build such pipelines easily.
Here is an example:

Check out the code here: https://github.com/timeplus-io/examples/tree/main/splunk_s2s
Walkthrough and SQL details below:
1. Create the stream to receive S2S data
First, we create a stream to store incoming Splunk forwarder data. This stream captures the standard Splunk metadata fields: index, host, source, sourcetype, the raw event, and any additional fields.
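A sketch of such a stream (the exact schema in the example repo may differ slightly):

```sql
-- a sketch of the receiving stream; columns mirror the standard
-- Splunk metadata fields described above
CREATE STREAM splunk_s2s_events
(
  `index` string,              -- target Splunk index
  host string,
  source string,
  sourcetype string,
  _raw string,                 -- the raw event payload
  fields map(string, string)   -- any additional indexed fields
);
```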
2. Create the S2S input listener
Next, we create an input that listens on port 9997 (the standard Splunk S2S port) and directs incoming data to our stream. Once this is running, Splunk forwarders can send data directly to Timeplus.
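The exact DDL lives in the example repo; the sketch below only illustrates the shape, and the type and setting names in it are hypothetical placeholders, not authoritative syntax:

```sql
-- HYPOTHETICAL sketch only: check the example repo for the real
-- setting names; this illustrates binding an S2S listener on the
-- standard port 9997 and directing it into the stream created above
CREATE EXTERNAL STREAM splunk_s2s_input
SETTINGS type = 'splunk_s2s', port = 9997, stream = 'splunk_s2s_events';
```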
3. Query the incoming data
Now we can query the stream in real-time to see events as they arrive from Splunk forwarders.
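In Timeplus, a plain SELECT on a stream is a streaming query that tails new events continuously. Column names below assume the stream sketched in step 1:

```sql
-- unbounded streaming query: new rows appear as forwarders send data;
-- _tp_time is Timeplus's built-in event-time column
SELECT _tp_time, `index`, host, sourcetype, _raw
FROM splunk_s2s_events;
```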
4. Create a Splunk HEC output and send data to it with a materialized view
To route data back to Splunk, we create an external stream that sends events via HTTP Event Collector (HEC). This allows Timeplus to forward processed data to Splunk indexers.
The materialized view `write_to_splunk` continuously filters events from the "default" index and forwards them to Splunk via HEC. Only the data you need reaches your Splunk indexers—reducing indexing costs.
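A sketch of both pieces. Treat the external stream's setting names as assumptions and check the example repo for the exact DDL; the CREATE MATERIALIZED VIEW ... INTO target pattern is standard Timeplus SQL:

```sql
-- HEC output: the setting names here are assumptions; see the example
-- repo for the exact DDL, endpoint, and token handling
CREATE EXTERNAL STREAM splunk_hec
(
  event string
)
SETTINGS type = 'splunk',
         url = 'https://splunk-host:8088/services/collector',
         token = '<hec-token>';

-- continuously filter the "default" index and forward it to Splunk
CREATE MATERIALIZED VIEW write_to_splunk
INTO splunk_hec
AS
SELECT _raw AS event
FROM splunk_s2s_events
WHERE `index` = 'default';
```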
5. Create an S3 destination for archival
For cost-effective long-term storage, we create an external table pointing to S3 (or S3-compatible storage like MinIO). This destination stores data at a fraction of Splunk indexing costs.
The materialized view `mv_splunk_aduit_logs_to_s3` routes Splunk audit logs (from the "_audit" index) directly to S3 for compliance archival—keeping them out of expensive Splunk indexing while maintaining full fidelity.
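A sketch of the archival path. The bucket, endpoint, and credentials are placeholders, and the S3 setting names should be checked against the example repo:

```sql
-- S3 (or MinIO) destination: values below are placeholders and the
-- setting names are assumptions; see the example repo for exact DDL
CREATE EXTERNAL TABLE splunk_audit_s3
SETTINGS type = 's3',
         endpoint = 'http://minio:9000',
         bucket = 'splunk-archive',
         data_format = 'JSONEachRow',
         access_key_id = '<key>',
         secret_access_key = '<secret>';

-- route _audit events straight to S3, bypassing Splunk indexing
CREATE MATERIALIZED VIEW mv_splunk_aduit_logs_to_s3
INTO splunk_audit_s3
AS
SELECT *
FROM splunk_s2s_events
WHERE `index` = '_audit';
```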
6. Process data in real time with Timeplus SQL
The following queries show how you can process this data in real time with Timeplus SQL.
The first query demonstrates real-time parsing of Linux TA (Technology Add-on) data. It extracts process metrics from the top command output using regex, turning raw text into structured fields like PID, CPU%, memory usage, and command name.
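A sketch of the technique; the regexes below are simplified and the full patterns live in the example repo:

```sql
-- parse top-style rows into structured columns with extract(), which
-- returns the first capture group of a regex; patterns are simplified
SELECT
  extract(_raw, '^\\s*(\\d+)') AS pid,                          -- leading PID column
  extract(_raw, '(\\d+\\.\\d+)\\s+\\d+\\.\\d+\\s') AS cpu_pct,  -- %CPU, just before %MEM
  extract(_raw, '\\d+\\.\\d+\\s+(\\d+\\.\\d+)\\s') AS mem_pct,  -- %MEM, just after %CPU
  extract(_raw, '(\\S+)\\s*$') AS command                       -- trailing command name
FROM splunk_s2s_events
WHERE sourcetype = 'top';  -- the Linux TA's top sourcetype
```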
The second query extracts installed package details from the Linux TA's package inventory data—parsing name, version, release, architecture, vendor, and group from raw text into queryable columns.
Timeplus brings real-time processing to observability pipelines
Timeplus is purpose-built for high-performance telemetry processing. As a single C++ binary under 500MB—with no JVM or ZooKeeper dependencies—it deploys anywhere from edge to cloud with minimal operational overhead.
Ingest from Anywhere: Timeplus connects natively to the telemetry sources you already use: Kafka, Redpanda, Pulsar, and now Splunk forwarders via native S2S protocol support. This means your existing Universal Forwarders and Heavy Forwarders can send data directly to Timeplus on port 9997—no agent changes required.
Process with Streaming SQL: Transform, filter, and analyze telemetry data in-flight using standard SQL extended for streaming. Built on the battle-tested ClickHouse engine, Timeplus adds tumbling/hopping/session windows, watermarks, and ASOF JOINs for time-series intelligence. Need custom logic? Python and JavaScript UDFs have you covered.
Deliver to Multiple Destinations: Route processed data wherever it needs to go: back to Splunk via HEC, to S3 for cost-effective archival, or to downstream systems like ClickHouse, MySQL, and PostgreSQL. One pipeline, multiple destinations, intelligent routing.
With the S2S-capable Timeplus, you can:
- Receive data natively from existing Splunk Universal/Heavy Forwarders (no agent changes)
- Process with streaming SQL for real-time alerting, filtering, and aggregation
- Route intelligently to Splunk (for high-value data), S3 (for archival), or other destinations
- Reduce costs by filtering before expensive indexing while maintaining analytical capability
These capabilities benefit a broad range of customers.
For existing Splunk users, it means you can:
- Intercept forwarder data with zero agent changes (S2S protocol compatibility)
- Process with familiar SQL rather than proprietary SPL or pipeline DSLs
- Route high-volume, low-value data to S3 at 100x lower cost
- Maintain real-time alerting and analytics on data that never reaches Splunk
Typical savings: a 30-50% reduction in Splunk licensing costs.
For teams evaluating observability pipelines, Timeplus offers:
- Streaming SQL with analytical power beyond ETL transformations
- Sub-10ms latency for real-time monitoring requirements
- An open-source core (Proton) that reduces vendor lock-in concerns
- A unified platform for streaming and historical queries
- A team with Splunk DNA that understands observability requirements
As log volumes continue to explode and observability costs become a board-level concern, the need for intelligent telemetry pipelines has never been greater. With native S2S protocol support, Timeplus bridges the gap between your existing Splunk infrastructure and the real-time future, and we're committed to building this capability out further.
Try it out with our example code, star us on GitHub, and let us know what you build!


