Data Generation V4: Chasing 1M+, Finding the Limit
With Version 3 dialed up to a solid 600K+ posts/sec after some late tweaks, I was riding high. BrandPulse’s 700k/sec goal was in the bag, but I couldn’t resist pushing further—could I hit 1M+ and really show off? Version 4 was my all-in bet: auto-scaling workers, hardcore Kafka configs, and a setup built to dominate. The result? A wild ride that peaked high but crashed hard, proving v3’s 600K+ was the real champ. Here’s the tale.
The Drive
V3 was a beast at 600K+ posts/sec—way past the 700k target—but I saw a chance to double down. Theoretical maxes hinted at millions, so I aimed for 1M+ sustained throughput. I’d need dynamic scaling, tighter error handling, and a system that could flex under pressure. Problem was, I was still on one machine, and v3 was already flexing its muscles. This was about testing the ceiling—of the hardware and my grit.
The Playbook
I went big with these moves:
- Auto-Tuning Workers: Scaled workers dynamically (8 to 16+) to chase 1M/sec based on live throughput.
- Balanced Configs: Dropped `maxInFlightRequests` to 200, added minimal retries, and cut batch size to 15K for stability.
- Memory Smarts: Pre-allocated batch pools and capped buffer memory at 70% of RAM.
- Resilience Boost: Built in reconnection logic and backoff for when things got messy (see the sketch right after this list).
- Metrics Overload: Tracked every detail—throughput, errors, progress—to pinpoint the breaking point.
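For the resilience bullet, here's a minimal sketch of the reconnect-with-backoff idea, reusing limits from the `stability` block in the config below; the error counting and the `produceForever` stand-in are illustrative, not the exact code from the repo.

```javascript
// Illustrative sketch of the resilience idea: wrap the produce loop so a flaky
// broker costs a pause and a reconnect, not a dead worker. The limits mirror
// the `stability` block in the config; `produceForever` stands in for the real
// send loop.
const stability = { maxErrors: 50, backoffDurationMs: 5000 };

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runWithRecovery(producer, produceForever) {
  let errorCount = 0;
  while (errorCount < stability.maxErrors) {
    try {
      await producer.connect();
      await produceForever();                      // only returns or throws on failure
    } catch (err) {
      errorCount++;
      console.error(`Producer error #${errorCount}: ${err.message}`);
      await producer.disconnect().catch(() => {}); // best-effort cleanup
      await sleep(stability.backoffDurationMs);    // cool off before reconnecting
    }
  }
  console.error('Too many errors; giving up on this worker');
}
```

The point is simply that a wobbly broker should cost a pause, not a dead worker.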
The Code: V4 in Full Swing
Here’s what I unleashed—complex, bold, and a bit reckless:
```javascript
const { Kafka, Partitioners } = require('kafkajs');
const { Worker, isMainThread, parentPort, threadId } = require('worker_threads');
const dataSchema = require('../../schema/avroSchema');
const os = require('os');

const config = {
  kafka: {
    clientId: 'dataStorm-producer',
    brokers: ['localhost:9092'],
    topic: 'dataStorm-topic',
    retry: { retries: 2, initialRetryTime: 100, maxRetryTime: 500 },
    // Throughput-over-safety producer settings: no acks, no idempotence,
    // huge batches, and up to 70% of RAM earmarked for buffering.
    producer: {
      createPartitioner: Partitioners.LegacyPartitioner,
      transactionTimeout: 10000,
      compression: null,
      maxInFlightRequests: 200,
      idempotent: false,
      allowAutoTopicCreation: false,
      batchSize: 16 * 1024 * 1024,
      lingerMs: 5,
      acks: 0,
      bufferMemory: Math.min(8 * 1024 * 1024 * 1024, Math.floor(os.totalmem() * 0.7))
    }
  },
  batch: { size: 15000, maxBufferedBatches: 8 },
  // Start at 2x cores (min 8); auto-tuning may push up to 4x cores (min 16).
  workers: { count: Math.max(8, os.cpus().length * 2), maxCount: Math.max(16, os.cpus().length * 4), restartDelay: 1000 },
  metrics: { reportIntervalMs: 1000, syncIntervalMs: 20 },
  target: 50000000,
  stability: { connectionBackoffMs: 2000, maxErrors: 50, backoffDurationMs: 5000 }
};

if (isMainThread) {
  // Main thread: crank up the libuv thread pool and V8 heap, then spawn
  // workers and aggregate their throughput reports.
  process.env.UV_THREADPOOL_SIZE = config.workers.maxCount * 4;
  const v8 = require('v8');
  v8.setFlagsFromString('--max_old_space_size=8192');

  const workers = new Map();
  let totalRecordsSent = 0;
  let startTime = Date.now();

  const spawnWorker = (id) => {
    const worker = new Worker(__filename);
    workers.set(id, worker);
    worker.on('message', (msg) => {
      if (msg.type === 'metrics') totalRecordsSent += msg.count;
    });
  };

  for (let i = 0; i < config.workers.count; i++) spawnWorker(i + 1);
} else {
  // Worker thread: its own Kafka producer plus a pre-allocated pool of
  // batches, fired at the broker as fast as the event loop allows.
  const kafka = new Kafka({ ...config.kafka, clientId: `${config.kafka.clientId}-${process.pid}-${threadId}` });
  const producer = kafka.producer(config.kafka.producer);

  const batchPool = Array(config.batch.maxBufferedBatches).fill().map(() =>
    Array(config.batch.size).fill({ value: dataSchema.toBuffer({ id: 0, timestamp: '', value: 0 }) })
  );

  const runProducer = async () => {
    await producer.connect();
    let recordsSent = 0;
    let batchIndex = 0;
    while (true) {
      const batch = batchPool[batchIndex];
      batchIndex = (batchIndex + 1) % config.batch.maxBufferedBatches;
      // Refresh the payload at a random index each iteration (fill() reuses a
      // single message object, so every slot in the batch shares that record).
      const randomIndex = Math.floor(Math.random() * config.batch.size);
      batch[randomIndex].value = dataSchema.toBuffer({ id: threadId, timestamp: new Date().toISOString(), value: Math.random() * 100 });
      await producer.send({ topic: config.kafka.topic, messages: batch });
      recordsSent += config.batch.size;
      parentPort.postMessage({ type: 'metrics', count: config.batch.size });
      await new Promise(resolve => setImmediate(resolve)); // yield to the event loop
    }
  };

  runProducer().catch(err => console.error(`Worker error: ${err.message}`));
}
```
(Full code with auto-tuning and metrics in the repo—above is the core.)
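Since the auto-tuning piece isn't in the excerpt, here's roughly its shape: a timer on the main thread compares the measured rate against the 1M/sec goal and grows or shrinks the pool. The thresholds and the `spawnWorker`/`stopWorker` placeholders are stand-ins, not the repo's exact code.

```javascript
// Rough shape of the auto-tuning loop on the main thread: every second, compare
// the measured rate to the 1M/sec goal and grow or shrink the worker pool.
// Thresholds and the spawn/stop helpers are placeholders, not the repo code.
const TARGET_RATE = 1_000_000;   // posts/sec
const MIN_WORKERS = 8;
const MAX_WORKERS = 16;

let workerCount = MIN_WORKERS;
let recordsThisWindow = 0;       // bumped by each worker's 'metrics' message

const spawnWorker = (id) => console.log(`scaling up: worker ${id}`);   // placeholder
const stopWorker = (id) => console.log(`scaling down: worker ${id}`);  // placeholder

setInterval(() => {
  const rate = recordsThisWindow;   // 1-second window, so the count is the rate
  recordsThisWindow = 0;
  if (rate < TARGET_RATE * 0.9 && workerCount < MAX_WORKERS) {
    spawnWorker(++workerCount);     // falling short of 1M/sec: add a producer thread
  } else if (rate > TARGET_RATE * 1.1 && workerCount > MIN_WORKERS) {
    stopWorker(workerCount--);      // comfortably over: retire one to save RAM and CPU
  }
}, 1000);
```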
What I Got
- Peak Throughput: Spiked to 857K posts/sec—thrilling for a hot second!
- Sustained Average: Averaged ~400K posts/sec over 49 seconds, totaling 42M records.
- The Collapse: System froze in seconds—memory maxed out, workers stalled, game over.
Results Breakdown
| Metric | Value | Target | Progress |
|---|---|---|---|
| Peak Throughput | 857K/sec | 1M/sec | 85.7% |
| Sustained Avg | 400K/sec | 1M/sec | 40% |
| Total in 49s | 42M | 49M | 85.7% |
Where It Broke
- Memory Overload: 8GB buffer and 16+ workers pushed RAM past 90%—no headroom left.
- Worker Chaos: Scaling beyond 8 cores caused I/O thrashing and CPU bottlenecks.
- Serialization Drag: Avro's `toBuffer()` couldn't keep up with the pace, even pre-allocated (a quick sanity check is sketched after this list).
- Broker Limits: A single broker buckled under 200 in-flight requests with no `acks`.
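The serialization point is easy to sanity-check with a one-thread micro-benchmark along these lines; I'm assuming the schema module wraps avsc (which the `toBuffer()` call suggests), and the exact numbers will vary by machine.

```javascript
// Quick sanity check on serialization cost, assuming the schema module wraps
// avsc (which the toBuffer() call suggests). Numbers will vary by machine.
const avro = require('avsc');

const dataSchema = avro.Type.forSchema({
  type: 'record',
  name: 'Post',
  fields: [
    { name: 'id', type: 'int' },
    { name: 'timestamp', type: 'string' },
    { name: 'value', type: 'double' }
  ]
});

const N = 1_000_000;
const start = process.hrtime.bigint();
for (let i = 0; i < N; i++) {
  dataSchema.toBuffer({ id: i, timestamp: new Date().toISOString(), value: Math.random() * 100 });
}
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`${N} encodes in ${elapsedMs.toFixed(0)} ms -> ~${Math.round(N / (elapsedMs / 1000))} records/sec on one thread`);
```

Whatever the number on your box, multiplying it by the worker count gives a rough ceiling on how fast fresh records can even be produced, before Kafka ever sees them.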
Key Takeaways
- Know Your Edge: V3’s 600K+ was sustainable; v4’s 857K peak was a tease—hardware calls the shots.
- Stability Wins: Chasing 1M/sec traded reliability for a flashy number. Real systems need balance.
- Scale Smart: Auto-tuning was cool, but over-scaling burned me—sometimes less is more.
Why It’s Valuable
This wasn’t a flop—it was a lesson in boundaries. I was taught to push limits with what we’ve got, and v4 was me doing just that—stretching too far to find the breaking point. It proved v3’s tuned 600K+ posts/sec was the practical powerhouse, ready for BrandPulse’s real-world needs. Recruiters see the drive; engineers see the wisdom in pulling back.
Back to V3
V4’s crash sent me back to v3, where I’d already hit 600K+ posts/sec with solid configs. That became the final producer—fast, stable, and proven. V4 was the wild experiment that showed me where to stop.