Debugging race conditions in async error handlers
Concurrent promise rejections in parallel async workflows frequently trigger race conditions in centralized error handlers. This guide isolates the event-loop timing flaws that cause state overwrites, duplicate crash signals, and lost stack traces, then shows how atomic error aggregation and isolated error boundaries prevent cascading failures.
- Identify concurrent rejection patterns via structured telemetry correlation
- Replace shared mutable error state with atomic aggregation queues
- Configure process-level listeners to prevent duplicate crash triggers
- Validate resolution using deterministic concurrency testing
Symptom Identification & Log Analysis
Isolate race condition indicators in production telemetry before modifying code. Correlate duplicate unhandledRejection events with identical millisecond timestamps. Trace missing stack frames caused by handler overwrites during high-throughput execution.
Map error propagation to Node.js uncaughtException vs unhandledRejection lifecycle boundaries. Look for microtask queue error collision signatures in your APM logs.
// Production telemetry snippet showing race condition indicators
{
  "timestamp": "2024-05-12T14:22:01.003Z",
  "event": "unhandledRejection",
  "error": "DB Timeout",
  "stack_trace": "truncated",
  "concurrent_rejections": 3,
  "handler_state": "overwritten"
}
Filter logs for rapid-fire rejection events sharing identical correlation IDs. Missing secondary stack frames confirm unhandledRejection state overwrite.
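The filtering step can be sketched as a small script. Field names (`correlation_id`, `event`, `timestamp`) follow the telemetry snippet above but are assumptions about your log schema, not a specific APM format:

```javascript
// Group rejection log entries by correlation ID and flag bursts
// that land within the same millisecond window.
function findRejectionBursts(entries, windowMs = 1) {
  const byCorrelation = new Map();
  for (const e of entries) {
    if (e.event !== 'unhandledRejection') continue;
    const times = byCorrelation.get(e.correlation_id) ?? [];
    times.push(Date.parse(e.timestamp));
    byCorrelation.set(e.correlation_id, times);
  }
  const bursts = [];
  for (const [id, times] of byCorrelation) {
    times.sort((a, b) => a - b);
    // Multiple rejections inside one window suggest handler state overwrite
    if (times.length > 1 && times[times.length - 1] - times[0] <= windowMs) {
      bursts.push({ correlation_id: id, count: times.length });
    }
  }
  return bursts;
}
```

Run it over exported log entries; any correlation ID it flags is a candidate for the reproduction steps below.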
Minimal Reproduction Case
Provide a deterministic, isolated script that reliably triggers the race condition. Execute two concurrent async functions with guaranteed rejection. Capture handler execution order via microtask queue inspection. Demonstrate shared state corruption without synchronization primitives.
// Minimal repro: concurrent rejections overwriting shared error state
let sharedError = null;
async function taskA() { throw new Error('DB Timeout'); }
async function taskB() { throw new Error('Cache Miss'); }
async function run() {
  try {
    await Promise.all([taskA(), taskB()]);
  } catch (err) {
    sharedError = err; // Promise.all short-circuits: only the first rejection lands here
  }
}
run();
// Attach individual handlers that mutate the same global to observe the microtask collision
taskA().catch(err => { sharedError = err; console.log('Handler A:', err.message); });
taskB().catch(err => { sharedError = err; console.log('Handler B:', err.message); });
Promise.all short-circuits on the first rejection, so the try/catch sees only one error. Individual .catch() handlers on parallel promises race to mutate the same global variable, producing unpredictable error context: sharedError ends up holding whichever handler wrote last in the microtask queue, masking the true failure scope.
Root Cause: Event Loop Timing & Shared State
Promise .catch() callbacks execute in microtask order, not call order. The engine schedules each callback as soon as its promise settles and the current synchronous stack clears. A handler that awaits between reading and writing shared state can interleave with other handlers, so its mutations lack atomicity under concurrent execution.
Global error objects become write-conflict zones under high concurrency. A second rejection handler can run before the previous one finishes serializing its error, overwriting the shared object. This is the async error boundary race condition: the final state reflects arbitrary scheduling order rather than logical failure order.
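The interleaving described above can be reproduced in a few lines. This is a minimal sketch: the rejections are real, but the `await Promise.resolve()` inside the handler merely stands in for async I/O such as logging:

```javascript
// Both handlers race to mutate sharedError; the await inside the
// handler opens a window for the other handler to interleave.
let sharedError = null;
const order = [];

async function handle(name, err) {
  sharedError = err;
  await Promise.resolve(); // stands in for async I/O (e.g. logging)
  order.push(`${name} finished with: ${sharedError.message}`);
}

Promise.reject(new Error('DB Timeout')).catch(e => handle('A', e));
Promise.reject(new Error('Cache Miss')).catch(e => handle('B', e));

// Handler A resumes only after B's write: both entries report
// 'Cache Miss', and the original 'DB Timeout' context is lost.
setTimeout(() => console.log(order), 0);
```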
Production-Safe Resolution & Workarounds
Deploy immediate, zero-downtime fixes to stabilize error boundaries. Implement AsyncLocalStorage or request-scoped error queues to isolate concurrent execution contexts. Replace Promise.all with Promise.allSettled + explicit error routing to capture all failures. Use AbortController to short-circuit dependent async branches on first failure.
// Atomic error aggregation using a microtask-safe queue pattern
async function logError(err) {
  // Stand-in for async I/O (APM client, log shipper, etc.)
  await new Promise(resolve => setImmediate(resolve));
  console.error('[aggregated]', err.message);
}
class ErrorAggregator {
  #queue = [];
  #processing = false;
  async add(err) {
    this.#queue.push(err);
    if (this.#processing) return; // another add() call is already draining the queue
    this.#processing = true;
    while (this.#queue.length) {
      const e = this.#queue.shift();
      await logError(e);
    }
    this.#processing = false;
  }
}
const aggregator = new ErrorAggregator();
async function runSafe() {
  const results = await Promise.allSettled([taskA(), taskB()]);
  for (const r of results) {
    if (r.status === 'rejected') await aggregator.add(r.reason);
  }
}
Serializing error processing through a processing flag prevents concurrent handler writes: the flag check and set happen synchronously, so they are atomic on JavaScript's single thread. Each rejection enters the queue immediately, but the drain loop processes one error at a time, so no error is lost and arrival order is preserved.
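The AbortController tactic mentioned above can be sketched as follows. `slowTask` and `failingTask` are illustrative stand-ins, and passing a reason to `controller.abort()` assumes Node 17.3+:

```javascript
// First failure aborts sibling branches so they settle promptly
// instead of racing the error handler later.
async function runWithAbort(tasks) {
  const controller = new AbortController();
  const wrapped = tasks.map(task =>
    task(controller.signal).catch(err => {
      controller.abort(err); // first rejection cancels the rest
      throw err;
    })
  );
  return Promise.allSettled(wrapped); // every branch still reports a result
}

// Each task observes the signal at its await boundaries.
const slowTask = signal => new Promise((resolve, reject) => {
  const t = setTimeout(resolve, 1000, 'ok');
  signal.addEventListener('abort', () => {
    clearTimeout(t);
    reject(signal.reason); // carries the original failure
  });
});
const failingTask = () => Promise.reject(new Error('DB Timeout'));

runWithAbort([slowTask, failingTask]).then(results =>
  console.log(results.map(r => r.status))
);
```

Because allSettled collects every branch, the aggregator still sees each failure, while the abort prevents slow branches from lingering past the first error.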
Common Mistakes
Using process.on('unhandledRejection') to mutate global state
Multiple concurrent rejections fire the listener in unpredictable microtask order. This causes state overwrites and masks the original failure context.
Wrapping parallel async calls in a single try/catch without isolation
Fails to capture secondary errors once the first rejection throws, and can leave orphaned promises whose rejections surface later in the event loop as silent crashes.
FAQ
Why does my error handler only log one of three concurrent rejections?
Promise.all short-circuits on the first rejection. Use Promise.allSettled or isolate .catch() per promise to capture all failures.
Can I safely use async/await with global error objects in high-throughput services?
No. Global mutable state lacks atomicity under concurrent async execution. Use request-scoped storage or atomic queues instead.
How do I prevent duplicate crash alerts from race-conditioned handlers?
Implement a debounce or deduplication layer at the error aggregation point. Ensure only one listener triggers process.exit() to avoid cascading termination signals.
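A minimal deduplication sketch along those lines; `createCrashGuard`, `windowMs`, and `onAlert` are illustrative names, not an existing library API:

```javascript
// Deduplicate crash signals: repeated identical errors within a
// window emit one alert, and only the first fatal error wins.
function createCrashGuard({ windowMs = 1000, onAlert }) {
  const seen = new Map(); // error message -> last alert time
  let exiting = false;
  return {
    report(err) {
      const now = Date.now();
      const last = seen.get(err.message) ?? 0;
      if (now - last >= windowMs) {
        seen.set(err.message, now);
        onAlert(err); // at most one alert per message per window
      }
    },
    fatal(err) {
      if (exiting) return; // a single caller owns termination
      exiting = true;
      onAlert(err);
      // process.exit(1) would go here in production
    },
  };
}
```

Route every process-level listener through one guard instance so duplicate rejections cannot fan out into duplicate pages or cascading exit calls.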