Summary
An engineering team built a Q&A feature powered by multiple LLM providers. The idea was simple: ask the same question across several models, then combine the answers. In practice, it led to timeouts, partial failures, and inconsistent responses. By shifting to event-driven orchestration, the team introduced proper fan-out and fan-in patterns, gained observability, and handled failures gracefully, all without needing a workflow engine.
The problem: multi-LLM coordination looks simple, until it isn’t
The product needed to query several LLMs in parallel and combine the results into a single output. This allowed the system to hedge against hallucinations, compare answers, and build trust with users.
But the architecture was fragile:
- One model timing out delayed the whole request
- Model providers had inconsistent response behaviours
- Retry logic was scattered across services
- Logs were incomplete, and debugging was frustrating
- Sometimes users got only part of an answer with no explanation
Every integration worked in isolation. The coordination layer did not.
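To make that fragility concrete, here is a rough sketch of what the original, tightly coupled fan-out tends to look like. The provider names, timeout, and clients are hypothetical, not the team's actual code; the point is that a single shared await means one slow model stalls the whole request.

```python
# Illustrative sketch of a tightly coupled fan-out (provider names, timeout,
# and clients are hypothetical, not the team's actual code).
import asyncio

PROVIDER_TIMEOUT_SECONDS = 30  # one global timeout shared by every provider


async def ask_provider(name: str, question: str) -> str:
    """Stand-in for a real provider call (e.g. an HTTP request to an LLM API)."""
    await asyncio.sleep(0.1)  # placeholder for network and model latency
    return f"{name}: answer to {question!r}"


async def ask_all(question: str) -> list[str]:
    calls = [ask_provider(name, question) for name in ("provider_a", "provider_b", "provider_c")]
    # gather waits for every call, so the whole request runs at the speed of
    # the slowest model, and a single hang means the user gets nothing at all.
    return await asyncio.wait_for(asyncio.gather(*calls), timeout=PROVIDER_TIMEOUT_SECONDS)


if __name__ == "__main__":
    print(asyncio.run(ask_all("What is event-driven orchestration?")))
```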
What the team tried first
The engineers explored a few common approaches:
- Async task queues to fan out requests
- Polling mechanisms to check for completion
- Custom retry logic per model
- State machines written in backend code
- Database updates as a coordination layer
Each attempt increased complexity. None of them fixed all of the core issues. The system lacked a reliable way to coordinate asynchronous tasks, handle partial results, and preserve visibility into what had succeeded or failed.
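As an illustration of where that complexity comes from, the polling and database-as-coordination-layer attempts often end up looking something like the sketch below (the table, status values, and timings here are hypothetical): a poller busy-waits on shared state, while retries and partial-failure handling still have to live somewhere else.

```python
# Rough sketch of polling a shared status table for completion (hypothetical
# table and status values). Nothing here explains *why* a provider is still
# pending, and recovery logic lives in a different service entirely.
import time

# In-memory stand-in for the status table each worker updated.
status_table: dict[str, str] = {"provider_a": "pending", "provider_b": "pending"}


def poll_until_complete(deadline_seconds: float = 30.0, interval: float = 0.5) -> dict[str, str]:
    deadline = time.monotonic() + deadline_seconds
    while time.monotonic() < deadline:
        if all(state == "done" for state in status_table.values()):
            return status_table
        time.sleep(interval)  # busy-waiting between reads
    return status_table  # timed out with whatever state happens to be recorded
```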
The shift: orchestration as an event pattern
The team moved to event-based orchestration using Sailhouse.
Here’s how the flow changed:
- When a user asked a question, a query_requested event was emitted
- This triggered separate subscribers for each LLM provider
- As models responded, they published response_received events
- A coordinating function listened for a complete set of responses or a timeout
- Once done, an aggregate_answer event was emitted
Every step was observable. Failures were captured. Partial results could still trigger a response.
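The sketch below shows the shape of that flow. It uses a tiny in-memory publish/subscribe bus as a stand-in for Sailhouse, so the bus API here is hypothetical; the event names are the ones from the flow above, and the timeout path is noted but not implemented.

```python
# Minimal fan-out/fan-in sketch over an in-memory event bus (the bus API is
# a hypothetical stand-in for Sailhouse; event names match the flow above).
from collections import defaultdict
from typing import Callable

subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
responses: dict[str, dict[str, str]] = defaultdict(dict)  # query_id -> provider -> answer
EXPECTED_PROVIDERS = {"provider_a", "provider_b"}  # illustrative provider names


def publish(topic: str, event: dict) -> None:
    for handler in subscribers[topic]:
        handler(event)


def subscribe(topic: str):
    def register(handler: Callable[[dict], None]):
        subscribers[topic].append(handler)
        return handler
    return register


# Fan-out: one subscriber per provider, each listening for query_requested
# and publishing response_received independently.
def make_provider_handler(provider: str):
    @subscribe("query_requested")
    def handle(event: dict) -> None:
        answer = f"{provider} answer to {event['question']!r}"  # stand-in for the model call
        publish("response_received", {"query_id": event["query_id"], "provider": provider, "answer": answer})
    return handle


for name in EXPECTED_PROVIDERS:
    make_provider_handler(name)


# Fan-in: the coordinator emits aggregate_answer once every expected provider
# has answered (in the real system, a timeout would trigger the same event
# with whatever partial results had arrived).
@subscribe("response_received")
def coordinate(event: dict) -> None:
    responses[event["query_id"]][event["provider"]] = event["answer"]
    if set(responses[event["query_id"]]) >= EXPECTED_PROVIDERS:
        publish("aggregate_answer", {"query_id": event["query_id"], "answers": responses[event["query_id"]]})


@subscribe("aggregate_answer")
def deliver(event: dict) -> None:
    print("final:", event["answers"])


publish("query_requested", {"query_id": "q-1", "question": "What is fan-in?"})
```

Because each provider has its own subscriber, a slow or failing model only affects its own branch, and the coordinator decides when enough responses have arrived to answer the user.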
Why not use a workflow engine?
The team considered tools like Temporal or Inngest. These platforms offer strong guarantees and stateful workflows. But they didn’t fit the use case.
- Model providers changed frequently, so the flow wasn’t static
- Behaviour was driven by model availability, latency, and previous spend
- Defining workflows in advance wasn’t practical
- The system needed flexibility and independence between agents
An event-driven model allowed agents to be loosely coupled. The coordination lived in the event graph, not in hardcoded workflows.
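In practice, that event graph can be as simple as a mapping from topics to subscribers. The names below are hypothetical, but they show why adding or removing a provider is a subscription change rather than a workflow rewrite:

```python
# Illustrative event graph (all subscriber names are hypothetical). The
# coordination is just this mapping of topics to subscribers, so swapping a
# model provider changes a subscription, not a central workflow definition.
EVENT_GRAPH = {
    "query_requested": ["provider_a_agent", "provider_b_agent", "provider_c_agent"],
    "response_received": ["response_coordinator"],
    "aggregate_answer": ["answer_delivery"],
}
```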
Python for models, events for scale
The agents themselves were built in Python. That was a good fit for working with APIs and models.
But the team quickly found that trying to manage coordination, retries, and state within Python services didn’t scale. They had to deal with:
- Hanging threads and timeout inconsistencies
- Race conditions between concurrent responses
- Logs split across services with no end-to-end trace
- Recovery logic that was hard to test or reuse
Moving coordination into an event-based system gave them structure, reliability, and better fault handling — while letting them keep Python where it made sense.
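After the shift, each Python agent could stay small. The sketch below shows what a provider subscriber might look like, with hypothetical helper names; the coordination concerns are pushed into the event layer rather than the agent code.

```python
# Sketch of a provider subscriber after the shift (helper names are
# hypothetical). The agent only calls its model and publishes the result;
# retries, timeouts, and fan-in live in the event layer, not here.
def handle_query_requested(event: dict, call_model, publish) -> None:
    answer = call_model(event["question"])  # the only provider-specific work
    publish("response_received", {
        "query_id": event["query_id"],
        "provider": "provider_a",  # illustrative provider name
        "answer": answer,
    })


if __name__ == "__main__":
    handle_query_requested(
        {"query_id": "q-1", "question": "ping"},
        call_model=lambda q: f"echo: {q}",
        publish=lambda topic, evt: print(topic, evt),
    )
```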
What changed
- Response times improved by over 50 percent, since the system no longer waited for the slowest model
- Failures were retried or degraded gracefully, not silently dropped
- Engineers could debug issues based on complete event histories
- New models could be added with no architectural changes
- The team stopped writing custom orchestration code
Why it matters
Multi-LLM systems are becoming more common. The challenge isn’t generating responses. It’s coordinating responses reliably, at scale, and with clear fallback behaviours.
This team avoided overbuilding. They used events to handle concurrency, retries, and failure detection without turning their backend into a workflow engine. The result was a system that could evolve as their needs changed, and a team that could focus on product development instead of orchestration logic.