As a CS student, I wanted to test whether the same system architectures we use in software engineering could be applied to investment research with real rigor.
Orchestrators. Parallel workers. Evaluation and critique loops. Same underlying patterns, different domain.
So, using Perplexity Computer, I built a 12-agent system to pitch a stock.
Five agents. One question:
What did ELV’s stock do on January 28th, 2026?
The answers ranged from -6.7% to +5.9%. The correct answer was +5.86% close-to-close. Two agents got the direction wrong.
That disagreement was not a failure. It is exactly what the system is designed to surface.
The Problem with Single-Agent Research
Most people use AI like a faster Google: one prompt, one answer, move on.
The issue is structural.
Language models are probabilistic. The same prompt can produce different outputs because each run samples from a distribution over possible tokens. That is usually fine for casual use. But in financial research, it is dangerous.
A wrong number does not stay isolated. It propagates into your model, into your peer comparisons, and into your price target. By the time you notice, the entire thesis can be built on a faulty assumption.
With a single agent, you get one sample. With multiple independent agents, you get a distribution of answers, and the spread of that distribution shows you where the uncertainty is.
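To make that concrete, here is a minimal Python sketch of how the spread across runs could be turned into a flag. It is not the code behind my system. The function name `summarize_runs` and the `tolerance` threshold are hypothetical, and only the two endpoints of the sample list come from the ELV example above; the three middle values are made up for illustration.

```python
from statistics import mean, pstdev

def summarize_runs(answers, tolerance=0.5):
    """Summarize independent answers to the same numeric question.

    answers: one float per agent run (here, a percent price change).
    tolerance: hypothetical maximum spread, in the same units, before
               the fact is flagged for a primary-source check.
    """
    spread = max(answers) - min(answers)
    # Collect the distinct directions (up / down) among non-zero answers.
    directions = {a > 0 for a in answers if a != 0}
    direction_conflict = len(directions) > 1
    return {
        "mean": mean(answers),
        "stdev": pstdev(answers),
        "spread": spread,
        "direction_conflict": direction_conflict,
        "needs_investigation": spread > tolerance or direction_conflict,
    }

# Illustrative values only: -6.7 and 5.9 are the reported endpoints,
# the three values in between are invented for this sketch.
runs = [-6.7, -1.2, 5.8, 5.86, 5.9]
report = summarize_runs(runs)
print(report)
```

Note that the sketch never treats the mean as the answer. Averaging runs that disagree on direction would produce a number no source supports. The flag only says where to go look.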
There is a second issue too: narrow framing. One agent, or one analyst, tends to reinforce its own assumptions. Contradictory evidence often never shows up because it was never searched for in the first place.
The Architecture
I built a six-stage pipeline with 12+ agents. Each stage has a specific job, and outputs are gated before moving to the next stage.
At a high level:
Strategy → Hypothesis Generation → Verification → Modeling → Critique → Output
Under the hood, each stage is powered by multiple specialized agents.

Multi-Agent Investment Research Pipeline
- Strategy (Orchestrator). Defines the research plan and key questions.
- Hypothesis Generation (Parallel Agents). Multiple agents explore different explanations for the stock in parallel: macro, narrative, behavioral, and structural. The goal here is not agreement. It is coverage.
- Truth Verification (Ensemble System). Five independent runs cross-check 26 discrete facts. When agents agree, confidence increases. When they disagree, it triggers an investigation. Every conflict is resolved using primary sources such as SEC filings, transcripts, or raw market data. This is where the system becomes reliable.
- Financial Modeling (Quant Layer). DCF, SOTP, and structural analysis translate insights into valuation.
- Adversarial Critique (Red Team). Specialized agents are designed to challenge the thesis, not validate it. They look for flawed assumptions, missing risks, and overstated claims.
- Output (Product). The final result is compiled into an interactive thesis app.
In practice, the system behaves like a directed pipeline: each phase produces structured output that must pass review before the next stage begins.
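The gating idea can be sketched in a few lines of Python. This is an illustration of the pattern under my own assumptions, not the actual implementation: `Stage`, `GateRejected`, and `run_pipeline` are hypothetical names, and the stage functions are placeholders standing in for real agents and real review steps.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]   # receives accumulated state, returns this stage's output
    gate: Callable[[dict], bool]  # review step: True means the output may move on

class GateRejected(Exception):
    """Raised when a stage's output fails review."""

def run_pipeline(stages, state=None):
    """Run stages in order, stopping at the first output that fails its gate."""
    state = dict(state or {})
    for stage in stages:
        output = stage.run(state)
        if not stage.gate(output):
            raise GateRejected(f"{stage.name} output failed review")
        state[stage.name] = output
    return state

# Placeholder stages: real ones would call agents and human or automated reviewers.
stages = [
    Stage("strategy",
          lambda s: {"questions": ["What drove the move?"]},
          lambda out: len(out["questions"]) > 0),
    Stage("hypotheses",
          lambda s: {"ideas": ["macro", "narrative", "behavioral", "structural"]},
          lambda out: len(out["ideas"]) >= 2),
    Stage("verification",
          lambda s: {"unresolved_conflicts": 0},
          lambda out: out["unresolved_conflicts"] == 0),
]

result = run_pipeline(stages)
print(list(result))
```

Raising an exception at the gate is one possible design choice. It makes it impossible for a later stage to run on output that was never approved.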
What the System Found
The company is Elevance Health (formerly Anthem), ticker ELV.
The mispricing is specific: the market is treating ELV like a pure-play Medicare Advantage company during a regulatory pressure cycle. But that framing is incomplete.
A significant portion of ELV’s business comes from Carelon, its health services platform, which has been growing rapidly and now represents a substantial share of total revenue. This segment is less visible in the headline narrative, but it changes how the business should be evaluated.
Two insights that emerged from the multi-agent process:
- DCF back-solve. One of the agents produced a precise estimate of the negative long-term growth implied by the current price. That number could not be cleanly verified. Which is exactly the point. Without a verification layer, that figure would have made it into the thesis as fact. Instead, it was flagged, investigated, and reduced to a directional conclusion: at current prices, the market is effectively pricing in little to no long-term growth.
- FIDE-SNP positioning advantage. ELV’s Medicaid footprint across multiple states gives it stronger access to Fully Integrated Dual Eligible Special Needs Plans (FIDE-SNPs). While competitors like Humana and Centene participate in dual-eligible programs, ELV appears better positioned to integrate Medicaid and Medicare services at scale. This creates a structural advantage that is not immediately visible in standard peer comparisons.
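For readers unfamiliar with a DCF back-solve, here is a simplified Python sketch of the idea. It assumes a single-stage perpetuity (Gordon growth) model, which is my simplification for illustration and not the model the agents used. The function names are hypothetical and the inputs are arbitrary round numbers, not ELV figures.

```python
def implied_perpetual_growth(price, fcf, discount_rate):
    """Back-solve the growth rate g implied by a price.

    Assumes price = fcf * (1 + g) / (discount_rate - g), where fcf is the
    most recent free cash flow per share. Rearranging for g gives:
        g = (price * discount_rate - fcf) / (price + fcf)
    """
    if price + fcf == 0:
        raise ValueError("price + fcf must be non-zero")
    return (price * discount_rate - fcf) / (price + fcf)

def gordon_price(fcf, discount_rate, growth):
    """Forward direction of the same model, used here as a round-trip check."""
    if growth >= discount_rate:
        raise ValueError("growth must be below the discount rate")
    return fcf * (1 + growth) / (discount_rate - growth)

# Arbitrary illustrative inputs, not company data.
g = implied_perpetual_growth(price=100.0, fcf=5.0, discount_rate=0.08)
print(f"implied growth: {g:.2%}")
print(f"round-trip price: {gordon_price(5.0, 0.08, g):.2f}")
```

In this simplified model, a one-point change in the discount rate moves the implied growth rate by almost a full point. That sensitivity is one reason a precise back-solved figure is hard to verify and a directional conclusion is easier to defend.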
Finding these insights was not the hard part. Making sure they were actually true was.
What the Critic Layer Caught
The thesis underwent multiple rounds of adversarial review before publication. These were not superficial checks. They found real issues.
- The Bear Analyst pointed out that I treated key risks as independent. They were actually correlated.
- A CMS rate timeline error was corrected using primary sources.
- The claim about Humana’s exclusion from FIDE-SNPs was too strong and was revised.
- Settlement figures were verified against DOJ releases.
The system did not just generate ideas. It forced corrections.
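The Bear Analyst's point about correlated risks is easy to see with arithmetic. The sketch below assumes just two binary risks with a given correlation, which is a textbook simplification and not the analysis the critic agent actually ran. The function name and the probabilities are hypothetical.

```python
from math import sqrt

def joint_probability(p1, p2, rho=0.0):
    """Probability that two binary risks both occur, given their correlation.

    For two Bernoulli events: P(both) = p1*p2 + rho * sqrt(p1(1-p1) * p2(1-p2)).
    With rho = 0 this reduces to the independence case p1 * p2.
    """
    joint = p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    # Not every rho is feasible for a given pair of probabilities.
    lower, upper = max(0.0, p1 + p2 - 1), min(p1, p2)
    if not (lower - 1e-12 <= joint <= upper + 1e-12):
        raise ValueError("rho is not feasible for these probabilities")
    return joint

# Hypothetical inputs: two risks, each with a 20% chance of occurring.
independent = joint_probability(0.2, 0.2)
correlated = joint_probability(0.2, 0.2, rho=0.5)
print(f"independent: {independent:.2f}, correlated: {correlated:.2f}")
```

With these made-up inputs, the chance of both risks hitting at once triples once correlation is accounted for, which is the kind of gap an independence assumption hides.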
The Lesson
The most valuable thing about multi-agent systems is not that they produce answers. It is that they disagree.
A single agent gives you an output. Multiple agents give you a distribution, visible uncertainty, and clear signals on where to investigate.
The real work happens in resolving those disagreements using primary sources.
The system is built around a simple assumption: the first answer is a hypothesis, not a conclusion.