blog record — Apr 2026
Trade Reasoning Agent
An LLM that explains the why behind politician and insider trades, with a second pass that tries to break its own answer.
What it does
Quiver, Capitol Trades, and Unusual Whales already list politician trades. None of them answer the question a portfolio manager actually cares about: why, and is it any good?
The naive version is easy to get wrong in three ways. Ask Claude open-endedly "why did Senator X buy NVDA on March 3?" and you get a clean paragraph about AI tailwinds that is mostly invented. Add retrieval, and the top results for politician-plus-ticker queries are SEO farms that get laundered into "analyst sentiment." Worst, if the news query pulls from after the filing date, the model writes a thesis that "predicts" what already happened. Most casual LLM-trading demos do that by accident. The pipeline is mostly a set of defenses against those three failure modes.
System overview
The hard rule is the dashed line in Fig 1. Anything above it uses only data with available_at <= t_filing. Anything below it can peek at future prices, because that's the backtester's job.
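A minimal sketch of that rule as a filter, assuming every input carries an available_at timestamp. The Record class and visible_at function are hypothetical names for illustration, not the pipeline's actual code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """Any input to the thesis stage; field names are illustrative."""
    source: str
    available_at: datetime  # when the item first became public


def visible_at(records: list[Record], t_filing: datetime) -> list[Record]:
    """Keep only records that were public at or before the filing time."""
    return [r for r in records if r.available_at <= t_filing]


t_filing = datetime(2026, 3, 3)
records = [
    Record("8-K", datetime(2026, 2, 20)),
    Record("news", datetime(2026, 3, 3)),
    Record("news", datetime(2026, 3, 5)),  # published after the filing
]
kept = visible_at(records, t_filing)
print([r.available_at.date().isoformat() for r in kept])
```

Running every upstream feed through one choke point like this makes the rule easier to audit than scattering date comparisons across the code.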
Three feeds
EDGAR Form 4 and PTRs. Form 4 covers corporate insiders and lands within two business days of the trade[1]. Congressional PTRs are messier. The STOCK Act[4] requires a report within 30 days of the filer learning of the transaction, with a hard ceiling of 45 days after the trade. Most arrive at the ceiling, sometimes later, and some as scanned PDFs. A parser normalizes both into a shared Filing doc in Firestore with two timestamps: t_filing (when it became public) and t_trade (when it executed). That pair is the single most important set of values in the project.
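As a rough illustration of the shared Filing shape, here is a sketch built around the two timestamps. Only t_filing and t_trade come from the description above. The other fields, the "PTR-EXAMPLE" id, and the use of dates instead of full timestamps are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    filing_id: str
    ticker: str
    side: str      # "BUY" or "SELL"
    source: str    # "FORM4" or "PTR"
    t_trade: date  # when the trade executed
    t_filing: date  # when the disclosure became public

    def __post_init__(self) -> None:
        # A disclosure cannot precede the trade it discloses.
        if self.t_filing < self.t_trade:
            raise ValueError("t_filing cannot precede t_trade")

    @property
    def disclosure_lag_days(self) -> int:
        """Calendar days between execution and public disclosure."""
        return (self.t_filing - self.t_trade).days


ptr = Filing("PTR-EXAMPLE", "NVDA", "BUY", "PTR", date(2026, 1, 20), date(2026, 3, 3))
print(ptr.disclosure_lag_days)  # 42
```

Validating the ordering at construction time means a parser bug that swaps the two dates fails immediately instead of quietly leaking lookahead later.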
Exa for news. Bing News is cheaper and Google Programmable Search has more coverage. I picked Exa because the neural search lets me query semantically ("evidence NVDA datacenter demand was strengthening before 2026-03-03") and its published_before filter actually works[2]. On top of that I keep a per-domain trust score: Reuters, Bloomberg, WSJ, FT, and company 8-Ks score high. Seeking Alpha contributor posts sit in the middle. Content farms score zero and get filtered out.
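A sketch of how a per-domain trust table might be applied to search results. The numeric scores are placeholders (the tiers are from the text above, the numbers are not), trust_for_url and drop_untrusted are hypothetical helpers, and treating unknown domains as zero is an assumption.

```python
from urllib.parse import urlparse

# Illustrative scores only; the tiers come from the post, the numbers do not.
DOMAIN_TRUST = {
    "reuters.com": 0.9,
    "bloomberg.com": 0.9,
    "wsj.com": 0.9,
    "ft.com": 0.9,
    "sec.gov": 0.9,           # company 8-Ks
    "seekingalpha.com": 0.5,  # contributor posts sit in the middle
}


def trust_for_url(url: str) -> float:
    """Look up trust by host, falling back to parent domains, else 0.0."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in DOMAIN_TRUST:
            return DOMAIN_TRUST[candidate]
    return 0.0  # assumption: unknown domains are treated like content farms


def drop_untrusted(urls: list[str]) -> list[str]:
    """Remove zero-trust results before they reach the thesis prompt."""
    return [u for u in urls if trust_for_url(u) > 0.0]


urls = [
    "https://www.reuters.com/technology/example",
    "https://markets.ft.com/data/example",
    "https://best-stock-picks.example/nvda",
]
print(drop_untrusted(urls))
```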
SEC 8-Ks and earnings transcripts. Free, structured, and usually the actual cause of any interesting institutional trade. Ranked above news when both are available.
Thesis generation
The thesis prompt is the most-iterated artifact in the codebase. Early versions asked Claude[3] open-endedly to "explain why this trade likely happened." Pretty prose, no structure. The current prompt does four things.
1. Forces JSON against a strict schema so downstream code can score theses.
2. Provides evidence first and the question last, so Claude sees the Exa context block before the filing. Small change, big drop in ticker-anchored confabulation.
3. Requires inline citation IDs for every claim in evidence[]. No citation, auto-reject before the critic sees it.
4. Asks for a falsifiable counter-signal: "what would have to be true for this thesis to be wrong?"
{
  "filing_id": "PTR-2026-03-03-XYZ",
  "ticker": "NVDA",
  "side": "BUY",
  "thesis": {
    "primary_driver": "DATACENTER_DEMAND_INFLECTION",
    "summary": "Filer increased NVDA exposure ahead of expected Q1 datacenter revenue beat...",
    "evidence": [
      { "claim": "Hyperscaler capex guidance raised", "cite": "ex_004", "weight": 0.4 },
      { "claim": "Supply constraint easing per 8-K", "cite": "ex_011", "weight": 0.3 }
    ],
    "confidence": 0.62,
    "horizon_days": 45,
    "counter_signal": "If hyperscaler capex commentary on next earnings reverses, thesis is invalidated."
  },
  "context_window": { "earliest": "2026-01-15", "latest_inclusive": "2026-03-03" }
}
The thesis schema Claude must emit.
latest_inclusive is the contract with the backtester. If any evidence item has a published_at later than that timestamp, the whole thesis gets dropped.
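The two mechanical gates described so far (every claim carries a citation, and no evidence postdates latest_inclusive) can be sketched as a pre-critic check. The precheck function and the published_at lookup are hypothetical, and the publication dates in the example are made up.

```python
from datetime import date


def precheck(doc: dict, published_at: dict[str, date]) -> tuple[bool, str]:
    """Pre-critic gate. `published_at` maps citation id -> publication date."""
    latest = date.fromisoformat(doc["context_window"]["latest_inclusive"])
    evidence = doc["thesis"]["evidence"]
    if not evidence:
        return False, "no evidence"
    for item in evidence:
        cite = item.get("cite")
        if not cite or cite not in published_at:
            return False, f"missing or unknown citation for claim: {item.get('claim')!r}"
        if published_at[cite] > latest:
            return False, f"{cite} published after latest_inclusive"
    return True, "ok"


doc = {
    "thesis": {
        "evidence": [
            {"claim": "Hyperscaler capex guidance raised", "cite": "ex_004", "weight": 0.4},
            {"claim": "Supply constraint easing per 8-K", "cite": "ex_011", "weight": 0.3},
        ]
    },
    "context_window": {"earliest": "2026-01-15", "latest_inclusive": "2026-03-03"},
}
# Publication dates here are invented for the example.
on_time = {"ex_004": date(2026, 2, 10), "ex_011": date(2026, 2, 24)}
leaky = {**on_time, "ex_011": date(2026, 3, 4)}

print(precheck(doc, on_time))
print(precheck(doc, leaky))
```

Because this check is deterministic, it can run before any critic call and costs nothing in tokens.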
The self-check loop
First version was a single critic prompt: "Here is a thesis. Score it 1 to 10." It scored everything a 7.
The current critic is structured as an adversary. Its job is to break the thesis, not evaluate it. Does every evidence item have a citation that actually loads? Is any evidence from a domain with trust under 0.5? Is the thesis circular, meaning does it cite the filing itself or news that only exists because of the filing? Is the counter_signal observable or a tautology? Is confidence calibrated against the weight-sum of evidence?
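# claude, CRITIC_SYSTEM_PROMPT, render_critic_prompt, domain_trust, and the
# Reject / Revise / Accept result types are defined elsewhere in the codebase.
def self_check(thesis, context):
    # Ask the adversarial critic for a structured (JSON) critique.
    critique = claude.complete(
        system=CRITIC_SYSTEM_PROMPT,
        user=render_critic_prompt(thesis, context),
        response_format="json",
    )
    # Circular theses are rejected outright.
    if critique["circularity_flag"]:
        return Reject("circular: thesis derived from filing-induced coverage")
    # Evidence items cited from domains with trust under 0.5.
    weak = [e for e in thesis["evidence"]
            if domain_trust(e["cite"]) < 0.5]
    # max(..., 1) guards against division by zero on an empty evidence list.
    if len(weak) / max(len(thesis["evidence"]), 1) > 0.34:
        return Reject("more than a third of evidence from low-trust domains")
    if not critique["counter_signal_is_observable"]:
        return Reject("counter-signal not falsifiable")
    # A large gap between stated and recomputed confidence triggers a revision.
    if abs(thesis["confidence"] - critique["recomputed_confidence"]) > 0.25:
        return Revise(suggested_confidence=critique["recomputed_confidence"])
    return Accept(score=critique["adversary_score"])
The critic gate. Reject early, fail loud.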
About 38% get rejected outright, another ~15% are sent back for one revision, and the remaining ~47% reach the paper trader. Pre-critic hit rate was 51%. Post-critic, 58%.
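The "one revision" path could be wired up roughly as follows. This is a sketch under assumptions: the post does not say what a revision involves, so here it simply adopts the critic's suggested confidence and re-runs the check once. The Accept, Reject, and Revise classes are minimal stand-ins for the result types, and stub_critic replaces the real Claude call.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Accept:
    score: float


@dataclass
class Reject:
    reason: str


@dataclass
class Revise:
    suggested_confidence: float


Verdict = Union[Accept, Reject, Revise]


def gate(thesis: dict, check: Callable[[dict], Verdict]) -> Verdict:
    """Run the critic, allowing at most one confidence revision."""
    verdict = check(thesis)
    if isinstance(verdict, Revise):
        revised = {**thesis, "confidence": verdict.suggested_confidence}
        verdict = check(revised)
        if isinstance(verdict, Revise):
            # A second Revise means the thesis never settles; fail loud.
            return Reject("still miscalibrated after one revision")
    return verdict


def stub_critic(thesis: dict) -> Verdict:
    """Stand-in for the real critic: always recomputes confidence as 0.40."""
    recomputed = 0.40
    if abs(thesis["confidence"] - recomputed) > 0.25:
        return Revise(suggested_confidence=recomputed)
    return Accept(score=7.0)


result = gate({"confidence": 0.9}, stub_critic)
print(result)
```

Capping the loop at one revision keeps the critic from negotiating a thesis into acceptance over many rounds.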
Filing-date-aware backtesting
Politicians disclose late. Form 4 insiders disclose less late but still not in real time. Use any information dated after t_filing, or credit the strategy with price action between t_trade and t_filing that a follower could not have traded on, and the backtest is contaminated and the paper returns are fiction.
The guard, applied to every feature x_i used to construct or score a thesis:
$$\tau(x_i) \le t_{\text{filing}} \quad \text{and} \quad t_{\text{trade}} \le t_{\text{filing}}$$
τ(x_i) is when feature x_i first became public. The second clause is the definition of disclosure lag, true by construction. The first is enforced in three places.
- Evidence pinning: every Exa query carries published_before = t_filing, so later docs get dropped before Claude sees them.
- Price-feature lag: rolling indicators (e.g. 20-day momentum) compute on [t_filing - 20d, t_filing], never on [t_trade - 20d, t_trade], since anchoring on t_trade pretends the signal was known about 30 days before it was public, which is free lookahead.
- Entry simulation: simulated entry is the open on the next trading day after t_filing, not the politician's fill price. What a real follower could have done.
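The price-feature lag and entry simulation can be sketched in plain Python. This assumes daily bars keyed by session date and reads "20d" as 20 trading sessions; both function names are hypothetical, and the example uses a 3-session lookback and made-up prices to stay short.

```python
from datetime import date


def momentum_at_filing(closes: dict[date, float], t_filing: date, lookback: int = 20) -> float:
    """Return over the last `lookback` sessions ending at or before t_filing."""
    sessions = sorted(d for d in closes if d <= t_filing)
    if len(sessions) < lookback + 1:
        raise ValueError("not enough history before t_filing")
    start, end = sessions[-(lookback + 1)], sessions[-1]
    return closes[end] / closes[start] - 1.0


def entry_price(opens: dict[date, float], t_filing: date) -> tuple[date, float]:
    """Simulated entry: open of the first session strictly after t_filing."""
    later = sorted(d for d in opens if d > t_filing)
    if not later:
        raise ValueError("no session after t_filing")
    return later[0], opens[later[0]]


# Made-up prices; 2026-03-06 is a Friday, so the next session is Monday 03-09.
closes = {
    date(2026, 3, 2): 100.0, date(2026, 3, 3): 102.0, date(2026, 3, 4): 101.0,
    date(2026, 3, 5): 105.0, date(2026, 3, 6): 110.0, date(2026, 3, 9): 120.0,
}
opens = {date(2026, 3, 6): 108.0, date(2026, 3, 9): 111.0}
t_filing = date(2026, 3, 6)

mom = momentum_at_filing(closes, t_filing, lookback=3)  # uses 03-03 .. 03-06 only
entry_day, entry_px = entry_price(opens, t_filing)
print(round(mom, 4), entry_day, entry_px)
```

Note that the 03-09 close never enters the momentum window even though it is present in the data, which is the property the guard is meant to enforce.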
Paper-trading results
Over a 90-day paper window (Jan to Mar 2026):
Return of +12.0% vs SPY at +4.1%. Sharpe ~1.6. Hit rate 58% on 141 closed positions. Average hold 31 days, set by the thesis's horizon_days, with early exit when the counter_signal triggers. Largest drawdown was 6.4%, mostly from a cluster of accepted theses around a regional bank that were wrong about the rate-cut path.
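For reference, hit rate and largest drawdown can be computed along these lines. This is a sketch with made-up numbers, not the figures above, and it assumes a "hit" means a closed position with a positive return, which the post does not spell out.

```python
def hit_rate(trade_returns: list[float]) -> float:
    """Fraction of closed positions with a positive return."""
    if not trade_returns:
        raise ValueError("no closed positions")
    return sum(r > 0 for r in trade_returns) / len(trade_returns)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not equity:
        raise ValueError("empty equity curve")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Illustrative inputs only.
returns = [0.05, -0.02, 0.01, -0.03]
equity = [100.0, 110.0, 103.0, 108.0, 99.0, 112.0]
print(hit_rate(returns), round(max_drawdown(equity), 4))
```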
The strategy was meaningfully positive on committee-aligned trades (Armed Services members trading defense names) and roughly flat on broad-market index trades. Alpha sits in informational asymmetry, not in copying directional bets[5].
Senate trades did not outperform House trades, which runs against the Senate-focused framing in the early literature[5]. In my window, House PTRs that survived the critic had slightly higher hit rates. I don't have a clean explanation. Best guess is selection, since more House filers means more independent signals after filtering.
What I'd do differently
Both passes are Claude, so they share priors. I want to run the critic on a non-Anthropic model and watch rejection rates shift. Position sizing is equal-weight capped at 2% NAV; a Kelly-style sizer keyed off the critic's recomputed_confidence is the obvious next step, but I don't trust my confidence calibration yet. About 4% of PTRs come through as scanned PDFs from older filers' offices and my parser drops them, which probably hides some of the most interesting trades. And 90 days isn't a backtest, it's a demo. I want at least two years of out-of-sample data before any real money goes near this, which means a historical Exa-snapshot corpus. That's its own project.
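For the sizing idea, a Kelly-style sizer with the existing 2% NAV cap might look like the sketch below. It treats the critic's recomputed_confidence as a win probability, which is exactly the calibration assumption not yet trusted, and the payoff ratio and the quarter-Kelly scale are placeholder choices.

```python
def kelly_fraction(p_win: float, payoff_ratio: float) -> float:
    """Classic Kelly fraction f* = p - (1 - p) / b for win probability p, payoff b."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if payoff_ratio <= 0.0:
        raise ValueError("payoff_ratio must be positive")
    return p_win - (1.0 - p_win) / payoff_ratio


def position_size(p_win: float, payoff_ratio: float,
                  kelly_scale: float = 0.25, cap: float = 0.02) -> float:
    """Fractional Kelly, floored at zero and capped at `cap` of NAV."""
    scaled = kelly_fraction(p_win, payoff_ratio) * kelly_scale
    return min(max(scaled, 0.0), cap)


# p_win would come from the critic's recomputed_confidence (assumption).
print(position_size(0.58, 1.0))  # raw quarter-Kelly is 0.04, so the cap binds
print(position_size(0.45, 1.0))  # negative edge, so no position
```

With the cap at 2%, the sizer mostly decides which positions get less than the cap, so miscalibrated confidence costs less than it would under uncapped Kelly.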
/ footnotes
- [1] SEC Form 4 filing requirements and the two-business-day rule: sec.gov/forms/form4data and EDGAR full-text search at efts.sec.gov.
- [2] Exa neural search with date filters and content retrieval: docs.exa.ai/reference/search.
- [3] Anthropic Claude API, JSON-mode and tool use patterns used for the thesis schema and critic loop: docs.anthropic.com/structured-outputs.
- [4] STOCK Act disclosure rules and PTR timing requirements (House and Senate): ethics.house.gov and ethics.senate.gov.
- [5] Ziobrowski et al., "Abnormal Returns from the Common Stock Investments of the U.S. Senate," Journal of Financial and Quantitative Analysis (2004). Foundational paper on politician-trade alpha and the basis for the "informational asymmetry" framing. jstor.org.