The proof

Did the recommender actually work?

For each historical 13F filing date, we compute what HoldLens would have recommended using only the data available at that point in time. Then we measure the realized return from that day to today using live prices.
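A minimal sketch of that point-in-time rule, assuming quarters are numbered from oldest to newest. The `Move` record, the `as_of_weights` helper, and the decay factor of 0.8 are all hypothetical illustrations, not HoldLens's actual code or parameters. Moves from later quarters are dropped, and the decay is re-anchored so the as-of quarter carries weight 1.0.

```python
from typing import List, NamedTuple, Tuple


class Move(NamedTuple):
    ticker: str
    quarter_index: int  # 0 = oldest quarter in the dataset (assumed numbering)
    net_change: float   # hypothetical signed size of the manager's move


DECAY = 0.8  # hypothetical per-quarter decay factor, chosen for illustration


def as_of_weights(
    moves: List[Move], as_of_quarter: int, decay: float = DECAY
) -> List[Tuple[Move, float]]:
    """Keep only moves visible at as_of_quarter and attach a decay weight.

    The weight is decay ** (quarters before the as-of quarter), so a move
    made in the as-of quarter itself gets weight 1.0.
    """
    visible = [m for m in moves if m.quarter_index <= as_of_quarter]
    return [(m, decay ** (as_of_quarter - m.quarter_index)) for m in visible]
```

Anything filed after the as-of quarter never reaches the scoring step, which is what keeps later information out of the historical recommendation.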

No survivorship bias, no curation, no cherry-picking. If the model picked stocks that lost money, this page shows it. Trust comes from being right when nobody is looking.

Read this first · how we test the model fairly

Every backtested quarter has ≥3 prior quarters of trend data

ConvictionScore v3 weights multi-quarter trend streaks heavily — a manager building a position for 3 consecutive quarters is a much stronger signal than a single-quarter move. In v0.23 we extended the dataset to 8 historical quarters (Q1 2024 → Q4 2025) so the backtest can test the model under its actual operating conditions:

Q1-Q3 2024: used only as context for the trend engine (not backtested directly)
Q4 2024: model has 3 prior quarters — first quarter fairly backtestable
Q1 2025: model has 4 prior quarters — fully operational
Q2 2025: model has 5 prior quarters — strong trend signal
Q3 2025: model has 6 prior quarters — peak trend signal

The rule: a quarter is only included in the backtest if the model had at least 3 quarters of prior data available to score with. That's the same condition today's /best-now ranking operates under. No handicap, no excuses.
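The rule can be sketched as a simple filter over the dataset's quarters. The quarter labels and the `backtestable_quarters` helper are hypothetical. The sketch also assumes the most recent quarter (Q4 2025) is left out because it feeds the live ranking; the page lists Q4 2024 through Q3 2025 as backtested but does not state the reason.

```python
from typing import List, Tuple

# Quarters in the dataset, oldest first (Q1 2024 through Q4 2025, per the text).
DATASET_QUARTERS = [
    "2024-Q1", "2024-Q2", "2024-Q3", "2024-Q4",
    "2025-Q1", "2025-Q2", "2025-Q3", "2025-Q4",
]

MIN_PRIOR_QUARTERS = 3  # the inclusion rule stated on this page


def backtestable_quarters(
    quarters: List[str],
    min_prior: int = MIN_PRIOR_QUARTERS,
    exclude_latest: bool = True,  # assumption: newest quarter is the live one
) -> List[Tuple[str, int]]:
    """Return (quarter, prior_quarter_count) for every eligible entry quarter."""
    candidates = quarters[:-1] if exclude_latest else quarters
    # A quarter's position in the list equals the number of quarters before it.
    return [(q, i) for i, q in enumerate(candidates) if i >= min_prior]
```

Under those assumptions the filter yields the four entry quarters listed above, with 3, 4, 5, and 6 prior quarters respectively.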

Earlier 2024 quarters exist in the dataset but aren't backtested as entry points — they would themselves lack enough prior quarters to score fairly. v0.24 will extend coverage to 2022-2023 so we can test the model across a full bull-bear cycle.

Methodology

How the backtest works

As-of conviction: For each historical quarter (Q4 2024, Q1/Q2/Q3 2025), we compute the ConvictionScore using ONLY moves filed up to that quarter. Time decay is re-anchored so the historical "latest" quarter has weight 1.0. Every backtested quarter has ≥3 prior quarters of trend context (Q1-Q3 2024 serve as the warmup dataset).
Top 5 net-accumulating positions: We take the top 5 stocks ranked BUY at that historical point in time. No curation — whatever the model said.
Entry price: The closing price closest to the 13F filing date (when an investor could have actually acted on the signal).
Exit price: Today's live price from Yahoo Finance via our Cloudflare Worker proxy.
Realized return: Simple return, (exit − entry) / entry. Per-pick rows are not annualized; the aggregate figure is annualized using days held.
Benchmark: SPY total return over the same period. Hit rate = % of picks that beat SPY.
Equal weight: Each pick weighted 1/N. No position sizing, no rebalancing. The simplest possible strategy.
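The return arithmetic in the list above can be sketched for one quarter's cohort as follows. The `Pick` record and function names are hypothetical, the example prices are made up, and the compounding formula in `annualize` is an assumption: the page says only that the aggregate is annualized using days held. SPY levels are assumed to be dividend-adjusted so that they reflect total return.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pick:
    ticker: str
    entry: float      # close nearest the 13F filing date
    exit: float       # latest price
    spy_entry: float  # dividend-adjusted SPY level on the entry day (assumed)
    spy_exit: float   # dividend-adjusted SPY level on the exit day (assumed)


def simple_return(entry: float, exit_: float) -> float:
    """Simple return: (exit - entry) / entry."""
    return (exit_ - entry) / entry


def annualize(total_return: float, days_held: int) -> float:
    """Assumed compounding convention, not confirmed by the page."""
    return (1.0 + total_return) ** (365.0 / days_held) - 1.0


def summarize(picks: List[Pick], days_held: int) -> Dict[str, float]:
    """Equal-weight (1/N) average return, its annualized value, and hit rate."""
    n = len(picks)
    returns = [simple_return(p.entry, p.exit) for p in picks]
    bench = [simple_return(p.spy_entry, p.spy_exit) for p in picks]
    avg = sum(returns) / n
    hits = sum(1 for r, b in zip(returns, bench) if r > b)  # picks beating SPY
    return {
        "avg_return": avg,
        "annualized": annualize(avg, days_held),
        "hit_rate": hits / n,
    }


# Made-up cohort for illustration only; these are not real picks or prices.
example = [
    Pick("AAA", 100.0, 110.0, 500.0, 525.0),
    Pick("BBB", 50.0, 45.0, 500.0, 525.0),
    Pick("CCC", 20.0, 25.0, 500.0, 525.0),
]
print(summarize(example, days_held=365))
```

Because every pick is weighted 1/N with no rebalancing, the aggregate return is just the arithmetic mean of the per-pick simple returns.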

Caveats

Small sample size: 4 quarters × 5 picks = 20 data points. Statistically meaningful inference would need 20+ quarters of data. v0.24 extends coverage further back.
Insider data is not time-locked: The model uses current Form 4 data even for historical computations, which introduces a minor look-ahead bias in the insider component.
Owner count is not time-locked: The crowding penalty uses today's ownership count, not the historical count. This is also a minor look-ahead bias.
No transaction costs: Real returns would be slightly lower due to spreads and commissions (though most brokers are commission-free in 2026).
Past ≠ future: A model that worked historically can stop working tomorrow. Past performance is not indicative of future results.

Backtest data is recomputed live on every page load. Returns shift with the market each day. Not investment advice.