Do 13F signals actually predict stock returns? We ran the backtest.
If you've ever wondered whether “following the smart money” through SEC 13F filings actually beats the S&P — we did too. So we built the tooling, ran the backtest, and the result was humbling enough that we're publishing it here instead of burying it in a footnote.
TL;DR
Over 221 ticker-quarter pairs across 4 historical quarters (Q4 2024 through Q3 2025), the correlation between our ConvictionScore — a composite of smart-money positioning, insider activity, manager track record, multi-quarter trend, concentration, and contrarian bonus — and realized forward alpha over SPY was Pearson r = −0.12. In every single quarter of the window, top-decile “BUY” signals underperformed bottom-decile “SELL” signals by more than 20 percentage points. The score does not predict returns over this window. This changes nothing about what HoldLens tracks — just what we claim the tracking does.
1. What we tested
HoldLens assigns every tracked stock a single ConvictionScore from −100 (strongest smart-money consensus SELL) to +100 (strongest BUY). The score aggregates six signals — smart-money consensus, insider Form 4 activity, manager 10-year track record, multi-quarter trend streaks, position concentration, and a contrarian bonus — minus dissent and crowding penalties. It's the same score that drives every BUY/SELL ranking on the site.
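To make the shape of that aggregation concrete, here is a minimal sketch. It is not the HoldLens implementation: the field names, the idea that each signal contributes a number of points, and the simple sum are all assumptions for illustration. Only the six signals, the two penalties, and the −100 to +100 range come from the description above.

```typescript
// Hypothetical input shape: each field is the number of points one signal
// contributes. The real HoldLens weights are not given in this article.
type ScoreInputs = {
  smartMoneyConsensus: number;
  insiderActivity: number;
  managerTrackRecord: number;
  trendStreak: number;
  concentration: number;
  contrarianBonus: number;
  dissentPenalty: number; // non-negative, subtracted from the total
  crowdingPenalty: number; // non-negative, subtracted from the total
};

// Sum the six signals, subtract the two penalties, clamp to [-100, +100].
function convictionScoreSketch(x: ScoreInputs): number {
  const signals =
    x.smartMoneyConsensus +
    x.insiderActivity +
    x.managerTrackRecord +
    x.trendStreak +
    x.concentration +
    x.contrarianBonus;
  const penalties = x.dissentPenalty + x.crowdingPenalty;
  return Math.max(-100, Math.min(100, Math.round(signals - penalties)));
}
```

The clamp is what keeps the output inside the published range even if the individual contributions add up to more than 100 in either direction.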
The question: does the score predict which stocks will outperform going forward? To answer it, we:
- Replayed the score at four historical filing dates (Q4 2024, Q1 2025, Q2 2025, Q3 2025) for every ticker in our tracked universe.
- Fetched 2-year daily closing prices from Yahoo Finance for every ticker plus SPY as a benchmark.
- Computed realized forward return from each quarter's 13F filing date to the date of the backtest run, alongside SPY's return over the same window.
- Paired each (ticker, quarter) with its score at that quarter and its realized alpha (ticker return minus SPY return over the same window).
- Computed per-quarter and aggregate Pearson correlations, decile analysis, and hit-rate statistics.
The full script is scripts/backtest-conviction.ts — deterministic, reproducible, runnable anytime with npx tsx scripts/backtest-conviction.ts. 221 (ticker, quarter) pairs made it through both the scoring and pricing filters. Not thousands, but enough for directional correlation analysis.
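For readers who want the core arithmetic without opening the script, the steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the contents of scripts/backtest-conviction.ts: the `PricePoint` type and both function names are hypothetical, and the price series is assumed to be sorted by ascending ISO date.

```typescript
type PricePoint = { date: string; close: number }; // ISO date (YYYY-MM-DD), daily close

// Return from the first close on or after startDate to the latest close in the series.
// Assumes prices are sorted by ascending date; ISO dates compare correctly as strings.
function forwardReturn(prices: PricePoint[], startDate: string): number | null {
  const start = prices.find((p) => p.date >= startDate);
  const end = prices[prices.length - 1];
  if (!start || !end || start.close <= 0) return null;
  return end.close / start.close - 1;
}

// Pearson correlation coefficient between two equal-length series.
// Returns NaN if either series has zero variance.
function pearson(xs: number[], ys: number[]): number {
  const n = xs.length;
  if (n !== ys.length || n < 2) {
    throw new Error("need two equal-length series with at least 2 points");
  }
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}
```

Alpha for one (ticker, quarter) pair is then `forwardReturn(tickerPrices, filingDate)` minus `forwardReturn(spyPrices, filingDate)`, and the headline statistic is `pearson(scores, alphas)` over all pairs that have both values.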
2. What we found
Per-quarter correlations between ConvictionScore and realized alpha:
| Filing date | Window held | SPY return | r(score, alpha) |
|---|---|---|---|
| Q4 2024 (14 Feb 2025) | 428d | +16.5% | −0.04 |
| Q1 2025 (15 May 2025) | 338d | +20.9% | −0.18 |
| Q2 2025 (14 Aug 2025) | 247d | +10.1% | −0.16 |
| Q3 2025 (14 Nov 2025) | 155d | +5.7% | −0.06 |
| Aggregate | — | — | −0.12 |
Every single quarter had a negative correlation, and the aggregate is −0.12. As a rough rule of thumb, we would want to see r ≥ +0.15 before calling a score predictive at all, and r ≥ +0.3 before calling the evidence strong.
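One way to put the aggregate number in context is the standard t-statistic for a correlation coefficient, t = r · sqrt((n − 2) / (1 − r²)). The sketch below applies it to r = −0.12 and n = 221. It assumes the 221 pairs are independent, which is optimistic here because the same tickers recur across quarters with overlapping holding windows, so treat the result as an upper bound on how strong the evidence is. The function name is ours, for illustration only.

```typescript
// t-statistic for testing whether a Pearson correlation differs from zero,
// under the (optimistic) assumption that the n observations are independent.
function correlationTStat(r: number, n: number): number {
  if (n < 3 || Math.abs(r) >= 1) {
    throw new Error("need n >= 3 and |r| < 1");
  }
  return r * Math.sqrt((n - 2) / (1 - r * r));
}

const aggregateT = correlationTStat(-0.12, 221);
console.log(aggregateT.toFixed(2)); // prints "-1.79"
```

A magnitude of about 1.8 sits just below the conventional two-sided 5% threshold of roughly 1.96. Read that way, the data does not show that the score is reliably inverse either. It shows no evidence of positive predictive power over this window.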
Binned by score:
| Score bucket | N | Mean return | Mean alpha vs SPY |
|---|---|---|---|
| Sell (−29 to −10) | 23 | +35.4% | +22.3% |
| Weak sell (−9 to −1) | 34 | +24.3% | +9.7% |
| Weak buy (+1 to +9) | 58 | +25.8% | +13.9% |
| Buy (+10 to +29) | 90 | +8.5% | −4.3% |
| Strong buy (≥ +30) | 9 | +16.3% | +3.1% |
The biggest bucket by size (our standard “BUY” category, 90 signals) underperformed SPY by 4.3 percentage points. The “SELL” bucket beat SPY by 22 percentage points. Over this window the score ordering was broadly inverse to the outcome, though not perfectly: the small Strong buy bucket (9 signals) still beat SPY by 3.1 points.
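The binning behind that table can be sketched as below. The bucket boundaries are taken from the table; the `Pair` and `Bucket` types and the function name are hypothetical, scores are assumed to be integers, and the upper bound of 100 for Strong buy comes from the score's stated range. Any score outside the listed ranges (for example exactly 0) is simply left out of this sketch, which may be why the N column sums to 214 rather than 221, though we have not confirmed how the script itself treats those pairs.

```typescript
type Pair = { score: number; alpha: number }; // alpha as a fraction, e.g. 0.223 = +22.3%
type Bucket = { label: string; min: number; max: number }; // inclusive bounds

// Boundaries copied from the table above.
const BUCKETS: Bucket[] = [
  { label: "Sell", min: -29, max: -10 },
  { label: "Weak sell", min: -9, max: -1 },
  { label: "Weak buy", min: 1, max: 9 },
  { label: "Buy", min: 10, max: 29 },
  { label: "Strong buy", min: 30, max: 100 },
];

// Count and mean alpha per bucket; empty buckets are omitted from the result.
function meanAlphaByBucket(
  pairs: Pair[],
): Map<string, { n: number; meanAlpha: number }> {
  const out = new Map<string, { n: number; meanAlpha: number }>();
  for (const b of BUCKETS) {
    const inBucket = pairs.filter((p) => p.score >= b.min && p.score <= b.max);
    if (inBucket.length === 0) continue;
    const total = inBucket.reduce((sum, p) => sum + p.alpha, 0);
    out.set(b.label, { n: inBucket.length, meanAlpha: total / inBucket.length });
  }
  return out;
}
```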
3. Why
Three structural reasons, not mutually exclusive:
Contrarian inversion
Famous value investors buy stocks that have dropped. The headline “Buffett added to Occidental” often comes while OXY is down 15% from its high, and that's why he's buying. Over short horizons (6 to 14 months), momentum beats mean-reversion in most sectors: stocks that are falling tend to keep falling. When smart-money managers sell, they're typically taking profit on a winner, which often keeps winning. In an up market, which is what we had, the BUY and SELL labels get the short-term direction backwards.
Manager-quality drag
Several of the industry's most-covered managers have underperformed the S&P over the last decade. Our derived 10-year ROI panel on each investor page tells this story: the legendary name is not the legendary recent performer. When those managers' picks drive the BUY signal, the signal inherits their recent record — which hasn't beaten the index.
The 45-day lag
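By the time we can see “Manager X bought stock Y” in a 13F, the quarter has typically been over for about 45 days, because filings are due within 45 days of quarter end and many arrive close to the deadline. The trade itself could have happened on the first day of the quarter, which would make it more than four months old. Whatever informational edge the manager had at purchase time is stale. The market is efficient enough that the mean-reversion or momentum pattern has had weeks to play out. See the 45-day lag explained for more.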
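The lag is easy to put numbers on. The sketch below adds 45 days to a quarter end and measures how old a trade made on the first day of that quarter would be at the deadline. The helper names are ours, and the calculation ignores the rule that a deadline falling on a weekend or holiday moves to the next business day.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Add a number of days to an ISO date (YYYY-MM-DD), working in UTC to avoid DST shifts.
function addDays(isoDate: string, days: number): string {
  const t = Date.parse(`${isoDate}T00:00:00Z`) + days * MS_PER_DAY;
  return new Date(t).toISOString().slice(0, 10);
}

// Whole days from one ISO date to another.
function daysBetween(fromIso: string, toIso: string): number {
  const from = Date.parse(`${fromIso}T00:00:00Z`);
  const to = Date.parse(`${toIso}T00:00:00Z`);
  return Math.round((to - from) / MS_PER_DAY);
}

// Q4 2024: quarter runs 1 Oct to 31 Dec 2024.
const q4Deadline = addDays("2024-12-31", 45);
console.log(q4Deadline); // prints "2025-02-14"
console.log(daysBetween("2024-10-01", q4Deadline)); // prints 136
```

The computed deadline matches the 14 Feb 2025 filing date used in the results table, and the 136-day figure is the worst case for a position opened on the first day of the quarter.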
4. So what is 13F data good for?
The backtest doesn't say 13F data is worthless. It says the common claim, “follow the smart money and you'll outperform,” isn't supported by our data over this window. What 13F data legitimately is:
- Transparency. Regulators require 13F so the public can see who owns what at the end of each quarter. That's useful civic infrastructure.
- Market color. Knowing that Buffett exited a position, or that Ackman is accumulating a new name, is genuinely informative — it tells you the holding exists, the magnitude, the conviction relative to their book.
- Research input. 13F filings are a great watchlist generator. “What does Druckenmiller think is interesting right now?” is a reasonable starting point for deep-dive research.
- Overlap + divergence analysis. When three value investors independently buy the same name, that's a stronger signal than any one of them buying alone — even if the absolute magnitude of the signal is still modest.
What it's NOT: a stock-picker. It's positional data, not predictive data.
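The overlap idea from the list above is simple to express in code. This is a minimal sketch with a hypothetical `Holding` shape and hypothetical function names, not the HoldLens comparison tool: it groups holdings by ticker and returns the tickers held by at least a chosen number of distinct managers.

```typescript
type Holding = { manager: string; ticker: string };

// Map each ticker to the set of distinct managers holding it.
function managersByTicker(holdings: Holding[]): Map<string, Set<string>> {
  const byTicker = new Map<string, Set<string>>();
  for (const h of holdings) {
    const managers = byTicker.get(h.ticker) ?? new Set<string>();
    managers.add(h.manager);
    byTicker.set(h.ticker, managers);
  }
  return byTicker;
}

// Tickers held by at least minManagers distinct managers, sorted alphabetically.
function consensusTickers(holdings: Holding[], minManagers: number): string[] {
  return Array.from(managersByTicker(holdings).entries())
    .filter(([, managers]) => managers.size >= minManagers)
    .map(([ticker]) => ticker)
    .sort();
}
```

Using a `Set` per ticker means a manager who appears twice for the same name (for example across share classes) is counted once, which is the behavior you want when the question is how many independent books agree.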
5. What this means for HoldLens
We've re-framed the product. Ranking pages no longer say “what to buy” — they say “what tracked superinvestors are buying.” The ConvictionScore is positioned as a smart-money positioning tracker, not a predictor. Every ranking page shows a methodology link back to this backtest, so no user sees the score without seeing the caveat.
We're also re-running the backtest each quarter as new price data accumulates. If the correlation materially changes, we'll update this article and publish v2 of the methodology. Transparency over flattery is the thesis. Full script: scripts/backtest-conviction.ts, full output: .claude/state/CONVICTION_BACKTEST.md, both in the public HoldLens repo.
6. How to use HoldLens going forward
Use it for what it is:
- Check what specific managers you respect are buying/selling. The individual investor pages (/investor) show each manager's portfolio with their own 10-year ROI in the header — so you can see whether the manager has earned the right to influence your thinking.
- Use portfolio similarity to find investors you haven't heard of who run books resembling managers you trust.
- Use Form 4 insider activity to see which corporate insiders are buying their own stock. Insider buys (real CEO dollars, not 10b5-1 scheduled trades) are the one smart-money signal with widely documented positive alpha in the academic literature. We did not backtest this signal on its own, only as one input to the ConvictionScore, but it is present in the data.
- Use the comparison tools to look at overlap between managers on specific tickers. Consensus across multiple independent books is more informative than any single position.
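For a sense of what portfolio similarity can mean in practice, here is one common way to measure it: cosine similarity between two books, each represented as ticker-to-weight maps. This is an assumption for illustration. The article does not say which measure HoldLens uses, and the `Portfolio` type and function name are hypothetical.

```typescript
// Ticker -> portfolio weight (any consistent unit, e.g. fraction of book).
type Portfolio = Record<string, number>;

// Cosine similarity of two weight vectors: 1 for identical proportions,
// 0 when the books share no tickers (or when either book is empty).
function cosineSimilarity(a: Portfolio, b: Portfolio): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [ticker, weight] of Object.entries(a)) {
    normA += weight * weight;
    dot += weight * (b[ticker] ?? 0);
  }
  for (const weight of Object.values(b)) {
    normB += weight * weight;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

Because the measure depends only on proportions, a small fund and a large fund with the same allocation score 1, which is what you want when looking for lesser-known managers whose books resemble one you already follow.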
And please — don't trade off ConvictionScore alone. Do your own research. The backtest is the evidence that we mean it.
HoldLens is not a registered investment advisor. Nothing on this site is investment advice. Always do your own research. If you find errors in the backtest, email [email protected] — corrections are logged publicly with a timestamp.