Research  /  May 2026 · Signal test

Cluster buying: tested, reduced, and replaced with something better

Published 2026-05-08 · Test window 2025-04-01 → 2026-04-01 · N = 3,537 filings

TL;DR

We tested whether multiple insiders buying the same stock in the same week — historically called a “cluster buy” and weighted heavily in our scoring model — actually predicts forward returns. The answer surprised us. Cluster buys (3–4 insiders) didn’t outperform solo buys at any window or any horizon. And clusters of 5+ insiders significantly underperformed — by nearly 7 percentage points at 90 days, with p < 0.0001, robust across every cut of the data. Production scoring has been updated.

This is the second research piece we’ve published. The first (new-exec signal) found nothing. This one found something — just not what we expected.

Why we ran this test

Our production scoring model gave +25 points to every insider in a 3+ insider cluster, and +10 each to paired buys (2 insiders). That weighting came from conventional wisdom: multiple insiders agreeing is a stronger signal than a lone buyer. It’s the kind of thing every insider-trading writeup repeats.

We’d already reduced cluster weights once before — from +40 down to +25 — after a backfill analysis suggested the original weight was too aggressive. But “too aggressive” isn’t the same as “wrong direction.” This test was supposed to validate that the current weights were calibrated correctly. We pre-registered the methodology and decision rules before pulling any data.

The decision rule was binary: if bucketed cluster vs. solo spread at 90 days was ≥ +3pp at p < 0.05, keep the weights. If it came in below that, reduce or kill.

It came in at −3.72pp at p = 0.034. That’s the wrong sign. The signal isn’t underpowered — it’s pointing the other way.
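The decision rule is simple enough to state in code. A minimal sketch (function and argument names are ours, not from the repo):

```python
def decide(spread_pp: float, p_value: float,
           min_spread_pp: float = 3.0, alpha: float = 0.05) -> str:
    """Pre-registered binary call: keep the cluster weights only if the
    90-day cluster-vs-solo spread clears both thresholds."""
    if spread_pp >= min_spread_pp and p_value < alpha:
        return "keep"
    return "reduce_or_kill"

print(decide(-3.72, 0.034))  # prints "reduce_or_kill"
```

Note that a significant result with the wrong sign still fails the spread threshold, which is exactly what happened here.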

Methodology

The pre-registration document, decision rules, and full results are committed to the repo and timestamped before the analysis ran.

Cohort sizes

Window    Solo    Paired  Cluster  Mega-cluster
3 days    2,194   610     389      344
7 days    1,981   649     507      400
14 days   1,824   692     587      434
30 days   1,701   687     678      471

Total filings analyzed: 3,537. Cluster cohort cleared the pre-registered N=50 sanity floor at every window.
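The cohort assignment itself is mechanical: count distinct CIKs buying the same ticker inside the window, then bucket. A simplified sketch, assuming a symmetric ±N-day window around each filing (the production windowing may differ):

```python
from collections import defaultdict
from datetime import date

def bucket(n_insiders: int) -> str:
    # Cohort cut-offs from the writeup: 1 / 2 / 3-4 / 5+ distinct insiders.
    if n_insiders >= 5:
        return "mega_cluster"
    if n_insiders >= 3:
        return "cluster"
    return "paired" if n_insiders == 2 else "solo"

def label_filings(filings, window_days=7):
    """filings: list of (ticker, filing_date, cik) tuples. Each filing is
    labeled by how many distinct CIKs bought the same ticker within
    +/- window_days of it."""
    by_ticker = defaultdict(list)
    for ticker, day, cik in filings:
        by_ticker[ticker].append((day, cik))
    labels = []
    for ticker, day, cik in filings:
        nearby = {c for d, c in by_ticker[ticker]
                  if abs((d - day).days) <= window_days}
        labels.append(bucket(len(nearby)))
    return labels

filings = [("ACME", date(2025, 6, 2), "cik-1"),
           ("ACME", date(2025, 6, 5), "cik-2"),
           ("ZED",  date(2025, 6, 2), "cik-9")]
print(label_filings(filings))  # prints ['paired', 'paired', 'solo']
```

This also makes the cohort-size pattern in the table intuitive: longer windows absorb filings out of the solo cohort into the paired and cluster cohorts.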

The headline result

At the production-relevant window (N=7 days), 90-day excess return vs. SPY:

Cohort        n      Mean excess  Median excess  Win rate  p vs. solo
Solo          1,355  +1.27pp      −4.77pp        39.9%     —
Paired        460    +4.72pp      −7.32pp        35.0%     0.21
Cluster       341    −2.29pp      −3.79pp        46.6%     0.11
Mega-cluster  311    −5.67pp      −6.64pp        38.9%     <0.0001

Three things to notice.

First, paired buys (exactly 2 insiders) are the best cohort by mean excess return. They’re not statistically significant at the 5% level, but the direction is consistent across every window tested (+2.87 to +4.27pp at d90 depending on N). That matches the intuition that “two people independently agreeing” is a real signal.

Second, the relationship between cluster size and forward returns reverses past 2 insiders. It’s not noisy — it’s monotonic. Adding more insiders past two makes returns worse, not better:

paired       +3.46pp
solo          0.00pp  (baseline)
cluster      -3.55pp
mega_cluster -6.94pp

Third, mega-cluster is robustly negative. Every window, every horizon, every test we ran agrees. At d90: −6.54pp at N=3 (p<0.001), −6.94pp at N=7 (p<0.0001), −5.81pp at N=14 (p=0.001), −6.38pp at N=30 (p<0.001). It isn't an artifact of any one window definition.
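The writeup doesn't specify which test produced the per-cohort p-values; one plausible choice for comparing a cohort's excess returns against the solo baseline is Welch's unequal-variance t-test, sketched here:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic plus Welch-Satterthwaite degrees of freedom
    for two samples with unequal variances (e.g. mega-cluster vs. solo
    90-day excess returns)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

A permutation test would be a reasonable alternative given the heavy-tailed return distributions; either way, the t-statistic's sign carries the finding.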

The bucketed comparison

The pre-registered primary test was bucketed cluster vs. solo at d90, controlling for market cap and buy value. Sixteen cells, requiring ≥ 10 observations per cohort to be usable.

The two cells doing most of the work:

Market cap  Buy value             n_solo  n_cluster  mean_solo  mean_cluster  spread    p
Micro       Small ($10K–$100K)    392     115        +5.03pp    −3.92pp       −8.95pp   0.012
Small       Medium ($100K–$500K)  110     58         +0.78pp    −10.13pp      −10.91pp  0.009

The micro-cap small-buy cell is significant on its own. So is the small-cap medium-buy cell. Both push in the same direction.

There’s one cell pushing the other way (large × small, +19.70pp, p=0.023, n=120) — but it’s a large-cap cell where the solo arm itself returned −21.52pp, dragged down by the broader large-cap underperformance our scorecard already documents. Hard to read that as cluster signal.

Pooled across all usable cells, weighted by sample size: −3.72pp at p=0.034.
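The pooling step is a sample-size-weighted average of per-cell spreads. A sketch (the weighting scheme is inferred from the text, not taken from the repo):

```python
def pooled_spread(cells):
    """Sample-size-weighted mean of per-cell cluster-vs-solo spreads.
    cells: list of (spread_pp, n_obs) pairs."""
    total = sum(n for _, n in cells)
    return sum(s * n for s, n in cells) / total

# Pooling only the two dominant cells from the table (n = n_solo + n_cluster):
two_cells = [(-8.95, 392 + 115), (-10.91, 110 + 58)]
print(round(pooled_spread(two_cells), 2))  # prints -9.44
```

Pooling just these two cells gives roughly −9.4pp; the published −3.72pp figure pools all usable cells, including the positive large-cap cell, which pulls the weighted average back toward zero.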

The regression

To address the worry that bucketing is too coarse and might be missing within-cell variation, we ran a regression on the same data:

excess_return_d90 = β₀ + β₁·paired + β₂·cluster + β₃·mega_cluster
                    + β₄·log(market_cap) + β₅·log(buy_value)
                    + β₆·micro_cap_dummy + ε
Term             Coef    t-stat  p-value
(Intercept)      +0.397  2.53    0.011
paired           +0.028  0.98    0.32
cluster          −0.032  −1.41   0.16
mega_cluster     −0.095  −5.34   <0.0001
log(market_cap)  −0.027  −4.45   <0.0001
log(buy_value)   +0.018  1.83    0.067
micro_cap_dummy  −0.058  −2.02   0.043

n = 2,425. Adjusted R² is small (1.8%) — we don’t expect a regression of insider-trade-level returns to explain a lot of variance, and that’s fine. What matters here is the sign and significance of the cohort coefficients.

The mega-cluster coefficient survives controls: an ~9.5pp underperformance after controlling for buy value, market cap, and the small-cap effect. The cluster coefficient is negative but not significant on its own. The paired coefficient is positive but not significant.
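For readers who want to reproduce the shape of this fit, the regression reduces to ordinary least squares on a design matrix of cohort dummies plus the controls. A toy sketch with numpy (not the production estimator, which also reports t-stats):

```python
import numpy as np

def fit_ols(X, y):
    """Plain OLS via least squares. In the real regression the columns
    would be: intercept, paired, cluster, mega_cluster, log(market_cap),
    log(buy_value), micro_cap_dummy."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic demo: intercept, cluster dummy, mega_cluster dummy.
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array([0.01, -0.02, -0.08, -0.11])
print(fit_ols(X, y))
```

With dummy coding, each cohort coefficient reads directly as that cohort's return gap versus the solo baseline after the controls are held fixed.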

Composition: who’s in the cluster matters

For non-solo cohorts, we tagged each cluster's role composition into three categories: officer-only, director-only, and mixed. At N=7, d90:

Composition    n    Mean excess  Median   Win rate
Officer-only   104  +1.22pp      −7.42pp  47.1%
Mixed          419  −2.49pp      −4.24pp  43.7%
Director-only  129  −12.62pp     −7.86pp  37.2%

Director-only clusters are the worst-performing sub-cohort in the dataset. Officer-only clusters are roughly flat. Mixed clusters fall in between.

We’re not yet ready to claim “director-only clusters are a contrary indicator” as a standalone production rule — the sample is smaller and we haven’t isolated it from sector or other confounds. But it’s the next thing to look at.
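The composition tagging is a simple set comparison over member roles. A sketch, assuming a two-role simplification (real Form 4 role flags are richer: 10% owners, specific officer titles):

```python
def composition(roles):
    """Classify a cluster by its members' roles.
    roles: iterable of 'officer' / 'director' strings."""
    kinds = set(roles)
    if kinds == {"officer"}:
        return "officer_only"
    if kinds == {"director"}:
        return "director_only"
    return "mixed"

print(composition(["director", "director", "director"]))  # prints director_only
```

Members holding both roles would need an explicit rule (we'd count them toward "mixed"), which is one of the confounds a dedicated director-only test would have to pin down.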


What we changed in production

Effective with the next pipeline run after this writeup is published:

  1. Cluster bonus (3–4 insiders) — removed. No score change. The signal is still surfaced in the editorial layer (“multiple insiders bought”) but doesn’t move the score.
  2. Mega-cluster (5+ insiders) — added as a penalty. −20 per filing in the cluster. This is the first time we’ve added a negative conviction signal of this kind. It deserves a contrary-indicator label in the editorial copy.
  3. Paired (2 insiders) — held at +10. Direction-positive but not significant. We’re holding the weight pending an independent retest on a second year of data.

Picks scored before the deploy date used prior weights. Picks scored after use updated weights. The scorecard will show both regimes.
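The weight change itself is small. A sketch of the before/after scoring component (point values are from this writeup; the dict and function names are ours, not the production pipeline's):

```python
# Old regime: every insider in a 3+ cluster got the +25 bonus.
OLD_WEIGHTS = {"paired": 10, "cluster": 25, "mega_cluster": 25}
# New regime: cluster bonus removed, mega-cluster penalized.
NEW_WEIGHTS = {"paired": 10, "cluster": 0, "mega_cluster": -20}

def cluster_component(cohort: str, weights: dict) -> int:
    """Per-filing score contribution from cluster membership."""
    return weights.get(cohort, 0)  # solo contributes nothing

print(cluster_component("mega_cluster", NEW_WEIGHTS))  # prints -20
```

Keeping both weight dicts around makes the two scoring regimes on the scorecard easy to reproduce.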

What this doesn’t tell us

A few honest limitations.

One year, one regime. 2025-04 to 2026-04 covers a single market environment. The mega-cluster contrarian effect could be regime-specific. We’re queueing an independent retest on a second year of data when it’s available.

Cluster definition is mechanical. Our cohorts count distinct CIKs, nothing else. We don’t weight by seniority, dollar amount, or whether the insiders all bought on the same day vs. spread across the window. A more nuanced definition might produce different results.

Mega-cluster mechanism is unclear. We don’t have a clean explanation for why 5+ insiders piling in within a week predicts underperformance. Plausible stories: capitulation buying near the bottom of a longer decline, signaling activity around a corporate event the market then re-prices, or simple “everyone read the same memo” herding. We’re treating the result as descriptive, not yet causal.

The director-only finding needs its own test. The −12.62pp result is striking but uncontrolled. Could be sector-driven, could be specific to a few large clusters that dominate the cohort, could be real. Separate research.

What we’d do differently

The pre-registered decision rule worked exactly as it was supposed to: it forced a binary call before we saw the data, and it produced a finding we wouldn’t have arrived at by squinting at the numbers. The thing we didn’t anticipate was that the most interesting result wouldn’t be in the primary test (cluster vs. solo) but in the cohort we treated as exploratory (mega-cluster). Worth remembering for the next test.


Get the signals daily

Free daily email that scores SEC Form 4 insider trading filings and surfaces the highest-conviction signals every morning. Scoring model is published, methodology is open, every pick is tracked in public on the scorecard.

Subscribe free