18 Jun 2026 · 2 min read

Why most backtests lie — and what in-sample vs out-of-sample actually fixes

#backtesting #validation #methodology

Every strategy looks profitable on the data it was built on. That is not a sign of an edge — it is a near-mathematical certainty. The moment you tune a parameter to past prices, you have begun fitting the strategy to noise as much as to signal, and the backtest curve stops being a forecast and starts being a memory.

The standard defence against this is the in-sample / out-of-sample split, and it is worth understanding what it does and does not buy you.

The split, plainly

You divide your history into two parts. The in-sample period is where you are allowed to look, tune, and optimise. The out-of-sample period is sealed off — you do not touch it until your parameters are locked. When you finally run the locked strategy on that untouched data, you are asking a single honest question: does the edge survive on prices it was never fitted to?

If performance holds up, you have weak evidence of a real effect. If it collapses, you have strong evidence that you were trading your own curve-fit.
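The split described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name `chronological_split`, the `oos_fraction` parameter, and the 30% default are assumptions made for the example. The one property that matters is that the data is cut by time and never shuffled, so the out-of-sample slice is strictly later than anything the strategy was tuned on.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    history: Sequence[float], oos_fraction: float = 0.3
) -> Tuple[List[float], List[float]]:
    """Split time-ordered data into (in_sample, out_of_sample).

    The most recent `oos_fraction` of observations is held back.
    Nothing is shuffled: shuffling would leak future prices into tuning.
    The 0.3 default is an illustrative assumption, not a recommendation.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be strictly between 0 and 1")
    n_oos = max(1, round(len(history) * oos_fraction))
    cut = len(history) - n_oos
    if cut < 1:
        raise ValueError("history is too short to leave any in-sample data")
    return list(history[:cut]), list(history[cut:])


# Hypothetical price series, oldest first.
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 101.9, 104.4, 105.0, 103.7, 106.2]
in_sample, out_of_sample = chronological_split(prices, oos_fraction=0.3)

print(len(in_sample), len(out_of_sample))  # 7 3
```

Tuning happens only on `in_sample`. The `out_of_sample` list is run once, after the parameters are fixed.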

Why "weak evidence" is the best you get

A single out-of-sample pass is not proof. You can still get lucky on one slice of history, and a strategy that survives one split can fail in a different regime entirely. This is why a clean OOS result should do more to rule out pure overfitting than to raise your confidence in a lasting edge.

What it reliably catches is the worst case: the strategy that exists only because of overfitting. Those die immediately out of sample, and killing them early is most of the value.

Where it commonly goes wrong

Leaking the OOS period. If you ever glance at out-of-sample results and then go back to retune, it is no longer out of sample. You have simply made the whole dataset in-sample with extra steps.
Splitting too small. An OOS window with only a handful of trades tells you almost nothing. If the edge concentrates in five trades, you are reading noise.
Ignoring regime. A strategy validated only across a trending stretch has not been tested against chop. Whenever possible, your OOS window should contain market conditions your in-sample period did not.
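Two of these failure modes can be guarded against mechanically. The sketch below is one possible approach under stated assumptions: `SealedOutOfSample` and `top_trade_share` are hypothetical helpers invented for this example, and trades are represented simply as a list of per-trade profit and loss values. The first refuses a second look at the OOS data. The second measures how much of the gross profit comes from the largest few winners. Regime coverage is not addressed here, since it depends on how you choose to label market conditions.

```python
from typing import Callable, Iterable, List


class SealedOutOfSample:
    """Holds out-of-sample data and permits exactly one evaluation."""

    def __init__(self, data: Iterable[float]) -> None:
        self._data: List[float] = list(data)
        self._used = False

    def evaluate_once(self, strategy: Callable[[List[float]], float]) -> float:
        if self._used:
            # A second look followed by retuning turns OOS into in-sample.
            raise RuntimeError("out-of-sample data has already been used")
        self._used = True
        return strategy(list(self._data))


def top_trade_share(trade_pnls: Iterable[float], k: int = 5) -> float:
    """Fraction of gross profit contributed by the k largest winning trades."""
    wins = sorted((p for p in trade_pnls if p > 0), reverse=True)
    total = sum(wins)
    if total == 0:
        return 0.0
    return sum(wins[:k]) / total


def total_pnl(trades: List[float]) -> float:
    """Stand-in for a locked strategy's OOS evaluation (illustrative only)."""
    return sum(trades)


sealed = SealedOutOfSample([5, -2, 4])
first_result = sealed.evaluate_once(total_pnl)  # allowed: first and only look

oos_trades = [50, 40, -10, 5, 5, -20]
concentration = top_trade_share(oos_trades, k=2)
print(first_result, concentration)  # 7 0.9
```

A share close to 1.0 from a small `k` means the result rests on a handful of trades, which is the situation the second point warns about.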

A practical bar

Before a strategy earns real capital, it should clear three bars: a locked parameter set, an out-of-sample result that is directionally consistent with in-sample rather than merely positive, and enough trades on both sides of the split that the numbers are not hostage to a handful of outliers. None of this guarantees forward performance, but it removes the strategies that never had a chance.
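As a rough illustration, the three checks can be written as a single gate that returns the reasons for rejection. Everything specific here is an assumption made for the sketch: the `ValidationResult` fields, the minimum of 30 trades, and the reading of "directionally consistent" as "out-of-sample keeps at least half of the in-sample mean return per trade". Pick thresholds that fit your own strategy and trade frequency.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ValidationResult:
    params_locked: bool      # were parameters frozen before the OOS run?
    is_mean_return: float    # mean return per trade, in-sample
    oos_mean_return: float   # mean return per trade, out-of-sample
    is_trades: int           # number of in-sample trades
    oos_trades: int          # number of out-of-sample trades


def rejection_reasons(
    r: ValidationResult,
    min_trades: int = 30,        # assumed threshold, not a standard
    min_retention: float = 0.5,  # assumed: OOS must keep half the IS edge
) -> List[str]:
    """Return the reasons a strategy fails the bar; empty list means it clears."""
    reasons = []
    if not r.params_locked:
        reasons.append("parameters were not locked before the OOS run")
    # "Merely positive" is not enough: OOS must retain a share of the IS edge.
    if r.is_mean_return <= 0 or r.oos_mean_return < min_retention * r.is_mean_return:
        reasons.append("out-of-sample result is not consistent with in-sample")
    if min(r.is_trades, r.oos_trades) < min_trades:
        reasons.append("too few trades on at least one side of the split")
    return reasons


# Hypothetical results for two candidate strategies.
solid = ValidationResult(True, 0.004, 0.003, 120, 45)
fragile = ValidationResult(True, 0.004, 0.0005, 120, 12)

print(rejection_reasons(solid))    # []
print(len(rejection_reasons(fragile)))  # 2
```

Returning reasons rather than a bare pass or fail keeps a record of why each candidate was rejected.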

The uncomfortable truth is that good validation mostly produces rejections. That is the point. A process that rarely says no is not protecting you from anything.
