Why most backtests lie — and what in-sample vs out-of-sample actually fixes
Every strategy looks profitable on the data it was built on. That is not a sign of an edge — it is a near-mathematical certainty. The moment you tune a parameter to past prices, you have begun fitting the strategy to noise as much as to signal, and the backtest curve stops being a forecast and starts being a memory.
The standard defence against this is the in-sample / out-of-sample split, and it is worth understanding what it does and does not buy you.
The split, plainly
You divide your history into two parts. The in-sample period is where you are allowed to look, tune, and optimise. The out-of-sample period is sealed off — you do not touch it until your parameters are locked. When you finally run the locked strategy on that untouched data, you are asking a single honest question: does the edge survive on prices it was never fitted to?
If performance holds up, you have weak evidence of a real effect. If it collapses, you have strong evidence that you were trading your own curve-fit.
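The split itself can be sketched in a few lines. This is a minimal illustration, assuming the history is a time-ordered sequence; the function name `chronological_split`, the `oos_fraction` parameter, and the sample prices are hypothetical. The one property that matters is that the cut is chronological, with no shuffling, so the out-of-sample period lies strictly after everything the strategy was tuned on.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    bars: Sequence[T], oos_fraction: float = 0.3
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split time-ordered data into (in_sample, out_of_sample) without shuffling."""
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be strictly between 0 and 1")
    n_oos = round(len(bars) * oos_fraction)  # size of the sealed-off tail
    cut = len(bars) - n_oos
    if n_oos == 0 or cut == 0:
        raise ValueError("not enough data to form both periods")
    # Everything before the cut may be tuned on; everything after is sealed.
    return bars[:cut], bars[cut:]


# Illustrative prices only, oldest first.
prices = [100.0, 101.5, 99.8, 102.2, 103.0, 101.1, 104.4, 105.0, 103.7, 106.2]
in_sample, out_of_sample = chronological_split(prices, oos_fraction=0.3)
```

In practice the cut is often expressed as a date rather than a fraction, which makes it harder to move the boundary by accident.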
Why "weak evidence" is the best you get
A single out-of-sample pass is not proof. You can still get lucky on one slice of history, and a strategy that survives one split can fail in a different regime entirely. This is why an OOS failure should lower your confidence far more than a clean OOS pass should raise it.
What it reliably catches is the worst case: the strategy that exists only because of overfitting. Those die immediately out of sample, and killing them early is most of the value.
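One way to see the overfitting case in isolation is to tune a rule on data that contains no edge at all. The sketch below is a toy under stated assumptions: the returns are zero-mean Gaussian noise, and the rule (trade with or against the sign of a trailing sum), the parameter grid, and the names `strategy_pnl` and `tune` are all hypothetical.

```python
import random


def strategy_pnl(returns, lookback, direction):
    """Total return of a toy rule: hold `direction` when the trailing sum of
    the last `lookback` returns is positive, otherwise hold `-direction`."""
    total = 0.0
    for t in range(lookback, len(returns)):
        trailing = sum(returns[t - lookback:t])
        position = direction if trailing > 0 else -direction
        total += position * returns[t]
    return total


def tune(returns, lookbacks=range(2, 41), directions=(1, -1)):
    """Grid-search the (lookback, direction) pair with the best total return."""
    candidates = ((lb, d) for lb in lookbacks for d in directions)
    return max(candidates, key=lambda p: strategy_pnl(returns, p[0], p[1]))


rng = random.Random(0)
noise = [rng.gauss(0.0, 0.01) for _ in range(1500)]  # zero mean: no edge exists
in_sample, out_of_sample = noise[:1000], noise[1000:]

best_lookback, best_direction = tune(in_sample)  # tuning sees in-sample only
is_pnl = strategy_pnl(in_sample, best_lookback, best_direction)
oos_pnl = strategy_pnl(out_of_sample, best_lookback, best_direction)

print(f"tuned lookback={best_lookback}, direction={best_direction}")
print(f"in-sample total return:     {is_pnl:+.4f}")
print(f"out-of-sample total return: {oos_pnl:+.4f}")
```

Because the grid contains each rule alongside its mirror image, the tuned in-sample result cannot be negative, even though the data is pure noise. The out-of-sample number carries no such guarantee, which is the asymmetry the split is designed to expose.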
Where it commonly goes wrong
- Leaking the OOS period. If you ever glance at out-of-sample results and then go back to retune, it is no longer out of sample. You have simply made the whole dataset in-sample with extra steps.
- Making the OOS window too small. An OOS window with only a handful of trades tells you almost nothing. If the edge is concentrated in five trades, you are reading noise.
- Ignoring regime. A strategy validated only across a trending stretch has not been tested against chop. Whenever possible, your OOS window should contain market conditions your in-sample period did not.
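The small-sample failure in the list above is the easiest one to check mechanically. The sketch below flags an OOS trade list that is too short or whose profit leans on a few trades. The function name `oos_sample_warnings` and the thresholds (30 trades, top 5 trades, 50% of net profit) are illustrative assumptions, not established cut-offs.

```python
from typing import List, Sequence


def oos_sample_warnings(
    trade_pnls: Sequence[float],
    min_trades: int = 30,        # assumed threshold, tune to your own strategy
    top_k: int = 5,
    max_top_share: float = 0.5,  # assumed threshold
) -> List[str]:
    """Return human-readable warnings about a thin or concentrated OOS sample."""
    warnings = []
    if len(trade_pnls) < min_trades:
        warnings.append(
            f"only {len(trade_pnls)} trades (want at least {min_trades})"
        )
    net = sum(trade_pnls)
    if net > 0:
        # Share of net profit contributed by the top_k best trades.
        top = sum(sorted(trade_pnls, reverse=True)[:top_k])
        share = top / net
        if share > max_top_share:
            warnings.append(
                f"top {top_k} trades account for {share:.0%} of net profit"
            )
    return warnings


# Illustrative data: five big winners plus twenty trades that net to zero.
concentrated = [120.0, 95.0, 80.0, 60.0, 45.0] + [5.0, -5.0] * 10
# Illustrative data: forty equal, modest winners.
diffuse = [10.0] * 40

print(oos_sample_warnings(concentrated))
print(oos_sample_warnings(diffuse))
```

The concentrated list trips both checks, since all of its net profit comes from five trades. The diffuse list trips neither.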
A practical bar
Before a strategy earns real capital, it should clear three things: a locked parameter set, an out-of-sample result that is directionally consistent with in-sample rather than merely positive, and enough trades in both the in-sample and out-of-sample periods that the numbers are not hostage to a handful of outliers. None of this guarantees forward performance, but it removes the strategies that never had a chance.
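The three checks can be written down as a simple gate. This is a sketch under assumptions: "directionally consistent" is read here as the OOS mean per-trade result keeping the in-sample sign and retaining at least a set fraction of its size, which is one possible reading rather than a standard definition. The names `BarResult` and `check_bar` and the defaults `min_trades=30` and `min_retention=0.5` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class BarResult:
    params_locked: bool
    consistent: bool
    enough_trades: bool

    @property
    def passed(self) -> bool:
        return self.params_locked and self.consistent and self.enough_trades


def check_bar(
    is_trades: Sequence[float],
    oos_trades: Sequence[float],
    params_locked: bool,
    min_trades: int = 30,        # assumed threshold
    min_retention: float = 0.5,  # assumed: OOS keeps at least half the IS mean
) -> BarResult:
    """Evaluate the three-part bar on per-trade results from each period."""
    enough = len(is_trades) >= min_trades and len(oos_trades) >= min_trades
    consistent = False
    if is_trades and oos_trades:
        is_mean, oos_mean = mean(is_trades), mean(oos_trades)
        # Merely positive is not enough: OOS must retain a share of the IS edge.
        consistent = is_mean > 0 and oos_mean >= min_retention * is_mean
    return BarResult(params_locked, consistent, enough)


# Illustrative per-trade results only.
is_trades = [2.0, -1.0, 3.0] * 20        # 60 trades
oos_holds = [1.0, -1.0, 3.0] * 12        # 36 trades, most of the edge retained
oos_fades = [0.2, -0.1, 0.2] * 12        # 36 trades, positive but far weaker

print(check_bar(is_trades, oos_holds, params_locked=True).passed)
print(check_bar(is_trades, oos_fades, params_locked=True).passed)
```

The second case is the one the "merely positive" wording warns about: the OOS result is above zero, yet it retains so little of the in-sample edge that it fails the consistency check.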
The uncomfortable truth is that good validation mostly produces rejections. That is the point. A process that rarely says no is not protecting you from anything.