Introduction
Backtesting is how you find out whether a strategy has an edge before you risk money on it. Done honestly, it’s the single highest-leverage activity a new trader can do. Done badly - and most backtests are done badly - it produces a beautiful equity curve that dies the moment you go live.
This guide is about the second kind: why backtests lie, and how to build tests that don’t. It’s shorter than the others because most of the value is in avoiding the traps, not piling on complexity.
Why Backtest
- Prove the strategy has an edge before risking capital.
- Quantify expectations - win rate, avg win R, avg loss R, drawdown distribution.
- Find the conditions where it breaks - specific regimes, instruments, or timeframes where it fails.
- Build the conviction to keep taking trades through a losing streak, because you can tell a statistically normal loss from a broken strategy.
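A minimal sketch of how those expectations can be quantified, assuming each trade result is already recorded as an R multiple (profit or loss divided by the initial risk). The function name `summarize_r` and the output keys are illustrative, not part of any particular tool.

```python
from statistics import mean

def summarize_r(trades_r):
    """Summarize per-trade results given as R multiples (e.g. 2.0, -1.0)."""
    if not trades_r:
        raise ValueError("need at least one trade")
    wins = [r for r in trades_r if r > 0]
    losses = [r for r in trades_r if r <= 0]  # breakeven counted as a loss here
    win_rate = len(wins) / len(trades_r)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0  # zero or negative
    # Expectancy per trade in R: what one average trade is worth.
    expectancy = win_rate * avg_win + (1 - win_rate) * avg_loss
    # Max drawdown in R: largest peak-to-trough drop of the cumulative curve.
    equity = peak = max_dd = 0.0
    for r in trades_r:
        equity += r
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)
    return {
        "n": len(trades_r),
        "win_rate": win_rate,
        "avg_win_r": avg_win,
        "avg_loss_r": avg_loss,
        "expectancy_r": expectancy,
        "max_drawdown_r": max_dd,
    }
```

A positive `expectancy_r` with a tolerable `max_drawdown_r` is the starting point; the sections below are about whether those numbers can be trusted.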
Define the Rule, Not a Hope
A testable strategy has an unambiguous entry, stop, and exit. “Buy the dip” is not a rule; it’s a feeling. “Buy when price touches the 20 EMA on the 1H chart with price trending above the 200 EMA, stop 1 ATR below the entry, target 2R” is a rule.
The minimum fields
1. Entry condition: Specific, testable. What exact price action or indicator state triggers entry?
2. Entry fill: Market, limit at level, stop above level? Determines realistic fill.
3. Stop placement: Fixed R, ATR-based, structural? Must be mechanical.
4. Exit rule: Fixed target, trailing stop, time-based, condition-based. Must be mechanical.
5. Filter conditions: Time of day, volume floor, trend filter, volatility filter. What conditions DISQUALIFY an otherwise-matching setup?
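One way to check that a rule is mechanical is to write it as a function. The sketch below encodes the 20 EMA example from above under some loud assumptions: "touches" is read as the bar's range containing the 20 EMA, "trending above the 200 EMA" is read as the previous close being above it, the fill is a limit at the 20 EMA, and the EMA and ATR values are computed from completed bars only. `Order` and `ema_pullback_signal` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Order:
    entry: float   # limit price
    stop: float    # 1 ATR below entry
    target: float  # 2R above entry

def ema_pullback_signal(high: float, low: float, prev_close: float,
                        ema20: float, ema200: float,
                        atr: float) -> Optional[Order]:
    """Return an Order if the current 1H bar qualifies, else None."""
    touched = low <= ema20 <= high   # bar traded through the 20 EMA
    uptrend = prev_close > ema200    # trend filter, known before the bar
    if not (touched and uptrend) or atr <= 0:
        return None
    entry = ema20
    stop = entry - atr
    risk = entry - stop              # 1R
    return Order(entry=entry, stop=stop, target=entry + 2 * risk)
```

If a setup cannot be expressed this way without a judgment call, the rule is not yet specific enough to test.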
Data Quality
- Same timeframe resolution as you’ll trade. Daily data can’t test intraday stop-out behavior.
- Tick data for scalping; 1-minute or 5-minute for day-trading setups; 15m/1H/daily for swing.
- Adjusted for splits and dividends on stocks, and for roll gaps on futures.
- Real bid/ask where possible, not just mid-price. Spreads matter for short-timeframe setups.
- Out-of-session data matters for some strategies (gaps, overnight positioning).
Sample Size
The single most common backtest flaw is taking n=20 trades over 6 weeks and calling it an edge.
- Minimum: 100 trades. Below this, the stats are noise.
- Preferred: 200+ trades. Expectancy stabilizes, drawdown distribution becomes visible.
- Confidence intervals: A 55% win rate with n=20 could easily be 40% or 70% by chance. Same win rate with n=200 is much tighter.
- Multi-regime: Sample should span at least one bull market, one bear market, and one chop regime. 2 years minimum, 5 years better.
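The confidence-interval point can be made concrete with a Wilson score interval for a win rate. This is one standard choice among several, sketched here with the usual z = 1.96 for an approximate 95% interval; it treats trades as independent, which real trades only roughly are.

```python
from math import sqrt

def wilson_interval(wins: int, n: int, z: float = 1.96):
    """Approximate confidence interval for a win rate of wins/n."""
    if n <= 0 or not 0 <= wins <= n:
        raise ValueError("need 0 <= wins <= n and n > 0")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Same 55% observed win rate, two sample sizes.
small = wilson_interval(11, 20)
large = wilson_interval(110, 200)
```

With these inputs the n=20 interval runs from roughly 0.34 to 0.74, while the n=200 interval runs from roughly 0.48 to 0.62. Even at 200 trades, a 55% observed win rate does not rule out a true rate slightly below 50%.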
Survivorship Bias
Backtesting a strategy on today’s S&P 500 components is backtesting winners. The companies that got delisted, went bankrupt, or dropped from the index are gone from your dataset. Your results will look 1–3% better per year than reality.
How to fix
- Use point-in-time index membership data. The tickers that were in the index at each date, not today’s list.
- Include delisted tickers in your universe.
- For single-asset strategies (spot BTC, ES futures), survivorship isn’t an issue - the asset still exists.
- For crypto, survivorship is severe - most tokens that existed 5 years ago are gone. Stick to top-N by market cap AT EACH DATE if testing alts.
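A minimal sketch of a point-in-time universe lookup, assuming you have membership records with the dates each ticker was added and removed. The `Membership` record and the tickers in the usage note are hypothetical; real constituent history has to come from a data vendor.

```python
from datetime import date
from typing import NamedTuple, Optional

class Membership(NamedTuple):
    ticker: str
    added: date
    removed: Optional[date]  # None means still a member

def universe_on(memberships, as_of: date):
    """Tickers that were members on `as_of`, including ones later delisted."""
    return sorted({
        m.ticker
        for m in memberships
        if m.added <= as_of and (m.removed is None or as_of < m.removed)
    })
```

For example, with `Membership("AAA", date(2015, 1, 1), None)` and `Membership("BBB", date(2015, 1, 1), date(2020, 6, 1))`, a query for `date(2018, 1, 1)` returns both tickers, while a query for `date(2021, 1, 1)` returns only `AAA`. Testing 2018 signals against the 2021 list would silently drop `BBB`.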
Overfitting & Curve-Fitting
Overfitting is when your strategy is optimized to historical data so specifically that it captures noise, not edge. The equity curve looks perfect in-sample and blows up out-of-sample.
Signs you’re overfitting
- The strategy uses very specific parameter values (e.g. 27-period EMA, 1.3% stop) with no economic reason.
- Small changes in parameters dramatically change results.
- The strategy only works on one ticker / one timeframe / one regime.
- You have 20 rules tuned to 500 trades.
- Equity curve is a clean straight line with minimal drawdown.
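The parameter-sensitivity sign can be checked mechanically. The sketch below assumes you already have some function that runs a backtest for one parameter value and returns a score such as expectancy in R; `score_fn`, the 50% `max_drop` threshold, and both function names are illustrative assumptions.

```python
def neighborhood_scores(score_fn, center, step, radius=2):
    """Score the chosen parameter and its neighbors on either side."""
    params = [center + i * step for i in range(-radius, radius + 1)]
    return {p: score_fn(p) for p in params}

def is_fragile(scores, center, max_drop=0.5):
    """True if any neighbor scores more than `max_drop` (as a fraction)
    below the chosen parameter, or if the chosen score is not positive."""
    base = scores[center]
    if base <= 0:
        return True
    worst = min(scores.values())
    return (base - worst) / base > max_drop
```

A robust parameter sits on a plateau: a 25-period or 29-period EMA should perform about as well as the 27-period one. If only the exact value works, the backtest has probably fitted noise.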
Look-Ahead Bias
Using information in your backtest that wouldn’t have been available at the time.
Common forms
- Using close price as the entry signal, but filling at that same close. You didn’t know the close until the candle closed.
- Using daily high/low in a decision made at open.
- Using a ticker’s future split-adjustment in a historical calculation.
- Using sector classifications as of today for historical analysis.
- Using pivot points / levels computed after the fact instead of how they’d have been drawn live.
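The first form above has a simple structural fix: decide on the close of one bar, fill on the open of the next. A minimal sketch, assuming bars are dicts with `open` and `close` keys and `signal_fn` is a hypothetical function that only sees bars up to and including the one that just closed:

```python
def entries_next_open(bars, signal_fn):
    """Return (bar_index, fill_price) for each entry.

    The signal is evaluated on bars[0..i] after bar i has closed, and the
    fill happens at the open of bar i + 1, the first price actually
    tradable once the signal is known.
    """
    fills = []
    for i in range(len(bars) - 1):       # last bar has no next open
        history = bars[: i + 1]          # nothing from the future
        if signal_fn(history):
            fills.append((i + 1, bars[i + 1]["open"]))
    return fills
```

Passing only a slice of history into the signal function makes it impossible for the rule to peek at later bars by accident.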
Walk-Forward Testing
The gold-standard backtesting framework. Splits history into a rolling train/test sequence so you’re always testing on data the strategy hasn’t seen.
How it works
1. Split history: E.g. 2020–2022 as train, 2023 as test. Or rolling 12-month windows.
2. Optimize on train: Tune any parameters on the training window only.
3. Evaluate on test: Run the locked parameters on the untouched test window. That’s your honest performance.
4. Roll the windows: Shift forward, re-optimize, re-test. Build a sequence of out-of-sample results.
5. Aggregate out-of-sample only: Your real performance estimate is the concatenation of all test-window results, never the train windows.
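The steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions: `data` is any chronologically ordered list of bars or trades, `param_grid` is a list of candidate parameter values, and `score_fn(data, param)` is a hypothetical function that backtests one parameter on one window and returns a score.

```python
def walk_forward_splits(n, train_size, test_size):
    """Yield (train_indices, test_indices) for rolling windows over n items."""
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window

def walk_forward(data, train_size, test_size, param_grid, score_fn):
    """Optimize on each train window, score on the following test window."""
    out_of_sample = []
    for train, test in walk_forward_splits(len(data), train_size, test_size):
        train_data = [data[i] for i in train]
        test_data = [data[i] for i in test]
        # Parameters are chosen using training data only, then locked.
        best = max(param_grid, key=lambda p: score_fn(train_data, p))
        out_of_sample.append((best, score_fn(test_data, best)))
    return out_of_sample  # only test-window results are reported
```

Because each test window starts where its train window ends and the windows advance by `test_size`, the test windows never overlap and never include data the optimizer has seen.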
Fees & Slippage
Most backtests leave these out. A frictionless backtest is fantasy.
- Commissions: subtract real broker fees per trade.
- Spread cost: for every round trip, subtract the bid-ask spread.
- Slippage on market orders: add 0.05–0.2% cost depending on liquidity and order size.
- Limit order fill assumption: don’t assume every limit fills at mid. Add realistic fill probability.
- Overnight funding (perps): subtract estimated funding for leveraged crypto strategies.
- Borrow cost (shorts): some stocks cost 5%+ annualized to short.
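A minimal sketch of applying these frictions to one round-trip trade, with every quantity expressed as a fraction of position value (0.001 = 0.1%). The split is an assumption: commission and slippage are charged on both entry and exit, the full spread is charged once per round trip, and borrow is prorated by calendar days. `net_return` and its parameter names are illustrative.

```python
def net_return(gross_return, commission_pct=0.0, spread_pct=0.0,
               slippage_pct=0.0, funding_pct=0.0,
               borrow_pct_annual=0.0, days_held=0.0):
    """Gross round-trip return minus estimated trading frictions."""
    borrow = borrow_pct_annual * days_held / 365
    costs = (
        2 * commission_pct    # entry and exit
        + spread_pct          # buy at ask, sell at bid
        + 2 * slippage_pct    # market orders on both legs
        + funding_pct         # total funding paid while holding (perps)
        + borrow              # short borrow fee for the holding period
    )
    return gross_return - costs
```

For instance, a 1% gross winner with 0.05% commission per side, a 0.02% spread, and 0.1% slippage per side nets about 0.68%. On short-timeframe setups with small targets, that gap is often the whole edge.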
Regime Change
Markets have regimes: trending, ranging, high-vol, low-vol, risk-on, risk-off. A strategy that performed very well in 2017 (low-vol trend) may be useless in 2022 (high-vol chop). Regimes also do not persist indefinitely, so an edge that depends on one has a limited shelf life.
How to test across regimes
- Segment backtest by VIX levels (or BTC implied volatility).
- Segment by trend strength (% above/below 200-day MA).
- Report performance per regime, not just overall.
- If a strategy is regime-dependent, know the regime condition and be willing to stop trading it when the regime shifts.
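Per-regime reporting can be as simple as tagging each trade with the volatility reading at entry and grouping. In the sketch below the VIX thresholds of 15 and 25 are illustrative assumptions, not recommended cutoffs, and trades are assumed to be `(vix_at_entry, r_multiple)` pairs.

```python
from collections import defaultdict
from statistics import mean

def vol_regime(vix, low=15.0, high=25.0):
    """Bucket a volatility reading into a coarse regime label."""
    if vix < low:
        return "low-vol"
    if vix < high:
        return "mid-vol"
    return "high-vol"

def expectancy_by_regime(trades):
    """Group (vix_at_entry, r_multiple) trades and report each bucket."""
    buckets = defaultdict(list)
    for vix, r in trades:
        buckets[vol_regime(vix)].append(r)
    return {
        regime: {"n": len(rs), "expectancy_r": mean(rs)}
        for regime, rs in buckets.items()
    }
```

Report `n` alongside expectancy for every bucket: a regime with a handful of trades has the same sample-size problem as a small backtest.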
From Backtest to Paper to Live
1. Backtest with walk-forward: Out-of-sample results honest, fees/slippage applied.
2. Forward paper trade: Trade the strategy live, paper only, for 1–3 months. Compare realized results to backtest expectations.
3. Real money, micro size: Start live at 25% of target size. Validate the execution environment matches the backtest assumptions.
4. Scale up slowly: Only after 50+ live trades at 25% size with stable expectancy. Then bump to 50%, then 75%, then 100%.
5. Monitor for decay: Track live performance vs backtest expectancy. If it underperforms by 30%+ for 50+ trades, pause and investigate before scaling.
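The decay check in the last step can be written as a simple rule. This sketch assumes live results are logged as R multiples and reads "underperforms by 30%+ for 50+ trades" as the mean of the most recent 50 live trades falling more than 30% below the backtest expectancy; `should_pause` is a hypothetical name.

```python
def should_pause(live_r, backtest_expectancy_r,
                 min_trades=50, max_shortfall=0.30):
    """True if recent live expectancy lags the backtest by too much."""
    if backtest_expectancy_r <= 0:
        raise ValueError("backtest expectancy must be positive")
    if len(live_r) < min_trades:
        return False  # not enough live trades to judge yet
    recent = live_r[-min_trades:]
    live_expectancy = sum(recent) / len(recent)
    return live_expectancy < backtest_expectancy_r * (1 - max_shortfall)
```

With a backtest expectancy of 0.20R, the threshold sits at 0.14R: fifty live trades averaging 0.10R trigger a pause, while fifty averaging 0.18R do not.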
Backtest in TradeSimple
The Backtesting tool lets you replay historical candles and place paper entries/exits as if live:
- Candle-by-candle replay on any instrument and any date range.
- Speed control - 1x, 2x, 5x, 10x.
- Sandboxed trades - don’t touch your live journal.
- Session stats - win rate, R expectancy, equity curve - computed per session.
- Compare sessions - see how the same strategy performs across different months or instruments.