TradeSimple
Free Public Guide

Backtesting Guide

Most backtests lie. Here's how to backtest honestly: sample size, survivorship bias, walk-forward testing, and the pitfalls that make results look better than they are.

13 chapters · ~18 min read

Chapter 1

Introduction

Backtesting is how you find out whether a strategy has an edge before you risk money on it. Done honestly, it’s the single highest-leverage activity a new trader can do. Done badly - and most backtests are done badly - it produces a beautiful equity curve that dies the moment you go live.

This guide is about the second kind: why backtests lie, and how to build tests that don’t. It’s shorter than the others because most of the value is in avoiding the traps, not piling on complexity.

Chapter 2

Why Backtest

  • Prove the strategy has an edge before risking capital.
  • Quantify expectations - win rate, avg win R, avg loss R, drawdown distribution.
  • Find the conditions where it breaks - specific regimes, instruments, or timeframes where it fails.
  • Build conviction to stick with the strategy through losing trades, knowing when a live loss is statistically normal rather than a sign the strategy is broken.
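To make the second bullet concrete, here is a minimal Python sketch that turns a list of per-trade results in R (profit divided by initial risk) into the headline numbers. The function and field names are hypothetical. Breakeven trades are counted as non-wins, and drawdown is measured on the cumulative R curve starting from zero.

```python
from dataclasses import dataclass


@dataclass
class BacktestStats:
    n: int
    win_rate: float
    avg_win_r: float
    avg_loss_r: float
    expectancy_r: float
    max_drawdown_r: float


def summarize(r_multiples: list[float]) -> BacktestStats:
    """Summarize per-trade results given in R (profit divided by initial risk)."""
    if not r_multiples:
        raise ValueError("need at least one trade")
    wins = [r for r in r_multiples if r > 0]
    losses = [r for r in r_multiples if r <= 0]  # breakeven counts as a non-win
    n = len(r_multiples)

    # Max drawdown: largest peak-to-trough drop of the cumulative R curve (starts at 0).
    equity = peak = max_dd = 0.0
    for r in r_multiples:
        equity += r
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)

    return BacktestStats(
        n=n,
        win_rate=len(wins) / n,
        avg_win_r=sum(wins) / len(wins) if wins else 0.0,
        avg_loss_r=sum(losses) / len(losses) if losses else 0.0,
        expectancy_r=sum(r_multiples) / n,
        max_drawdown_r=max_dd,
    )
```

For example, `summarize([2, -1, -1, 2, -1])` reports a 40% win rate, +0.2R expectancy, and a 2R max drawdown.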
The secondary benefit
Backtesting forces you to define the rule mechanically. Half the time, the act of writing down the rule precisely reveals it isn’t a rule - it’s a vague impression of one.
Chapter 3

Define the Rule, Not a Hope

A testable strategy has an unambiguous entry, stop, and exit. “Buy the dip” is not a rule; it’s a feeling. “Buy when price touches the 20 EMA on the 1H chart with price trending above the 200 EMA, stop 1 ATR below the entry, target 2R” is a rule.

The minimum fields

  1. Entry condition
    Specific, testable. What exact price action or indicator state triggers entry?
  2. Entry fill
    Market order, limit at a level, or stop order above a level? The order type determines what fill is realistic.
  3. Stop placement
    Fixed R, ATR-based, structural? Must be mechanical.
  4. Exit rule
    Fixed target, trailing stop, time-based, condition-based. Must be mechanical.
  5. Filter conditions
    Time of day, volume floor, trend filter, volatility filter. What conditions DISQUALIFY an otherwise-matching setup?
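As an illustration, the example rule from the start of this chapter can be written down with one field per item in the list. This is a sketch under assumptions: the names are hypothetical, "touches the 20 EMA" is read as the bar's low reaching the EMA, the fill is assumed to happen at the next bar's open, and a time-of-day window stands in for whatever filters you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSpec:
    # Hypothetical field names, one per minimum field.
    timeframe: str = "1H"
    fast_ema: int = 20                      # entry: price touches this EMA...
    trend_ema: int = 200                    # ...while trending above this EMA
    entry_fill: str = "next_open"           # fill: assumed market order at next bar's open
    stop_atr_mult: float = 1.0              # stop: 1 ATR below entry
    target_r: float = 2.0                   # exit: fixed 2R target
    session_hours: tuple[int, int] = (0, 24)  # filter: allowed hours (hypothetical)


def ema(values: list[float], period: int) -> list[float]:
    """EMA seeded with the first value (one common convention among several)."""
    alpha = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def entry_signal(low: float, close: float, fast: float, trend: float,
                 hour: int, spec: RuleSpec) -> bool:
    """Long setup on a completed bar: low touched the fast EMA, close above the
    trend EMA, and the bar falls inside the allowed session (the filter)."""
    in_session = spec.session_hours[0] <= hour < spec.session_hours[1]
    return in_session and low <= fast and close > trend


def bracket(entry: float, atr: float, spec: RuleSpec) -> tuple[float, float]:
    """Return (stop, target) prices for a long entry."""
    risk = spec.stop_atr_mult * atr
    return entry - risk, entry + spec.target_r * risk
```

With an entry at 100 and an ATR of 2, `bracket` places the stop at 98 and the 2R target at 104. Every number the rule needs is now a field, which is exactly what makes it testable.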
Chapter 4

Data Quality

  • Same timeframe resolution as you’ll trade. Daily data can’t test intraday stop-out behavior.
  • Tick data for scalping; 1-minute or 5-minute for day-trading setups; 15m/1H/daily for swing.
  • Adjusted for splits and dividends (stocks) and for roll gaps (futures).
  • Real bid/ask where possible, not just mid-price. Spreads matter for short-timeframe setups.
  • Out-of-session data matters for some strategies (gaps, overnight positioning).
Chapter 5

Sample Size

The single most common backtest flaw is taking 20 trades over 6 weeks and calling it an edge.

Minimum
100 trades. Below this, the stats are noise.
Preferred
200+ trades. Expectancy stabilizes, drawdown distribution becomes visible.
Confidence intervals
A 55% win rate with n=20 could easily be 40% or 70% by chance. The same win rate with n=200 has a much tighter range.
Multi-regime
Sample should span at least one bull market, one bear market, and one chop regime. 2 years minimum, 5 years better.
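The confidence-interval point can be checked directly. This sketch uses the Wilson score interval, one standard approximation for a binomial proportion. It treats trades as independent, which real trades may not be, so read the result as a lower bound on your uncertainty.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a win rate (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

For 11 wins in 20 trades (55%) it returns roughly 34% to 74%; for 110 wins in 200 trades, roughly 48% to 62%.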
Chapter 6

Survivorship Bias

Backtesting a strategy on today’s S&P 500 components is backtesting winners. The companies that got delisted, went bankrupt, or dropped from the index are gone from your dataset. Your results will look 1–3% better per year than reality.

How to fix

  • Use point-in-time index membership data. The tickers that were in the index at each date, not today’s list.
  • Include delisted tickers in your universe.
  • For single-asset strategies (spot BTC, ES futures), survivorship isn’t an issue - the asset still exists.
  • For crypto, survivorship is severe - most tokens that existed 5 years ago are gone. Stick to top-N by market cap AT EACH DATE if testing alts.
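Point-in-time filtering comes down to asking, for each backtest date, which tickers were eligible on that date. A minimal sketch, assuming you have membership records with add and removal dates plus per-date market-cap snapshots. The record layout and function names are hypothetical, not any data vendor's format.

```python
from datetime import date
from typing import Optional


def members_on(day: date, records: list[tuple[str, date, Optional[date]]]) -> set[str]:
    """Tickers that were members on `day`. Each record is
    (ticker, date_added, date_removed or None if still a member)."""
    return {
        ticker
        for ticker, added, removed in records
        if added <= day and (removed is None or day < removed)
    }


def top_n_by_cap(day: date, caps: dict[date, dict[str, float]], n: int) -> list[str]:
    """Top-n assets by market cap using only the snapshot for `day` (no later data)."""
    snapshot = caps.get(day, {})
    return sorted(snapshot, key=snapshot.get, reverse=True)[:n]
```

A ticker removed in 2019 still shows up in a 2018 universe, which is the whole point: delisted and dropped names stay in the test.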
Chapter 7

Overfitting & Curve-Fitting

Overfitting is when your strategy is optimized to historical data so specifically that it captures noise, not edge. The equity curve looks perfect in-sample and blows up out-of-sample.

Signs you’re overfitting

  • The strategy uses very specific parameter values (e.g. 27-period EMA, 1.3% stop) with no economic reason.
  • Small changes in parameters dramatically change results.
  • The strategy only works on one ticker / one timeframe / one regime.
  • You have 20 rules tuned to 500 trades.
  • Equity curve is a clean straight line with minimal drawdown.
The test
After tuning on the first 3 years, apply to the last year UNTOUCHED. If performance holds, the edge is probably real. If it collapses, you curve-fit.
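The "small changes in parameters" warning can be checked mechanically: evaluate the strategy at neighboring parameter values and see whether results form a plateau or a spike. In this sketch, `metric` is a hypothetical function you supply (for example, out-of-sample expectancy in R for a given EMA period), and the 50% tolerance is an arbitrary illustrative threshold, not a standard.

```python
from typing import Callable


def sensitivity(metric: Callable[[int], float], center: int, radius: int = 3) -> dict[int, float]:
    """Evaluate `metric` at every integer parameter within `radius` of `center`."""
    return {p: metric(p) for p in range(center - radius, center + radius + 1)}


def is_fragile(results: dict[int, float], center: int, tolerance: float = 0.5) -> bool:
    """Flag a spike: some neighbor scores below `tolerance` times the chosen value."""
    best = results[center]
    if best <= 0:
        return True  # no edge even at the chosen value
    neighbors = [v for p, v in results.items() if p != center]
    return min(neighbors) < tolerance * best
```

A real edge usually survives nearby values (a 25- or 30-period EMA behaves much like a 27); a result that only exists at 27 is likely noise.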
Chapter 8

Look-Ahead Bias

Using information in your backtest that wouldn’t have been available at the time.

Common forms

  • Using close price as the entry signal, but filling at that same close. You didn’t know the close until the candle closed.
  • Using daily high/low in a decision made at open.
  • Using a ticker’s future split-adjustment in a historical calculation.
  • Using sector classifications as of today for historical analysis.
  • Using pivot points / levels computed after the fact instead of how they’d have been drawn live.
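For the first item, the usual fix is to act on a signal only after the bar that produced it has closed, filling at the next bar's open. A minimal sketch, assuming market-order entries with no slippage modeled and a signal list aligned one-to-one with the bars; the `Bar` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def fills_without_lookahead(bars: list[Bar], signal: list[bool]) -> list[tuple[int, float]]:
    """A signal known at bar i's close is filled at bar i+1's open.

    Returns (fill_bar_index, fill_price) pairs. A signal on the final bar has
    no next bar to fill on, so it is dropped rather than filled at its own close.
    """
    fills = []
    for i in range(len(bars) - 1):
        if signal[i]:
            fills.append((i + 1, bars[i + 1].open))
    return fills
```

The next open can differ from the signal bar's close (gaps, fast markets), and that difference is part of the honest result.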
Chapter 9

Walk-Forward Testing

The gold-standard backtesting framework. Splits history into a rolling train/test sequence so you’re always testing on data the strategy hasn’t seen.

How it works

  1. Split history
    E.g. 2020–2022 as train, 2023 as test, or rolling 12-month windows.
  2. Optimize on train
    Tune any parameters on the training window only.
  3. Evaluate on test
    Run the locked parameters on the untouched test window. That’s your honest performance.
  4. Roll the windows
    Shift forward, re-optimize, re-test. Build a sequence of out-of-sample results.
  5. Aggregate out-of-sample only
    Your real performance estimate is the concatenation of all test-window results, never the train windows.
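The five steps can be sketched in a few lines of Python. This version uses fixed-length rolling windows measured in bars and rolls forward by one test window each time, so test windows never overlap. `optimize` and `evaluate` are hypothetical placeholders for your own tuning and backtest code.

```python
from typing import Any, Callable, Sequence


def walk_forward_windows(n_bars: int, train: int, test: int) -> list[tuple[range, range]]:
    """Rolling (train, test) index windows; each test window directly follows its train window."""
    windows = []
    start = 0
    while start + train + test <= n_bars:
        train_idx = range(start, start + train)
        test_idx = range(start + train, start + train + test)
        windows.append((train_idx, test_idx))
        start += test  # roll forward by one test window
    return windows


def walk_forward(data: Sequence[Any], train: int, test: int,
                 optimize: Callable[[Sequence[Any]], Any],
                 evaluate: Callable[[Sequence[Any], Any], list[float]]) -> list[float]:
    """optimize(train_slice) -> params; evaluate(test_slice, params) -> trade results in R."""
    out_of_sample: list[float] = []
    for tr, te in walk_forward_windows(len(data), train, test):
        params = optimize(data[tr.start:tr.stop])       # tune on train only
        out_of_sample.extend(evaluate(data[te.start:te.stop], params))  # locked params
    return out_of_sample  # concatenated test-window results only
```

An anchored (expanding) variant keeps the train window's start fixed at the beginning of history and only extends its end. Both are common; what matters is that only test-window results are aggregated.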
Chapter 10

Fees & Slippage

Most backtests leave these out. A frictionless backtest is fantasy.

  • Commissions: subtract real broker fees per trade.
  • Spread cost: for every round trip, subtract the bid-ask spread.
  • Slippage on market orders: add 0.05–0.2% cost depending on liquidity and order size.
  • Limit order fill assumption: don’t assume every limit order fills just because price touched your level. Model a realistic fill probability.
  • Funding (perps): account for estimated funding payments on leveraged crypto strategies held across funding intervals.
  • Borrow cost (shorts): some stocks cost 5%+ annualized to short.
Frictionless backtest
If your backtest shows +20% annualized without fees/slippage, assume live will be +5–10%. Many edges die entirely once you honestly account for friction.
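Friction is easiest to reason about in R. This sketch converts round-trip costs into R and subtracts them from a trade's gross result. The parameter names are hypothetical, treating slippage as a per-side percentage of notional is an assumption, and funding and borrow costs are left out.

```python
def net_r(gross_r: float, position_value: float, risk_amount: float,
          commission: float, spread_pct: float, slippage_pct: float) -> float:
    """Subtract round-trip friction from one trade's gross R-multiple.

    position_value: notional of the position (currency)
    risk_amount:    currency lost if the stop is hit (this defines 1R)
    commission:     total round-trip broker fees (currency)
    spread_pct:     full bid-ask spread as a fraction of price, paid once per round trip
    slippage_pct:   extra cost per side for market orders, as a fraction of price
    """
    cost = commission + position_value * (spread_pct + 2 * slippage_pct)
    return gross_r - cost / risk_amount
```

With a $10,000 position, $100 of risk (1R), $2 commission, a 0.02% spread and 0.1% slippage per side, each round trip costs 0.24R: a 2R winner nets 1.76R, a 1R loser costs 1.24R, and a strategy averaging +0.2R gross per trade ends up slightly negative.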
Chapter 11

Regime Change

Markets have regimes: trending, ranging, high-vol, low-vol, risk-on, risk-off. A strategy that killed in 2017 (low-vol trend) may be useless in 2022 (high-vol chop). Regime persistence is also finite.

How to test across regimes

  • Segment backtest by VIX levels (or BTC implied volatility).
  • Segment by trend strength (% above/below 200-day MA).
  • Report performance per regime, not just overall.
  • If a strategy is regime-dependent, know the regime condition and be willing to stop trading it when the regime shifts.
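Per-regime reporting is a group-by on a regime label recorded at entry time, using only data available then (see Chapter 8). The 2x2 grid and the VIX threshold of 20 below are illustrative assumptions, and the trend split is simplified to above or below the 200-day MA rather than a percentage distance.

```python
from collections import defaultdict


def regime_label(vix: float, above_200dma: bool, vix_split: float = 20.0) -> str:
    """Hypothetical 2x2 regime grid: volatility level x trend direction."""
    vol = "high-vol" if vix >= vix_split else "low-vol"
    trend = "uptrend" if above_200dma else "downtrend"
    return f"{vol}/{trend}"


def report_by_regime(trades: list[tuple[str, float]]) -> dict[str, tuple[int, float]]:
    """trades: (regime label at entry, result in R). Returns {regime: (n, expectancy in R)}."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for regime, r in trades:
        buckets[regime].append(r)
    return {k: (len(v), sum(v) / len(v)) for k, v in sorted(buckets.items())}
```

Reporting the trade count next to each expectancy matters: a regime bucket with a dozen trades runs into the same sample-size problem as Chapter 5.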
Chapter 12

From Backtest to Paper to Live

  1. Backtest with walk-forward
    Out-of-sample results only, with fees and slippage applied.
  2. Forward paper trade
    Trade the strategy live, paper only, for 1–3 months. Compare realized results to backtest expectations.
  3. Real money, micro size
    Start live at 25% of target size. Validate that the execution environment matches the backtest assumptions.
  4. Scale up slowly
    Only after 50+ live trades at 25% size with stable expectancy. Then bump to 50%, then 75%, then 100%.
  5. Monitor for decay
    Track live performance vs backtest expectancy. If it underperforms by 30%+ for 50+ trades, pause and investigate before scaling.
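Steps 4 and 5 can be written as two small checks. This is a sketch under assumptions: sizes are fractions of target size, the 50-trade gate is applied at every step (the text states it for the first step), "underperforms by 30%+" is read as live expectancy over the last 50 trades falling below 70% of backtest expectancy, and the function names are hypothetical.

```python
def decay_alert(live_r: list[float], backtest_expectancy: float,
                window: int = 50, max_shortfall: float = 0.30) -> bool:
    """True when the last `window` live trades underperform backtest expectancy
    by more than `max_shortfall`. Assumes a positive backtest expectancy in R."""
    if len(live_r) < window:
        return False  # not enough live trades to judge yet
    recent = live_r[-window:]
    live_expectancy = sum(recent) / window
    return live_expectancy < (1 - max_shortfall) * backtest_expectancy


def next_size(current: float, trades_at_size: int, stable: bool) -> float:
    """Step 25% -> 50% -> 75% -> 100% of target size after 50+ stable trades at the current step."""
    if stable and trades_at_size >= 50:
        return min(current + 0.25, 1.0)
    return current
```

If `decay_alert` fires, the plan above says to pause and investigate, so in practice it should override any size increase from `next_size`.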
Chapter 13

Backtest in TradeSimple

The Backtesting tool lets you replay historical candles and place paper entries/exits as if live:

  • Candle-by-candle replay on any instrument and any date range.
  • Speed control - 1x, 2x, 5x, 10x.
  • Sandboxed trades - don’t touch your live journal.
  • Session stats - win rate, R expectancy, equity curve - computed per session.
  • Compare sessions - see how the same strategy performs across different months or instruments.