Backtesting Guide - TradeSimple

Chapter 1

Introduction

Backtesting is how you find out whether a strategy has an edge before you risk money on it. Done honestly, it’s the single highest-leverage activity a new trader can do. Done badly - and most backtests are done badly - it produces a beautiful equity curve that dies the moment you go live.

This guide is about the second kind: why backtests lie, and how to build tests that don’t. It’s shorter than the others because most of the value is in avoiding the traps, not piling on complexity.

Chapter 2

Why Backtest

Prove the strategy has an edge before risking capital.
Quantify expectations - win rate, avg win R, avg loss R, drawdown distribution.
Find the conditions where it breaks - specific regimes, instruments, or timeframes where it fails.
Build the conviction to sit through losing trades, because you can tell a statistically normal loss from a broken strategy.
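The quantities listed above (win rate, avg win R, avg loss R, drawdown) can be computed from a plain list of trade results expressed in R. The following is a minimal sketch, not a definitive implementation: the function name summarize_r and the sample trades are made up for illustration, and a single max-drawdown figure stands in for the full drawdown distribution.

```python
from statistics import mean


def summarize_r(r_multiples):
    """Summarize trade results given in R (multiples of initial risk).

    Hypothetical helper for illustration; breakeven trades count as losses.
    """
    wins = [r for r in r_multiples if r > 0]
    losses = [r for r in r_multiples if r <= 0]
    win_rate = len(wins) / len(r_multiples)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    # Expectancy: average R earned per trade.
    expectancy = win_rate * avg_win + (1 - win_rate) * avg_loss

    # Max drawdown in R: largest drop from a running equity peak.
    equity = peak = max_drawdown = 0.0
    for r in r_multiples:
        equity += r
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, peak - equity)

    return {
        "win_rate": win_rate,
        "avg_win_r": avg_win,
        "avg_loss_r": avg_loss,
        "expectancy_r": expectancy,
        "max_drawdown_r": max_drawdown,
    }


# Made-up sample: two 2R winners and three 1R losers.
stats = summarize_r([2.0, -1.0, -1.0, 2.0, -1.0])
print(stats)
```

With a real trade log, the same summary run per month or per instrument gives a first look at how stable the numbers are.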

The secondary benefit

Backtesting forces you to define the rule mechanically. Half the time, the act of writing down the rule precisely reveals it isn’t a rule - it’s a vague impression of one.

Chapter 3

Define the Rule, Not a Hope

A testable strategy has an unambiguous entry, stop, and exit. “Buy the dip” is not a rule; it’s a feeling. “Buy when price touches the 20 EMA on the 1H chart with price trending above the 200 EMA, stop 1 ATR below the entry, target 2R” is a rule.

The minimum fields

1. Entry condition: Specific, testable. What exact price action or indicator state triggers entry?
2. Entry fill: Market, limit at level, stop above level? Determines realistic fill.
3. Stop placement: Fixed R, ATR-based, structural? Must be mechanical.
4. Exit rule: Fixed target, trailing stop, time-based, condition-based. Must be mechanical.
5. Filter conditions: Time of day, volume floor, trend filter, volatility filter. What conditions DISQUALIFY an otherwise-matching setup?
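To show what "mechanical" means in practice, here is a rough sketch of the example rule from this chapter (touch of the 20 EMA, trend above the 200 EMA, stop 1 ATR below entry, target 2R) written as a function that either returns a trade plan or nothing. All names (Bar, TradePlan, ema_pullback_plan) are hypothetical. It assumes the indicator values are computed through the previous bar, that a resting limit order at the 20 EMA fills when the bar's low reaches it, and it ignores gaps through the level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bar:
    low: float         # low of the current bar (used only for the fill check)
    prev_close: float  # close of the previous bar, known before this bar trades
    ema20: float       # 20 EMA computed through the previous bar
    ema200: float      # 200 EMA computed through the previous bar
    atr: float         # ATR computed through the previous bar


@dataclass(frozen=True)
class TradePlan:
    entry: float
    stop: float
    target: float


def ema_pullback_plan(bar: Bar, atr_stop: float = 1.0,
                      target_r: float = 2.0) -> Optional[TradePlan]:
    """Return a TradePlan if the bar matches the rule, else None."""
    # Filter condition: only trade when price is trending above the 200 EMA.
    if bar.prev_close <= bar.ema200:
        return None
    # Entry condition and fill: limit buy at the 20 EMA, filled on a touch.
    if bar.low > bar.ema20:
        return None
    entry = bar.ema20
    stop = entry - atr_stop * bar.atr           # stop placement: ATR-based
    target = entry + target_r * (entry - stop)  # exit rule: fixed R target
    return TradePlan(entry=entry, stop=stop, target=target)


# Made-up bar that satisfies every condition.
print(ema_pullback_plan(Bar(low=99.0, prev_close=101.0,
                            ema20=100.0, ema200=95.0, atr=2.0)))
```

If a condition cannot be written as a line like the ones above, it is probably still a feeling rather than a rule.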

Chapter 4

Data Quality

Use the same timeframe resolution you'll trade. Daily data can't test intraday stop-out behavior.
Tick data for scalping; 1-minute or 5-minute for day-trading setups; 15m/1H/daily for swing.
Adjusted for splits and dividends, and for roll gaps on futures.
Real bid/ask where possible, not just mid-price. Spreads matter for short-timeframe setups.
Out-of-session data matters for some strategies (gaps, overnight positioning).

Chapter 5

Sample Size

The single most common backtest flaw is taking n=20 trades over 6 weeks and calling it an edge.

Minimum: 100 trades. Below this, the stats are noise.
Preferred: 200+ trades. Expectancy stabilizes, drawdown distribution becomes visible.
Confidence intervals: A 55% win rate with n=20 could easily be 40% or 70% by chance. The same win rate with n=200 is pinned down much more tightly.
Multi-regime: Sample should span at least one bull market, one bear market, and one chop regime. 2 years minimum, 5 years better.
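The effect of sample size on the win-rate estimate can be checked directly. The sketch below uses the Wilson score interval, which is one common choice for a proportion and is an assumption here, not something this guide prescribes. The function name wilson_interval is made up for the example.

```python
from math import sqrt


def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Same 55% observed win rate at two sample sizes.
low_20, high_20 = wilson_interval(11, 20)       # roughly 0.34 to 0.74
low_200, high_200 = wilson_interval(110, 200)   # roughly 0.48 to 0.62
print(f"n=20:  {low_20:.2f} to {high_20:.2f}")
print(f"n=200: {low_200:.2f} to {high_200:.2f}")
```

At n=20 the interval comfortably includes win rates below 50%, so the sample cannot distinguish an edge from a coin flip. At n=200 it is far narrower, though its lower end still sits near 50%.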

Chapter 6

Survivorship Bias

Backtesting a strategy on today’s S&P 500 components is backtesting winners. The companies that got delisted, went bankrupt, or dropped from the index are gone from your dataset, so your results will look better than reality, often by something like 1–3% per year.

How to fix

Use point-in-time index membership data. The tickers that were in the index at each date, not today’s list.
Include delisted tickers in your universe.
For single-asset strategies (spot BTC, ES futures), survivorship isn’t an issue - the asset still exists.
For crypto, survivorship is severe - most tokens that existed 5 years ago are gone. Stick to top-N by market cap AT EACH DATE if testing alts.
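A point-in-time universe can be sketched as a lookup over membership records. The tickers and dates below are invented for illustration, and the record layout (ticker, date added, date removed) is an assumption about how such data might be stored, not a real vendor format.

```python
from datetime import date

# Hypothetical membership records: (ticker, date_added, date_removed or None).
MEMBERSHIP = [
    ("AAA", date(2015, 1, 1), None),
    ("BBB", date(2015, 1, 1), date(2019, 6, 30)),  # later delisted
    ("CCC", date(2020, 3, 1), None),               # joined later
]


def universe_on(as_of, membership):
    """Return the tickers that were members on the given date."""
    return sorted(
        ticker
        for ticker, added, removed in membership
        if added <= as_of and (removed is None or as_of < removed)
    )


print(universe_on(date(2018, 1, 1), MEMBERSHIP))  # includes the delisted BBB
print(universe_on(date(2021, 1, 1), MEMBERSHIP))  # BBB gone, CCC present
```

The backtest then asks for the universe at each rebalance or signal date instead of using one fixed list.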

Chapter 7

Overfitting & Curve-Fitting

Overfitting is when your strategy is optimized to historical data so specifically that it captures noise, not edge. The equity curve looks perfect in-sample and blows up out-of-sample.

Signs you’re overfitting

The strategy uses very specific parameter values (e.g. 27-period EMA, 1.3% stop) with no economic reason.
Small changes in parameters dramatically change results.
The strategy only works on one ticker / one timeframe / one regime.
You have 20 rules tuned to 500 trades.
Equity curve is a clean straight line with minimal drawdown.

The test

After tuning on the first 3 years, apply to the last year UNTOUCHED. If performance holds, the edge is probably real. If it collapses, you curve-fit.
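A holdout check of this kind can be sketched as follows. The function names and the min_ratio threshold are illustrative assumptions: the guide does not define how much degradation counts as "holding", so the 50% figure here is only a placeholder.

```python
from datetime import date
from statistics import mean


def split_by_date(trades, cutoff):
    """Split (exit_date, r_multiple) trades into tuning and holdout results."""
    tune = [r for d, r in trades if d < cutoff]
    holdout = [r for d, r in trades if d >= cutoff]
    return tune, holdout


def edge_holds(tune, holdout, min_ratio=0.5):
    """True if holdout expectancy is positive and keeps at least min_ratio
    of the tuning-period expectancy (min_ratio is an arbitrary placeholder)."""
    tune_exp, holdout_exp = mean(tune), mean(holdout)
    return holdout_exp > 0 and holdout_exp >= min_ratio * tune_exp


# Made-up trades: two in the tuning period, two in the untouched final year.
trades = [
    (date(2021, 5, 1), 1.0),
    (date(2022, 5, 1), 0.5),
    (date(2023, 2, 1), 0.6),
    (date(2023, 8, 1), 0.2),
]
tune, holdout = split_by_date(trades, cutoff=date(2023, 1, 1))
print(edge_holds(tune, holdout))
```

The important part is the discipline, not the code: the holdout year is looked at once, after tuning is finished.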

Chapter 8

Look-Ahead Bias

Look-ahead bias means using information in your backtest that wouldn’t have been available at the time of the decision.

Common forms

Using close price as the entry signal, but filling at that same close. You didn’t know the close until the candle closed.
Using daily high/low in a decision made at open.
Using a ticker’s future split-adjustment in a historical calculation.
Using sector classifications as of today for historical analysis.
Using pivot points / levels computed after the fact instead of how they’d have been drawn live.
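The first form above has a simple structural fix: decide on a closed bar, fill on the next bar. The sketch below assumes bars are dictionaries with "open" and "close" keys and that the earliest realistic fill is the next bar's open. The function name and the sample signal are made up for the example.

```python
def entries_next_open(bars, signal):
    """Return (bar_index, fill_price) pairs without look-ahead.

    signal(closed_bars) sees only bars that have fully closed, and any
    entry it triggers is filled at the open of the following bar.
    """
    fills = []
    for i in range(len(bars) - 1):
        closed_bars = bars[: i + 1]  # bar i has closed; nothing later is visible
        if signal(closed_bars):
            fills.append((i + 1, bars[i + 1]["open"]))
    return fills


# Made-up bars and a toy signal: enter after any bar that closes above its open.
bars = [
    {"open": 10.0, "close": 11.0},
    {"open": 11.2, "close": 12.0},
    {"open": 12.5, "close": 12.1},
]
print(entries_next_open(bars, lambda closed: closed[-1]["close"] > closed[-1]["open"]))
```

Note that the fills land at 11.2 and 12.5, not at the signal closes of 11.0 and 12.0. That gap is exactly what a same-close fill hides.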

Chapter 9

Walk-Forward Testing

Walk-forward testing is the gold-standard backtesting framework. It splits history into a rolling train/test sequence so you’re always testing on data the strategy hasn’t seen.

How it works

1. Split history: E.g. 2020–2022 as train, 2023 as test. Or rolling 12-month windows.
2. Optimize on train: Tune any parameters on the training window only.
3. Evaluate on test: Run the locked parameters on the untouched test window. That’s your honest performance.
4. Roll the windows: Shift forward, re-optimize, re-test. Build a sequence of out-of-sample results.
5. Aggregate out-of-sample only: Your real performance estimate is the concatenation of all test-window results, never the train windows.
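The five steps above can be sketched as a small loop. This is a minimal outline under stated assumptions: history is a list indexed by period (for example, months), the window advances by one test length each time, and optimize and evaluate are placeholders the reader supplies. None of these names come from TradeSimple.

```python
def walk_forward_windows(n_periods, train_len, test_len):
    """Yield (train_indices, test_indices) ranges that roll forward in time."""
    start = 0
    while start + train_len + test_len <= n_periods:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # roll forward by one test window


def walk_forward(history, train_len, test_len, optimize, evaluate):
    """Return the concatenated out-of-sample results only.

    optimize(train_data) -> params           (tuned on the train window only)
    evaluate(test_data, params) -> list      (results with params locked)
    """
    out_of_sample = []
    for train, test in walk_forward_windows(len(history), train_len, test_len):
        params = optimize([history[i] for i in train])
        out_of_sample.extend(evaluate([history[i] for i in test], params))
    return out_of_sample


# 60 months of history, 36-month train and 12-month test windows.
for train, test in walk_forward_windows(60, 36, 12):
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Because the test ranges never overlap a train range that was used to tune the parameters applied to them, the aggregated list contains only results the strategy had not seen.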

Chapter 10

Fees & Slippage

Most backtests leave these out. A frictionless backtest is fantasy.

Commissions: subtract real broker fees per trade.
Spread cost: for every round trip, subtract the bid-ask spread.
Slippage on market orders: add 0.05–0.2% cost depending on liquidity and order size.
Limit order fill assumption: don’t assume every limit fills at mid. Add realistic fill probability.
Funding (perps): subtract estimated funding payments for leveraged crypto strategies. Funding accrues at every funding interval, not only overnight.
Borrow cost (shorts): some stocks cost 5%+ annualized to short.

Frictionless backtest

If your backtest shows +20% annualized without fees/slippage, assume live will be +5–10%. Many edges die entirely once you honestly account for friction.
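The cost items in this chapter can be folded into each trade's return before computing any statistics. The sketch below expresses everything as a fraction of notional per round trip. The function name and the sample cost figures are illustrative assumptions, not broker quotes, and funding and borrow costs are left out for brevity.

```python
def net_return(gross_return, commission, spread, slippage_per_side):
    """Round-trip net return after commission, spread, and slippage.

    All arguments are fractions of notional (0.001 means 0.1%).
    Commission and spread are per round trip; slippage is charged on
    both the entry and the exit fill.
    """
    return gross_return - commission - spread - 2 * slippage_per_side


# Made-up example: a 0.40% gross trade with 0.02% commission,
# 0.05% spread, and 0.05% slippage on each side.
net = net_return(0.004, 0.0002, 0.0005, 0.0005)
print(f"gross 0.40% -> net {net * 100:.2f}%")
```

In this example friction removes a bit over 40% of the gross result. The shorter the timeframe and the smaller the average trade, the larger that share becomes.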

Chapter 11

Regime Change

Markets have regimes: trending, ranging, high-vol, low-vol, risk-on, risk-off. A strategy that killed it in 2017 (low-vol trend) may be useless in 2022 (high-vol chop). Regimes also end: none of them persists indefinitely.

How to test across regimes

Segment backtest by VIX levels (or BTC implied volatility).
Segment by trend strength (% above/below 200-day MA).
Report performance per regime, not just overall.
If a strategy is regime-dependent, know the regime condition and be willing to stop trading it when the regime shifts.
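Per-regime reporting can be sketched by tagging each trade with the volatility level at entry and grouping the results. The VIX thresholds of 15 and 25 are arbitrary illustrative cut-offs, and the function names and sample trades are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def vol_regime(vix, low=15.0, high=25.0):
    """Bucket a VIX reading; the thresholds are illustrative assumptions."""
    if vix < low:
        return "low-vol"
    if vix < high:
        return "mid-vol"
    return "high-vol"


def performance_by_regime(trades):
    """Group (vix_at_entry, r_multiple) trades and report stats per regime."""
    buckets = defaultdict(list)
    for vix, r in trades:
        buckets[vol_regime(vix)].append(r)
    return {
        regime: {"n": len(rs), "expectancy_r": mean(rs)}
        for regime, rs in buckets.items()
    }


# Made-up trades: (VIX at entry, result in R).
trades = [(12, 2.0), (13, 1.0), (30, -1.0), (28, -1.0), (20, 0.5)]
for regime, stats in performance_by_regime(trades).items():
    print(regime, stats)
```

A blended expectancy over these trades would hide the fact that all of the losses come from one regime, which is the point of reporting per bucket. Each bucket still needs enough trades to be meaningful.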

Chapter 12

From Backtest to Paper to Live

1. Backtest with walk-forward: Report honest out-of-sample results, with fees and slippage applied.
2. Forward paper trade: Trade the strategy in real time, on paper only, for 1–3 months. Compare realized results to backtest expectations.
3. Real money, micro size: Start live at 25% of target size. Validate that the execution environment matches the backtest assumptions.
4. Scale up slowly: Scale up only after 50+ live trades at 25% size with stable expectancy. Then step up to 50%, then 75%, then 100%.
5. Monitor for decay: Track live performance vs backtest expectancy. If it underperforms by 30%+ over 50+ trades, pause and investigate before scaling.
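The decay check in the last step can be written as a simple guard. This sketch reads "underperforms by 30%+ for 50+ trades" as: the average R of the most recent 50 live trades is more than 30% below the backtest expectancy. That reading, the function name, and the assumption of a positive backtest expectancy are all choices made for the example.

```python
from statistics import mean


def should_pause(live_r, backtest_expectancy, min_trades=50, max_shortfall=0.30):
    """True if recent live expectancy trails the backtest by more than max_shortfall.

    live_r: live trade results in R, oldest first.
    Assumes backtest_expectancy is positive.
    """
    if len(live_r) < min_trades:
        return False  # not enough live trades to judge
    recent = live_r[-min_trades:]
    return mean(recent) < (1 - max_shortfall) * backtest_expectancy


# Made-up example: backtest expectancy of 0.4R per trade.
print(should_pause([0.2] * 50, backtest_expectancy=0.4))   # live is 50% below
print(should_pause([0.35] * 50, backtest_expectancy=0.4))  # live is about 12% below
```

A pause triggered this way is a prompt to investigate execution, costs, and regime, not proof that the edge is gone.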

Chapter 13

Backtest in TradeSimple

The Backtesting tool lets you replay historical candles and place paper entries/exits as if live:

Candle-by-candle replay on any instrument and any date range.
Speed control - 1x, 2x, 5x, 10x.
Sandboxed trades - they never touch your live journal.
Session stats - win rate, R expectancy, equity curve - computed per session.
Compare sessions - see how the same strategy performs across different months or instruments.