Optimization
Beware Perfect Backtests Pt. 1
July 20, 2020

Overview

Creating an automated strategy is an iterative process that involves many smaller processes, and a key piece of the strategy development loop is the testing/validation phase.  An important note, something that was pointed out to me early on and has stuck with me ever since, is that NONE of these tests are supposed to be an iterative development tool.  A Backtest (or Walk Forward, WF) is not a tool for development; it is a tool for validation.  Once you have an entry signal that seems to look good, you can throw it in an initial backtest to VALIDATE that it is worth continuing to develop, and possibly see where it’s lacking.  The goal is NOT to create a perfect backtest!

Before I begin on my own process of testing, let me point out some of the pitfalls of the testing phase, beginning with the most common types: Overfitting, Selection Bias, Survivorship Bias, & Lookahead Bias.  One type of bias not in that list, and pretty much baked into Backtesting, is Hindsight Bias: knowledge of past market regimes and conditions that can allow us to modify tests to fit the dataset.  This is unavoidable, and it’s the reason for Walk Forward Optimizations (WFOs) and Realtime testing. Nonetheless, it’s best to be aware of it so you can catch yourself subconsciously fitting past regimes.

Overfitting is something that can be done without any ill intention: just run 10 backtests in a row until one looks perfect, and suddenly you have overfit the data.  To avoid overfitting, leave SOME RECENT DATA UNTOUCHED, and LIMIT THE NUMBER OF BACKTESTS YOU RUN to an absolute minimum.  Again, think of a backtest as a validation with a Boolean result, not a development tool or a test to ace.
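One way to keep recent data untouched is to carve it off before any testing starts. The sketch below is a minimal illustration in plain Python, not a TradeStation feature; `split_holdout`, the 20% default, and the sample prices are all assumptions made up for the example.

```python
def split_holdout(bars, holdout_fraction=0.2):
    """Split chronologically ordered bars into (development, holdout).

    The holdout is the most recent slice and should stay untouched
    until the strategy is otherwise finished.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_holdout = max(1, round(len(bars) * holdout_fraction))
    cut = len(bars) - n_holdout
    return bars[:cut], bars[cut:]


# Hypothetical daily closes, oldest first.
closes = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
development, holdout = split_holdout(closes, holdout_fraction=0.2)
```

Here `development` holds the first eight bars and `holdout` the last two; every backtest would run on `development` only.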

Selection Bias comes from cherry-picking datasets, or even subconsciously choosing test sets that you know are likely to be uncharacteristically profitable.  We’ll get into this more when we look at Cluster Analysis in Walk Forward Optimizations — there’s an easy visual or mathematical solution.

Survivorship & Lookahead Bias are more a question of data integrity and logic, though it’s easy to make a mistake with both.  The classic Survivorship Bias mistake is to select a portfolio of securities that exist in 2020 but begin the backtest in 1980.  In 1980 there were OTHER companies in the index, and by looking only at the performance of the companies that still exist you ignore the ones that went to 0 or close to it (for example, through distressed acquisitions).
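To make the survivorship point concrete, here is a small sketch of a point-in-time universe lookup. The symbols, dates, and the `universe_on` helper are hypothetical; real index membership history would have to come from your data vendor.

```python
from datetime import date

# Hypothetical membership records: (symbol, date added, date removed or None).
MEMBERSHIP = [
    ("AAA", date(1975, 1, 1), None),
    ("BBB", date(1978, 6, 1), date(1991, 3, 15)),  # later delisted
    ("CCC", date(2005, 9, 1), None),
]


def universe_on(as_of, membership):
    """Return the symbols that were actually in the index on `as_of`."""
    return sorted(
        symbol
        for symbol, added, removed in membership
        if added <= as_of and (removed is None or as_of < removed)
    )


universe_1980 = universe_on(date(1980, 1, 2), MEMBERSHIP)
universe_2020 = universe_on(date(2020, 1, 2), MEMBERSHIP)
```

A backtest starting in 1980 should trade `universe_1980`, which includes the later-delisted `BBB` and excludes `CCC`; using `universe_2020` for the whole period is the survivorship mistake.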

Lookahead Bias is what it sounds like: peeking at tomorrow’s close price, usually by accident.  It can also be more subtle, like entering a trade based on an earnings report without realizing it was not yet public information at the point in the backtest where you follow the signal.  With tools like TradeStation these aren’t possible (you can’t access future bars), and all data is point in time (and lacks fundamentals anyway).  There are ways to handle these questions in platforms like QuantConnect and other Python platforms as well (see Data Normalization Modes).
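In a hand-rolled Python backtest the usual way this sneaks in is by earning the return of the same bar that produced the signal. The sketch below is only an illustration with a toy "long after an up bar" rule and made-up prices; `signal_returns` and its `lag` parameter are assumptions for the example.

```python
def bar_returns(closes):
    """Close-to-close returns; element i is the return of bar i + 1."""
    return [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]


def signal_returns(closes, lag):
    """Total return of a toy 'long after an up bar' rule.

    lag=0 earns the return of the bar that generated the signal,
    which uses information not available at entry (lookahead).
    lag=1 earns the following bar's return, which is tradable.
    """
    rets = bar_returns(closes)
    signals = [1 if r > 0 else 0 for r in rets]
    total = 0.0
    for i in range(lag, len(rets)):
        total += signals[i - lag] * rets[i]
    return total


# Hypothetical choppy price series.
closes = [100, 102, 101, 103, 102, 104]
with_lookahead = signal_returns(closes, lag=0)
without_lookahead = signal_returns(closes, lag=1)
```

On this series the lookahead version collects every up bar and looks profitable, while the properly lagged version loses money, which is exactly the kind of gap a too-perfect backtest can hide.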

Now that you know what not to do, let me share my own little testing process. It begins with a Backtest, and I mean 1.  Sometimes I’ll leave the inputs as is and try a group of various sectors or markets, but I’m not tweaking anything or looking to improve results, just seeing how the signal looks in different markets or intervals.  I don’t pay backtests much mind, and I make an effort not to get ahead of myself if I see something like a perfect backtest, as there are a handful of reasons why it’s not a valid representation.

Trailing Stops are notorious for overstating performance in backtests. Because the backtest reviews each bar after the fact, it sometimes shows losing trades as winners: maybe price immediately went against you and then moved in your favor, and if that happens inside one large green bar on a higher interval, chances are a long trade will not show a loss there. Backtests will also show trailing stops executing at much larger PNLs than actual live trades.  You’ll find in practice that many get stopped out AT the target value, whereas in backtests they all run almost perfectly in the direction of the trade until a big retracement against you.

Other factors to include in Backtests are slippage, the difference between the signal price and the execution price of your trade (more on that in the next exercise), and commissions, the explicit costs of trading.  Both should be factored into backtests as accurately as possible (if you’re unsure, just aim high; it’s much better to lower these later than to raise them).
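As a rough illustration of how much slippage and commissions can take out of a trade, here is a minimal per-trade calculation. The function name, the cost model (slippage per unit on both entry and exit, a flat commission per side), and all numbers are assumptions for the example, not any broker's actual fee schedule.

```python
def net_pnl(entry, exit_price, qty, slippage_per_unit, commission_per_side, side=1):
    """Trade PnL after estimated costs.

    side is +1 for a long trade and -1 for a short trade.
    Slippage is charged per unit on both entry and exit;
    commission is a flat amount charged on both entry and exit.
    """
    gross = (exit_price - entry) * qty * side
    slippage_cost = slippage_per_unit * qty * 2
    commission_cost = commission_per_side * 2
    return gross - slippage_cost - commission_cost


# Hypothetical long trade: 100 shares, 1.00 gross move in our favor.
gross_only = net_pnl(100.0, 101.0, 100, 0.0, 0.0)
after_costs = net_pnl(100.0, 101.0, 100, slippage_per_unit=0.02, commission_per_side=1.0)
```

With these made-up inputs the costs remove about 6% of the gross profit; on a strategy with a small Avg Trade the same costs can erase the edge entirely, which is why it is safer to estimate them on the high side.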

How To (Backtest):

To run a backtest in TradeStation / MultiCharts, simply right click a chart -> Format / Insert Strategy / Study -> Add Strategy -> enter inputs, and close the study window to let it run.  You may want to specify the properties (commissions, slippage) and the timeframe; I like to start with a 3yr backtest and then go back 10-15yr.  To view the results, go to View -> Strategy Performance Report (or Ctrl P).

To optimize, in the Format window, click the strategy and in the input ranges select optimize — here you can do genetic, exhaustive, or Walk Forward (More on that later).  If your iteration number is enormous (50000+), I would use a genetic optimization (and be mindful of overfitting!), otherwise just try exhaustive standard backtest optimization. 

For now, you want to focus on your strategy’s Net Return, Max Drawdown, Avg Trade, Win Rate, Avg Win / Avg Loss, and ideally the Sharpe and/or Calmar Ratio.  Also take a look at the Equity Curve and periodical returns, and make sure it’s something you could live with monitoring. You’re looking for a Win Rate and Avg W / L that balance out to a positive expected value. That can be something with a 0.5 Avg W:L and a 70% win rate (at a 70% win rate the ratio has to be above roughly 0.43 just to break even), or a 3.0 Avg W:L and a 35% win rate; which one is more a matter of personal preference (in my experience, the former shows better Sharpe ratios, but you must be careful with them, as a losing streak can hurt you quickly).
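The balance between Win Rate and Avg W / L can be checked with a couple of lines of arithmetic. Measured in units of the average loss, the expected value per trade is win_rate * ratio - (1 - win_rate), so the break-even ratio is (1 - win_rate) / win_rate. The helper names below are made up for this sketch, and costs are ignored.

```python
def expectancy(win_rate, win_loss_ratio):
    """Expected value per trade, in units of the average loss."""
    return win_rate * win_loss_ratio - (1 - win_rate)


def breakeven_ratio(win_rate):
    """Avg W / L ratio at which expectancy is exactly zero."""
    return (1 - win_rate) / win_rate


# Low win rate, large winners: 35% wins at 3.0 Avg W:L.
trend_style = expectancy(0.35, 3.0)

# High win rate, small winners: the ratio a 70% win rate needs to break even.
needed_at_70 = breakeven_ratio(0.70)
```

`trend_style` comes out around +0.40 average losses per trade, and `needed_at_70` is about 0.43, so a high win rate system whose Avg W:L sits below that level has a negative expected value no matter how smooth the equity curve looks.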

Moving on from backtests, I begin trading the signal in simulation (paper) immediately. I start it even with default parameters, because I find realtime data to be the most valid test (arguably the only valid one) and I want to begin gathering data. This can be done for free at a brokerage like TradeStation or Interactive Brokers, and it’s well worth the wait.  I also run a Walk-Forward Optimization, which is important but, again, not the be-all and end-all of strategy success. Here’s why.

Walk Forwards are an attempt at remedying the issues with a backtest, but I still don’t value them nearly as much as realtime performance.  A Walk Forward is what it sounds like: a backtest that is run in iterations going forward.  It breaks up your data into two kinds of segments, In Sample (IS) and Out Of Sample (OOS), usually based on %, with a default of ~80% IS and ~20% OOS.  The optimizer takes your input ranges and trains them on the ~80% In Sample data, then tests them on the ~20% Out Of Sample data to validate that the strategy performs as well on fresh data.  What you should be watching out for is something that performs significantly worse in the OOS ranges.
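The IS / OOS split is easier to picture as index ranges. The sketch below builds a generic rolling walk-forward schedule in which each run optimizes on an IS window, tests on the OOS window right after it, and then slides forward by one OOS length. This is an assumed, simplified scheme for illustration; it is not necessarily how TradeStation’s Walk Forward Optimizer partitions data, and `walk_forward_windows` is a made-up name.

```python
def walk_forward_windows(n_bars, is_len, oos_len):
    """Return a list of ((is_start, is_end), (oos_start, oos_end)) index ranges.

    Ranges are half-open [start, end). Each run's OOS window starts where
    its IS window ends, and the whole window slides forward by oos_len.
    """
    windows = []
    start = 0
    while start + is_len + oos_len <= n_bars:
        is_range = (start, start + is_len)
        oos_range = (start + is_len, start + is_len + oos_len)
        windows.append((is_range, oos_range))
        start += oos_len
    return windows


# Hypothetical: 1000 bars, 400 IS + 100 OOS per run (80% / 20% of each window).
windows = walk_forward_windows(n_bars=1000, is_len=400, oos_len=100)
```

With these numbers there are six runs, and the OOS windows tile bars 400 through 1000 without overlap, so every OOS result comes from data the optimizer had not seen for that run.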

 

I prefer to kick the OOS % up to ~30-40% in the settings, and make sure OOS performance is almost identical to IS performance overall.  The Walk Forward should also provide you with the best possible parameters for your system, but this is where selection bias runs rampant.  Make sure you’re not choosing a parameter group that sits on a narrow peak in the data; you want a large plateau area.  These values are not going to be perfect at all times in all conditions; the ideal values are ever-changing, so you want something with good surrounding performance as well. Another technique is to average the best group of like parameters, so you literally take the middle value.
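One simple way to prefer a plateau over a narrow peak is to score each parameter value by the average of its own result and its neighbors’ results, then pick the best smoothed score. This is only a sketch of that idea for a single parameter on an evenly spaced grid; the function names, the `radius` setting, and the results are all made up.

```python
def smoothed_scores(results, radius=1):
    """Average each parameter's metric with its neighbors within `radius` grid steps."""
    params = sorted(results)
    scores = {}
    for i, param in enumerate(params):
        lo = max(0, i - radius)
        hi = min(len(params), i + radius + 1)
        window = [results[p] for p in params[lo:hi]]
        scores[param] = sum(window) / len(window)
    return scores


def pick_plateau(results, radius=1):
    """Return the parameter value with the best neighborhood-averaged metric."""
    scores = smoothed_scores(results, radius)
    return max(scores, key=scores.get)


# Hypothetical optimization results: lookback length -> profit factor.
results = {5: 1.0, 10: 1.2, 15: 3.0, 20: 0.9, 25: 2.0, 30: 2.2, 35: 2.1, 40: 1.9}
raw_best = max(results, key=results.get)
plateau_best = pick_plateau(results, radius=1)
```

The raw optimum is 15, an isolated spike with weak neighbors, while the smoothed pick is 30, in the middle of a run of similar values. That is the plateau you would rather trade.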

How To (Walk Forward Optimization):


Remember the Backtest Optimization? Same thing, just select Walk Forward now: run the same optimization again, and make sure you know where it’s saving the file.  Once it has run, click View -> Walk Forward Optimizer.  Wait for that to open, then File -> Open Walk Forward File -> navigate to the saved file.  You can now run a Cluster or a single WFA; I would do Cluster if you have the time.  Check the settings on the toolbar for the optimization and the criteria for passing the test; you can change the IS / OOS % here as well.  Then click File -> Start Cluster or Standard WF.  I included cluster analysis here, with the full results in Appendix A.

Once you’ve run a WF, you can also run a quick Monte Carlo to get an idea of what your Max Drawdown will look like, but more on that later.
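For a rough idea of what a Monte Carlo drawdown estimate involves, here is a minimal sketch that reshuffles the order of a trade list many times and records the max drawdown of each ordering. Shuffling trade order is just one common approach and is an assumption here; it is not necessarily what the platform’s Monte Carlo does, and the trade PnLs are made up.

```python
import random


def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative PnL curve (starting at 0)."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def monte_carlo_drawdowns(pnls, n_sims=1000, seed=42):
    """Max drawdown of n_sims random reorderings of the trades, sorted ascending."""
    rng = random.Random(seed)
    drawdowns = []
    for _ in range(n_sims):
        shuffled = list(pnls)
        rng.shuffle(shuffled)
        drawdowns.append(max_drawdown(shuffled))
    return sorted(drawdowns)


# Hypothetical per-trade PnLs from a backtest or walk forward.
trades = [100, -50, -30, 200, -120]
historical_dd = max_drawdown(trades)
drawdowns = monte_carlo_drawdowns(trades, n_sims=1000)
dd_95 = drawdowns[int(0.95 * len(drawdowns))]
```

`historical_dd` is the single drawdown the actual trade order produced, while `dd_95` is a more conservative figure: the drawdown that roughly 95% of the reshuffled orderings stayed under.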

Live:

Now that you’ve gotten through Walk Forwards, assuming you haven’t thrown the system out yet, it’s time to analyze your realtime simulation results and maybe start implementing the inputs you got from the WF.  (If you have thrown it out, DO NOT WORRY: it is a FREQUENT occurrence when developing new strategies, and it becomes less likely over time.)  This is the most important testing step in my opinion, as it’s the last time to make cheap (free) changes to your system.  Make sure to let it gather as much data as you can, as it can be useful in portfolio selection and allocation later!  I like to run it for at least 1mo in realtime paper.

Next is realtime Live testing (usually with minimal contracts / shares). This can be nerve-racking, so starting small takes the pressure off.  My advice here is to simply let it run; don’t touch it.  Let it earn its wins and show its losses early so you can decide how best to use it in the future. I try to compare it to my MC or WF values and make sure it’s not completely outside the realm of those numbers, but some reduction in Avg W / L is expected, especially with Trailing Stops, etc. This is the first time you’re really going to get actual slippage numbers, so you can review how accurate your estimates were and readjust them for future models.  Keep in mind that different markets will have very different execution, based on the factors mentioned in the next exercise.
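Once live fills start coming in, the actual slippage can be measured directly and compared with the estimate used in the backtest. The sketch below assumes you can export each fill as a side, a signal price, and a fill price; that record layout, the function name, and the numbers are hypothetical.

```python
def realized_slippage(fills):
    """Average slippage per unit across fills; positive means worse than the signal price.

    Each fill is (side, signal_price, fill_price), with side +1 for buys
    and -1 for sells. Paying more on a buy or receiving less on a sell
    both count as positive slippage.
    """
    costs = [side * (fill_price - signal_price) for side, signal_price, fill_price in fills]
    return sum(costs) / len(costs)


# Hypothetical live fills.
fills = [
    (+1, 100.00, 100.02),  # bought 0.02 above the signal price
    (-1, 101.00, 100.97),  # sold 0.03 below the signal price
    (+1, 50.00, 49.99),    # bought 0.01 better than the signal price
]
avg_slippage = realized_slippage(fills)
```

If `avg_slippage` comes out well above the per-unit slippage assumed in the backtest, the backtest costs were too optimistic and should be raised for future models.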

Assuming all is well, you are ready to begin scaling up the system as you see fit, and hopefully find a nice portfolio to add it to, with a solid MM Algo to manage the position size and scale with time.  Congratulations, this is the final stage of the initial strategy development cycle! Now you can move on to the Portfolio level, which is, if nothing else, much more definitive and laid out (not as ‘creative’).

-Happy Trading!

ZO

Appendix A



In Sample Results

Out Of Sample Results (Walk Forward)

In Sample & Out Of Sample Equity Curve -- No real change == IDEAL
Bad Peak (Orange) vs Plateau (Green -- Ideal Parameter Value)
Max Drawdown Monte Carlo Sim (Useful for determining worst case)

Zach Oakes

I've managed a quant focused Master Fund for 10 years. I'm a full stack developer, working mostly in Python, ELD, C#, and some C++. My specialties include Strategy Development & Portfolio Optimization -- I write about Alpha and ways to find, create, exploit, or improve it. Enjoy!
