Optimization
Don't Build a Better Backtest!
July 21, 2020

Build A Better Algo Backtest?

Ask any experienced system developer about backtests, and he’ll probably give you an exasperated look.  On one hand, he’ll say, backtests are great because they can show if a trading idea has any historical merit.  On the other hand, he’ll counter, many times backtests tell you little or nothing about future profitability.  To an experienced trader, then, backtests are both a blessing and a curse.

People new to trading, however, rarely see the duality in backtests.  On the contrary, they see a world of historical profit in optimized tests, a sea of dollars just waiting for them in live trading.  They honestly expect the historical performance to continue well into the future.  A few failed strategies later, though, the trader usually laments, “Why does my terrific backtest always fall apart once I start live trading?”

So what is the big problem with traditional backtesting?  Before we examine that, it is instructive to define exactly what a backtest is, and what alternatives exist.  These are shown in Figure 1, assuming testing ends on December 31, 2018.  First, there is the traditional backtest.  This is by far the most popular test method, and also the most dangerous.  Most trading software encourages this type of test: simply pull up a chart, insert a strategy, and optimize all the parameters over all available data.  The best result of the optimization is then taken as the version to trade.

For many, bad experiences with a traditional backtest will lead to the next variation: a backtest with an out of sample evaluation period.  Instead of testing over the whole data history, the trader will test over the first 50-80% of the data, leaving the rest of the historical data untouched.  The performance of the optimized system during this out of sample period will then be evaluated.  This is a much better approach than traditional backtesting, although many people test and retest, which in effect converts the out of sample period into an in sample period.
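The split described above can be sketched in a few lines of Python.  This is only an illustration: the function name split_in_out_of_sample and the 0.8 fraction are assumptions chosen for the example, not anything from a particular trading platform.

```python
def split_in_out_of_sample(bars, in_sample_fraction=0.8):
    """Split chronologically ordered bars into (in_sample, out_of_sample).

    The first part is used for optimization; the second part is left
    untouched until the optimized system is evaluated on it.
    """
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be between 0 and 1")
    cut = round(len(bars) * in_sample_fraction)
    return bars[:cut], bars[cut:]


# Hypothetical closing prices, oldest first.
closes = [100, 101, 103, 102, 104, 105, 107, 106, 108, 110]
in_sample, out_of_sample = split_in_out_of_sample(closes, 0.8)
print(len(in_sample), len(out_of_sample))
```

The important discipline is procedural rather than technical: once the out of sample bars have been looked at and the strategy is changed in response, they are no longer out of sample.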

A step beyond out of sample testing is walkforward testing.  With this method, a longer out of sample period can be created.  This approach is favored by many professional traders, although it can also become tainted through repeated testing.
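One common way to lay out a walkforward test is with rolling windows: optimize on one stretch of bars, evaluate on the stretch that immediately follows, then slide both forward.  The sketch below only generates the window boundaries; the name walkforward_windows and the window lengths are assumptions for illustration, and other variants (such as anchored windows) exist.

```python
def walkforward_windows(n_bars, in_sample_len, out_sample_len):
    """Yield ((in_start, in_end), (out_start, out_end)) index pairs.

    Each in-sample window is followed directly by its out-of-sample
    window, and the whole pair slides forward by out_sample_len bars,
    so the out-of-sample windows join into one long unseen period.
    End indices are exclusive, as in Python slices.
    """
    if in_sample_len <= 0 or out_sample_len <= 0:
        raise ValueError("window lengths must be positive")
    start = 0
    while start + in_sample_len + out_sample_len <= n_bars:
        in_end = start + in_sample_len
        out_end = in_end + out_sample_len
        yield (start, in_end), (in_end, out_end)
        start += out_sample_len


for in_range, out_range in walkforward_windows(10, 4, 2):
    print("optimize on", in_range, "then evaluate on", out_range)
```

With 10 bars, a 4-bar in-sample window and a 2-bar out-of-sample window, the out-of-sample pieces cover bars 4 through 9 without overlap, which is how the method builds a longer out of sample period than a single split.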

A final method of testing is to simply start trading, with no historical testing.  This is the truest method, since the test is in real time, with real money.  But, it can take an extremely long time to evaluate if the strategy is actually profitable or not, and it can also be expensive, since most trading ideas are not profitable.  The traders who succeed with this method likely have well formed trading strategies, based on years of experience, that they “pre qualify” before going live.  There is no ambiguity in real time results, though, unlike all types of backtests.

As you can see, each of the three alternative methods of testing is more difficult than the simple “plug and chug” traditional backtesting method.  Therefore, most people, especially newer traders, just stick with the easiest method.  Traditional backtests can be very dangerous, though.  Why?  Because many people start to believe that improving the backtest is the goal of testing, and that a better backtest is always desirable.  An example trading system can easily show that this is not the case.

Let’s take a new trader “John.”  John wants to develop a strategy for the Gold market, using data from 2012-2016 (5 years of test data).  So, he pulls up a chart of Gold daily bars in his software.  He has learned from many books and websites that a moving average crossover system is a basic, and many times effective, trading system.  So, he programs it into his trading software:
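John’s actual code is not reproduced here.  As a stand-in, the following Python sketch shows one plausible form of such a system, assuming the crossover is between the closing price and a single simple moving average of length mavg (the one parameter John optimizes first).  The function names, the always-in-the-market logic, and the lack of costs and slippage are all assumptions made for illustration.

```python
def simple_moving_average(closes, length):
    """Return the SMA for each bar, or None until `length` bars exist."""
    averages = []
    running_sum = 0.0
    for i, close in enumerate(closes):
        running_sum += close
        if i >= length:
            running_sum -= closes[i - length]
        averages.append(running_sum / length if i >= length - 1 else None)
    return averages


def crossover_net_profit(closes, mavg):
    """Net profit, in price points, of a price-vs-average crossover system.

    Long when the close is above its moving average, short when below.
    A position decided at one bar's close earns the next bar's change,
    so no future data is used.
    """
    sma = simple_moving_average(closes, mavg)
    position = 0  # +1 long, -1 short, 0 flat
    profit = 0.0
    for i in range(1, len(closes)):
        profit += position * (closes[i] - closes[i - 1])
        if sma[i] is not None:
            if closes[i] > sma[i]:
                position = 1
            elif closes[i] < sma[i]:
                position = -1
    return profit


# Hypothetical steadily rising prices: the system goes long and stays long.
print(crossover_net_profit([1, 2, 3, 4, 5, 6], mavg=2))  # 4.0
```

A traditional backtest would call a function like crossover_net_profit for every candidate value of mavg on the full data set and keep the best one.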

Of course, he uses the software’s optimization feature to optimize the variable mavg, the moving average length.  After running an optimization of 49 iterations, he gets the best Net Profit equity curve, shown as System A in Figure 2.  That is clearly not good enough to trade, so John embarks on a backtest improvement project.

For System B, John decides that long and short markets will act differently, so the moving average lengths for long trades and short trades should be different.  When he adds in this optimizable parameter, the number of iterations increases to 1,681, and his performance greatly increases (shown as System B in Figure 2).

Now John is feeling good about this system.  But he wants even better performance.  For System C, he adds in another moving average, which he also optimizes.  Now he has 8,405 iterations to optimize over.  Not surprisingly, he is ecstatic when the equity curve looks much better.  This is shown as System C in Figure 2.

However, even this performance is not enough for John.  So, he adds another rule to his strategy, this time to exit after a certain number of bars.  Of course, he does not know what value to use for this new rule, so he decides to optimize.  Another optimization (now there are 19,404 iterations), another improvement!  John now has the equity curve shown as System D in Figure 2.
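The reason the iteration counts climb so quickly is that an exhaustive optimization tests every combination of parameter values, so the count is the product of the number of values tried for each parameter.  The sketch below illustrates this.  The parameter names and ranges are hypothetical, picked only so that the products land on the first three counts quoted above (49, then 41 x 41 = 1,681, then 1,681 x 5 = 8,405); John’s actual ranges are not given.

```python
from math import prod


def grid_size(param_ranges):
    """Number of combinations an exhaustive optimization must test."""
    return prod(len(values) for values in param_ranges.values())


# Hypothetical ranges, chosen only to match the quoted counts.
system_a = {"mavg": range(2, 51)}                     # 49 values
system_b = {"long_mavg": range(10, 51),               # 41 values
            "short_mavg": range(10, 51)}              # 41 values
system_c = dict(system_b, extra_mavg=range(100, 201, 25))  # 5 more values

print(grid_size(system_a))  # 49
print(grid_size(system_b))  # 1681
print(grid_size(system_c))  # 8405
```

Every extra combination is another chance to fit noise in the 2012-2016 data, which is why a larger grid tends to produce a prettier backtest whether or not the underlying idea has any edge.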

At this point, John congratulates himself.  He has turned a barely profitable moving average strategy into a historically great looking strategy.  But has he really created a better system?  He obviously has generated a more impressive historical backtest, but does that mean anything?  Does better historical performance translate to better real time performance?

Unfortunately for John, and unfortunately for most people who develop strategies this way, adding rules to create a better backtest does not mean the performance in real time will be any better.  In fact, many times, improving the backtest actually makes the real time performance worse.

To see this, let’s examine the “real time” performance of each of John’s 4 strategies.  Since John’s backtest ran only through the end of 2016, we can examine what happened to his 4 strategies during 2017-2018.  This is shown in Figure 3 and the table below.  As you can see, the better performing backtests actually have worse performance on the unseen, real time data of 2017-2018.  Thus, by focusing on making the backtest better, John actually made things a lot worse!
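The same check can be made mechanical: pick the best parameter on the in-sample data only, then score that one choice on the unseen data.  The Python sketch below does this with a toy momentum rule and made-up price series.  Neither the rule, the names, nor the numbers come from John’s Gold strategies; they only illustrate how a parameter that looks best in-sample can lose on data it never saw.

```python
def momentum_profit(closes, lookback):
    """Toy rule: long if the close is above the close `lookback` bars ago,
    otherwise short. Returns net profit in price points."""
    position = 0
    profit = 0.0
    for i in range(1, len(closes)):
        profit += position * (closes[i] - closes[i - 1])
        if i >= lookback:
            position = 1 if closes[i] > closes[i - lookback] else -1
    return profit


def optimize_then_validate(in_sample, out_of_sample, candidates, score):
    """Choose the best candidate on in_sample, then score it on out_of_sample."""
    best = max(candidates, key=lambda p: score(in_sample, p))
    return best, score(in_sample, best), score(out_of_sample, best)


# Hypothetical data: a smooth uptrend followed by a choppy period.
in_sample = [1, 2, 3, 4, 5, 6, 7, 8]
out_of_sample = [8, 7, 8, 7, 8, 7]

best, in_profit, out_profit = optimize_then_validate(
    in_sample, out_of_sample, candidates=[1, 3], score=momentum_profit
)
print(best, in_profit, out_profit)  # 1 6.0 -4.0
```

Here the lookback that wins in-sample gets whipsawed once the market stops trending, which is the pattern Figure 3 is described as showing for John’s more heavily optimized systems.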

This is unfortunately a common occurrence.  Many traders think they are doing the right thing by improving the backtest, when they are actually just hurting themselves.  While this is not always true (sometimes adding rules to a strategy improves both the backtest and real time performance), a trader always needs to be aware of this possibility.  Here are some tips to overcome this tendency to improve the backtest:

Set realistic expectations. Don’t try to create a perfect looking equity curve.  Real strategies sometimes have severe drawdowns, and many flat periods.  If your backtest results look too good to be true, the strategy probably will not work going forward.

Don’t keep adding rules and iterations just to improve the backtest performance.  Remember, “past performance is not indicative of future results.”

Consider an alternative method of testing. Out of sample, walkforward and real money testing are all highly superior to traditional backtesting.  Consider if one or more of these methods is appropriate for you.

Many traders find historical testing to be indispensable. It allows them to analyze different strategies and see which have held up over time.  While it does not mean profitable performance will continue, it is reassuring to trade a method with a profitable history.  The problem comes about when the trader tries too hard to create a better history.  Many times, improving the backtest leads to the opposite effect in real time, i.e. worse real time performance.  Therefore, a trader always has to be careful when developing a strategy, and resist the urge to build a better backtest.

-Kevin Davey

Original Article: https://kjtradingsystems.com/do-not-build-better-backtest.html

Source: http://www.kjtradingsystems.com

Kevin Davey

Kevin Davey is an award-winning private futures, forex, and commodities trader. He has been trading for over 25 years. For three consecutive years, Kevin achieved over 100% annual returns in a real-time, real money, year-long trading contest, finishing in first or second place each of those years. Kevin is the author of the highly acclaimed algorithmic trading book "Building Algorithmic Trading Systems: A Trader's Journey From Data Mining to Monte Carlo Simulation to Live Trading" (Wiley 2014). Kevin provides a wealth of trading information at his website: http://www.kjtradingsystems.com
