Forecasting Option Premiums with Deep Learning

The mathematical equations holding our financial system together are wrong.

Well… not entirely wrong, but even the most well-studied and time-tested derivative pricing models (such as the Black-Scholes model) will provide theoretical prices that just don’t agree with prices in reality. While the debate on the predictability of stock prices rages on, it is a well-known fact that option prices are predictable when provided a spectrum of possible future prices for the underlying stock. The Black-Scholes equation gives an analytical solution for the price of an option given the price of the underlying stock. Despite this, the real prices of contracts on option exchanges greatly differ from what the Black-Scholes model would suggest. If you’ve ever traded options, you’ll have quickly realized that the prices of your option contracts are often vastly different from the set of potential prices you had forecast mere days before. Why is this? Let’s find out.

The Black-Scholes model price for a call option at time t, C_t, is defined as:

$$C_t = S_t N(d_1) - Ke^{-r(T-t)}N(d_2)$$

with d₁ and d₂ defined as

$$d_1 = \frac{\ln\left(\frac{S_t}{K}\right)+\left(r+\frac{\sigma^2}{2}\right)(T-t)}{\sigma \sqrt{T-t}}$$

$$d_2 = d_1 - \sigma \sqrt{T-t}$$

where N is the cumulative distribution function of the standard normal distribution, S_t is the current price of the underlying stock, K is the strike price, T-t is the time until expiry, σ is the volatility of the underlying stock, and r is the risk-free rate. The code follows the same form:

import numpy as np
from scipy import stats

# Returns the Black-Scholes call price given the underlying price (S_t in the
# formula above), strike, time to maturity in years, risk-free rate, and volatility
def call_price(fprice, k, ttm, r, v):
    d1 = (np.log(fprice / k) + (r + (v**2)/2) * ttm) / (v * np.sqrt(ttm))
    d2 = d1 - (v * np.sqrt(ttm))
    N = stats.norm.cdf  # standard normal CDF
    return fprice * N(d1) - k*np.exp(-r*ttm) * N(d2)
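
As a quick sanity check, the formula can be evaluated on a textbook case. The sketch below restates the pricing function in a self-contained form and adds a put price via put-call parity. The names `bs_call` and `bs_put` and the input values (spot 100, strike 100, one year to expiry, 5% rate, 20% volatility) are illustrative assumptions, not values from the dataset.

```python
import numpy as np
from scipy import stats


def bs_call(s, k, ttm, r, v):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (np.log(s / k) + (r + v**2 / 2) * ttm) / (v * np.sqrt(ttm))
    d2 = d1 - v * np.sqrt(ttm)
    return s * stats.norm.cdf(d1) - k * np.exp(-r * ttm) * stats.norm.cdf(d2)


def bs_put(s, k, ttm, r, v):
    """Put price via put-call parity: P = C - S + K * exp(-r * ttm)."""
    return bs_call(s, k, ttm, r, v) - s + k * np.exp(-r * ttm)


# Illustrative inputs only: at-the-money, one year out.
example_call = bs_call(100.0, 100.0, 1.0, 0.05, 0.20)
example_put = bs_put(100.0, 100.0, 1.0, 0.05, 0.20)
print(round(example_call, 2), round(example_put, 2))
```

With these inputs d₁ = 0.35 and d₂ = 0.15, so the call comes out at roughly 10.45 and the put at roughly 5.57.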

In this analysis, we will use the federal funds rate as the risk-free rate, and we'll combine 1-, 2-, and 3-month windows for calculating volatility, an approach inspired by earlier work on option pricing with learning networks (Hutchinson et al., 1994). After acquiring and preprocessing 15 years' worth of market data on SPY options, we are ready to explore the data:
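
One way the windowed volatility estimate might look is sketched below. It is only a sketch under stated assumptions: the window lengths of 21, 42, and 63 trading days, the 252-day annualization factor, and the plain average used to combine the three windows are all my guesses, since the text does not say how the windows are combined. The price series is synthetic.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumed annualization factor


def rolling_volatility(close, windows=(21, 42, 63)):
    """Annualized rolling volatility of daily log returns.

    The default windows approximate 1, 2, and 3 months of trading days.
    """
    log_returns = np.log(close / close.shift(1))
    vols = {
        f"vol_{w}d": log_returns.rolling(w).std() * np.sqrt(TRADING_DAYS_PER_YEAR)
        for w in windows
    }
    return pd.DataFrame(vols)


# Synthetic prices standing in for daily SPY closes (hypothetical data).
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))

vol_table = rolling_volatility(close)
# One possible combination: average the three windows once all are available.
vol_table["vol_combined"] = vol_table.mean(axis=1, skipna=False)
```

The combined column stays empty until the longest window has enough history, which avoids quietly mixing estimates built from different amounts of data.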

Here, I have forecast prices with the Black-Scholes model and plotted them against real future bid prices. The green dotted line represents a perfect forecast.

Note the dark purple region around the origin of the above figure, indicating that people will gladly pay non-zero premiums to hedge against catastrophic market crashes (this price is the skew risk premium). In the eyes of the Black-Scholes model, a crash of such magnitude is not likely enough to warrant a non-zero premium, since the model assumes that log returns are normally distributed. Real stock returns are not normally distributed (Natenberg, 1994); extreme moves happen more often than a normal distribution allows. This is an example of how the Black-Scholes model fails to incorporate kurtosis risk. Interestingly enough, Dr. Scholes co-founded a hedge fund that collapsed after only four years due to this very flaw.
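
To get a feel for how little the model charges for crash protection, here is a minimal sketch that prices a deep out-of-the-money put with the Black-Scholes formula. The function name `bs_put` and every input (strike 30% below spot, 30 days to expiry, 15% volatility, 2% rate) are hypothetical values picked for illustration, not numbers from the dataset.

```python
import numpy as np
from scipy import stats


def bs_put(s, k, ttm, r, v):
    """Black-Scholes price of a European put (no dividends)."""
    d1 = (np.log(s / k) + (r + v**2 / 2) * ttm) / (v * np.sqrt(ttm))
    d2 = d1 - v * np.sqrt(ttm)
    return k * np.exp(-r * ttm) * stats.norm.cdf(-d2) - s * stats.norm.cdf(-d1)


# Hypothetical crash hedge: strike 30% below spot, 30 days out, 15% volatility.
crash_put = bs_put(s=100.0, k=70.0, ttm=30 / 365, r=0.02, v=0.15)
print(f"model price of the crash hedge: {crash_put:.2e}")
```

Under these assumptions the strike sits more than eight standard deviations away from the spot price, so the model price is indistinguishable from zero, while real markets quote such contracts at a visible premium.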

This plot shows the prediction error of price forecasts using the Black-Scholes model.

I think the key insight here is that the Black-Scholes model is bad at forecasting option premiums far into the future. In other words, it is pretty good at telling us what an option should be worth today, but it is quite bad at telling us what an option should be worth in one year.

Let's make better forecasts.

My goal here is to make a model which outperforms the Black-Scholes model’s forecasts. In other words, we want to flatten the error distribution in the above plot. I think an obvious place to start is to add future-looking variables to the model. For example, the BSM makes the assumption that volatility is constant over time. We can remove this assumption by adding implied volatility (the market’s expectation of future volatility) into our model. Assuming the market is somewhat efficient, this addition should lead to more accurate forecasts. To do this, we’ll use the CBOE Volatility Index (VIX). Another idea is to use multiple values for the risk-free rate, since different market participants have access to different risk-free rates. We will use a combination of the federal funds rate, the 3-month U.S. Treasury bill yield, and LIBOR. Maybe the option contract’s volume and open interest have something to do with its price… we’ll throw them into the model too.
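
Because these inputs live on very different scales (rates near zero, volume in the thousands), they need to be normalized before training. The sketch below shows one common option, z-scoring each column with statistics taken from the training rows only. The normalization method, the column list, and the helper name `standardize` are assumptions on my part, and the data is random filler.

```python
import numpy as np


def standardize(features, n_train):
    """Z-score each column using statistics from the first n_train rows only."""
    mean = features[:n_train].mean(axis=0)
    std = features[:n_train].std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (features - mean) / std


# Hypothetical column order, filled with random numbers purely for illustration:
# spot, strike, time to maturity, VIX, fed funds rate, T-bill yield, LIBOR,
# volume, open interest
rng = np.random.default_rng(0)
raw_features = rng.uniform(0.0, 100.0, size=(1000, 9))

n_train = int(0.5 * len(raw_features))  # first half of the rows as training data
X_example = standardize(raw_features, n_train)
```

Taking the mean and standard deviation from the training rows alone keeps information from later rows out of the inputs the model is fitted on.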

Once we’ve prepared and normalized this data, we’ll define and train a neural network:

import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

# Divide data: first 50% of rows for training, next 10% for validation
n = X.shape[0]
X_train, Y_train = X[0:int(0.5*n), :], Y[0:int(0.5*n), :]
X_val, Y_val = X[int(0.5*n):int(0.6*n), :], Y[int(0.5*n):int(0.6*n), :]

# Define the model
model = Sequential()
model.add(Dense(1024, input_shape = (X.shape[1], )))
model.add(Activation('elu'))
model.add(Dropout(0.1))
model.add(Dense(2048))
model.add(Activation('elu'))
model.add(Dropout(0.1))
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.1))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(Y.shape[1]))  # linear output layer, one unit per target column

# Train model
model.compile(optimizer = keras.optimizers.Adam(learning_rate = 1e-5), loss='mean_absolute_error')
hist = model.fit(X_train, Y_train, batch_size = 32, validation_data = (X_val, Y_val), epochs = 100)

Using exponential linear unit (ELU) activations in the early layers gave me the best results. I assume this made it easier for the network to learn a pricing model, since the price formula in the BSM is built from a cumulative distribution function as well as an exponential function of time (as shown in the first equation). Let’s look at the performance:

The neural network makes better price forecasts, most notably for deeply out-of-the-money options. This shows that, unlike the Black-Scholes equation, the neural network can account for kurtosis risk much better and thus gives us accurate non-zero skew risk premiums. The neural network gives a Pearson correlation coefficient of 0.998 compared to the BSM’s measly 0.992 when compared to real option prices. The neural network gives forecasts that are, on average, 2.9x more accurate.
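
For readers who want to run the same kind of comparison, here is a minimal sketch of how the two numbers could be computed. The text does not say which error measure sits behind the 2.9x figure, so the use of mean absolute error (the training loss) is an assumption. The helper `forecast_metrics` is hypothetical and all data is synthetic.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """Return (Pearson correlation, mean absolute error) for a forecast."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pearson_r = np.corrcoef(actual, predicted)[0, 1]
    mae = np.mean(np.abs(actual - predicted))
    return pearson_r, mae


# Synthetic stand-ins for market prices and two competing forecasts.
rng = np.random.default_rng(0)
actual = rng.uniform(0.0, 50.0, size=5000)
forecast_a = actual + rng.normal(0.0, 3.0, size=5000)  # noisier forecast
forecast_b = actual + rng.normal(0.0, 1.0, size=5000)  # tighter forecast

r_a, mae_a = forecast_metrics(actual, forecast_a)
r_b, mae_b = forecast_metrics(actual, forecast_b)
accuracy_ratio = mae_a / mae_b  # above 1 means forecast_b has the lower error
```

Note how two correlation coefficients can both sit close to 1 while the average errors differ by a large factor, which is why the error ratio is the more telling of the two numbers.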

While the neural network offers lower error for most forecasts, it performs worse when pricing options far from maturity. In his analysis of computing spot prices for options using machine learning, Daniel Stafford solved a similar problem by creating three separate models for out-of-the-money, near-the-money, and in-the-money options (Stafford, 2018).
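
A split like that could start with a simple bucketing step, sketched below for call options. The cut-offs of 0.97 and 1.03 on the spot-to-strike ratio are assumed for illustration, the cited analysis may well use different thresholds, and `moneyness_bucket` is a hypothetical helper.

```python
import numpy as np

# Assumed cut-offs on the spot/strike ratio; not taken from the cited work.
LOWER, UPPER = 0.97, 1.03


def moneyness_bucket(spot, strike):
    """Label calls as 0 = out-of-the-money, 1 = near-the-money, 2 = in-the-money."""
    ratio = np.asarray(spot, dtype=float) / np.asarray(strike, dtype=float)
    return np.digitize(ratio, [LOWER, UPPER])


spot = np.array([100.0, 100.0, 100.0])
strike = np.array([120.0, 100.0, 80.0])
buckets = moneyness_bucket(spot, strike)  # array([0, 1, 2])
```

Each bucket label can then select the rows used to train its own model, for example with a boolean mask such as `buckets == 0`. A similar split on time to maturity would be the natural thing to try for the long-dated contracts where the network struggles.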

What insights can we take away from this?

While it seems like the neural network is significantly better at forecasting option premiums, there are a few caveats. Firstly, the Black-Scholes model was intended to price European-style options, which differ slightly from the American-style options this analysis looks at. This could cause the BSM to underestimate option premiums, since an American-style option is always worth at least as much as its European-style equivalent. Another caveat is the speed at which we can forecast. I assume a large chunk of firms interested in forecasting option premiums on-the-fly are high-frequency trading firms, for whom calculation speed matters more than accuracy. As you could have guessed, a closed-form expression such as the Black-Scholes equation can be run much faster than a ~5 million parameter neural network (by my calculation, 218x faster). We really can’t conclude that this model is any “better” than the Black-Scholes model.
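
The size of the network can be checked with a few lines of arithmetic. The hidden layer widths below come from the model definition, but the input width of 20 and output width of 1 are placeholders, since the text does not state them. Treat the total as a ballpark figure rather than the exact count.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hidden widths follow the model above; 20 inputs and 1 output are assumed.
layer_sizes = [20, 1024, 2048, 1024, 256, 1]
total_params = dense_param_count(layer_sizes)
print(f"{total_params:,}")  # prints 4,481,537
```

Almost all of the parameters sit in the two 1024-by-2048 weight matrices, so the total stays in the same several-million range whatever the true input width is. Every forecast has to pass through those matrix multiplications, while the closed-form formula needs only a handful of scalar operations.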

After coming to this conclusion, I assumed that this technology would be much better suited to forecast premiums for complex assets such as credit default swaps or more exotic option contracts for which closed-form solutions don’t exist. After further investigation, it seems that machine learning methods perform similarly to state-of-the-art pricing models (Gündüz & Uhrig-Homburg, 2011).

Well, in that case:

The mathematical equations holding our financial system together are good enough.
