
Neural SDE / Deep Hedging

Every model on this site -- SABR, SVI, Heston -- starts by choosing a formula and then fitting its parameters to data. A Neural SDE flips this: it uses a neural network to learn the formula itself directly from market data. The network discovers the drift and diffusion functions that best explain observed prices, and the vol surface falls out as a byproduct.

💡
The network learns the equation

Classical models say "vol follows this equation" and fit parameters. A Neural SDE says "vol follows some equation" and the network figures out what it is. The implied vol surface is an output of the learned model, not a shape assumed in advance.

See It in Action

Compare how classical, parametric, and neural approaches handle the same market data under different conditions.

Neural SDE vs. Classical Models (each panel plots a smile from OTM put through ATM to OTM call)

Liquid market, well-behaved smile: all three approaches produce similar results.

Classical (SABR), hand-picked formula: market data → pick a model (SABR) → fit 4 params → smile
Parametric (SVI), 5-param formula: market data → pick a formula (SVI) → fit 5 params → smile
Neural SDE, learned from data: market data → neural network → learn drift + diffusion → smile

In the stress and sparse-data scenarios, the neural SDE adapts where parametric models are constrained by their assumed shape.
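The "5-param formula" in the SVI pipeline is usually Gatheral's raw SVI parametrization, which writes total implied variance w as a function of log-moneyness k. A minimal sketch, assuming raw SVI is the variant meant; the parameter values are hypothetical, not fitted to any market.

```python
import math


def svi_total_variance(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) at log-moneyness k = log(K / F).

    a: overall variance level, b: wing slope (b >= 0), rho: skew (|rho| < 1),
    m: horizontal shift of the smile, sigma: ATM curvature (sigma > 0).
    """
    return a + b * (rho * (k - m) + math.sqrt((k - m) ** 2 + sigma ** 2))


def svi_implied_vol(k, t, params):
    """Implied vol from total variance: IV = sqrt(w / T), with T in years."""
    return math.sqrt(svi_total_variance(k, *params) / t)


# Hypothetical 30-day smile: a negative rho lifts the OTM-put side (k < 0).
params = (0.01, 0.05, -0.3, 0.0, 0.2)  # (a, b, rho, m, sigma)
T = 30 / 365
for k in (-0.2, 0.0, 0.2):
    print(f"k = {k:+.1f}  IV = {svi_implied_vol(k, T, params):.2%}")
```

Fitting SVI means choosing these five numbers to match observed IVs; the Neural SDE pipeline has no fixed formula of this kind to fill in.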

How It Works

1. Learn the dynamics, not the shape

A generic SDE for price and vol looks like: dS = ... dt + ... dW. Classical models fill in the "..." with specific formulas (SABR, for instance, pairs CEV dynamics for the price with a stochastic volatility whose own vol-of-vol is a constant parameter). A Neural SDE replaces those formulas with neural networks trained on market data. The networks learn both the average behavior (drift) and the randomness (diffusion) from scratch, so they can discover skew patterns and term-structure shapes that parametric models cannot anticipate.
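To make "replace the formulas with networks" concrete, here is a minimal, untrained sketch. Assumptions not taken from the text: the state is log-moneyness x = log(S / S0); drift and diffusion are hypothetical one-hidden-layer networks of (t, x) with random weights; a softplus keeps the diffusion positive; and paths are simulated with the Euler-Maruyama scheme. Training would adjust the weights by differentiating a loss through these simulated paths with an autodiff library, which this sketch leaves out.

```python
import math
import random


class TinyMLP:
    """Hypothetical one-hidden-layer tanh network mapping (t, x) to a scalar."""

    def __init__(self, hidden, rng, scale=0.1, out_bias=0.0):
        self.w1 = [(rng.gauss(0.0, scale), rng.gauss(0.0, scale)) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, scale) for _ in range(hidden)]
        self.b2 = out_bias

    def __call__(self, t, x):
        hidden = [math.tanh(wt * t + wx * x + b) for (wt, wx), b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def softplus(z):
    """Numerically stable log(1 + e^z); maps any real number to a positive one."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def simulate_neural_sde(drift_net, diffusion_net, s0, maturity, n_steps, n_paths, rng):
    """Euler-Maruyama for dx = mu(t, x) dt + sigma(t, x) dW, with x = log(S / s0).

    mu is the drift network's output; sigma is the softplus of the diffusion
    network's output. Returns simulated terminal prices S_T = s0 * exp(x_T).
    """
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    terminal_prices = []
    for _ in range(n_paths):
        x = 0.0
        for step in range(n_steps):
            t = step * dt
            mu = drift_net(t, x)
            sigma = softplus(diffusion_net(t, x))
            x += mu * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        terminal_prices.append(s0 * math.exp(x))
    return terminal_prices


rng = random.Random(0)
drift_net = TinyMLP(hidden=8, rng=rng)
# out_bias=-1.0 starts the diffusion near softplus(-1), about 0.31 (31% annualized vol).
diffusion_net = TinyMLP(hidden=8, rng=rng, out_bias=-1.0)
s_T = simulate_neural_sde(drift_net, diffusion_net, s0=100.0, maturity=30 / 365,
                          n_steps=30, n_paths=2000, rng=rng)
print(f"mean S_T = {sum(s_T) / len(s_T):.2f}")
```

Swapping the two networks for fixed formulas with a handful of parameters would turn this back into a classical model; here every weight in both networks is a parameter to be learned.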

2. Deep Hedging: learn the hedge, not just the price

Deep Hedging (Buehler, Gonon, Teichmann & Wood, 2019) extends this idea. Instead of pricing an option and then computing a hedge ratio from a model, you train a network to output the hedge position directly at each timestep. When the hedging instruments include options as well as the underlying, the network learns delta and vega exposures jointly. The training objective is to minimize a risk measure of the hedged P&L, such as its variance or a tail measure like CVaR, under realistic market frictions: transaction costs, bid-ask spreads, discrete rebalancing, and liquidity constraints. No frictionless-market assumptions are needed.
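The sketch below shows the quantity Deep Hedging optimizes rather than the network itself: the terminal P&L of a short call hedged in discrete steps with proportional transaction costs, scored by a tail risk measure (CVaR here). Assumptions for illustration only: geometric Brownian motion paths, zero interest rates, a cost of 0.2% of traded notional, and the Black-Scholes delta as a stand-in policy. In Deep Hedging proper, `policy` would be a neural network whose weights are trained to minimize `cvar` (or another risk measure) over many simulated paths.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, vol, tau):
    """Black-Scholes call price with zero rates; intrinsic value at expiry."""
    if tau <= 0:
        return max(s - k, 0.0)
    sd = vol * math.sqrt(tau)
    d1 = math.log(s / k) / sd + 0.5 * sd
    return s * norm_cdf(d1) - k * norm_cdf(d1 - sd)


def bs_delta(s, k, vol, tau):
    if tau <= 0:
        return 1.0 if s > k else 0.0
    sd = vol * math.sqrt(tau)
    return norm_cdf(math.log(s / k) / sd + 0.5 * sd)


def gbm_path(s0, vol, maturity, n_steps, rng):
    dt = maturity / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp(-0.5 * vol * vol * dt + vol * math.sqrt(dt) * z))
    return path


def hedged_pnl(path, policy, strike, premium, cost_rate):
    """P&L of selling a call for `premium` and trading the underlying per `policy`.

    policy(step, price) returns the number of shares to hold over that step.
    Every trade, including the final unwind, costs cost_rate * |shares| * price.
    """
    pnl, position = premium, 0.0
    for step in range(len(path) - 1):
        target = policy(step, path[step])
        pnl -= cost_rate * abs(target - position) * path[step]
        position = target
        pnl += position * (path[step + 1] - path[step])
    pnl -= cost_rate * abs(position) * path[-1]  # unwind the hedge at maturity
    return pnl - max(path[-1] - strike, 0.0)  # pay the option payoff


def cvar(pnls, alpha=0.95):
    """Mean loss over the worst (1 - alpha) fraction of scenarios."""
    losses = sorted((-p for p in pnls), reverse=True)
    n_tail = max(1, round(len(losses) * (1 - alpha)))
    return sum(losses[:n_tail]) / n_tail


s0 = strike = 100.0
vol, maturity, n_steps = 0.5, 30 / 365, 30
dt = maturity / n_steps
premium = bs_call(s0, strike, vol, maturity)
rng = random.Random(42)
paths = [gbm_path(s0, vol, maturity, n_steps, rng) for _ in range(2000)]


def no_hedge(step, s):
    return 0.0


def delta_hedge(step, s):
    return bs_delta(s, strike, vol, maturity - step * dt)


for cost in (0.0, 0.002):
    for name, policy in (("no hedge", no_hedge), ("BS delta", delta_hedge)):
        pnls = [hedged_pnl(p, policy, strike, premium, cost) for p in paths]
        print(f"cost={cost:.3f}  {name:8s}  CVaR95 = {cvar(pnls):.2f}")
```

With costs switched on, every rebalance of the delta hedge eats into P&L; a trained policy can weigh tail risk against those costs, which is the trade-off Deep Hedging is designed to learn.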

3. The vol surface emerges

Once the Neural SDE is trained, you can generate the implied vol surface by pricing vanilla options through the learned model. The resulting surface is not constrained to any parametric shape -- it captures whatever patterns exist in the data, including ones that SVI or SABR would structurally miss. Both ATM and OTM regions are fitted simultaneously.
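Pricing vanillas through a model and inverting Black-Scholes can be sketched as follows. Assumptions: zero rates and dividends, simulated terminal prices under the pricing measure with forward equal to spot, and bisection for the inversion. To keep the example self-contained, geometric Brownian motion with 50% vol stands in for a trained Neural SDE, so the recovered smile should come out flat near 50% up to Monte Carlo noise; a trained model's paths would produce whatever smile they imply.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, vol, t):
    """Black-Scholes call price with zero rates and dividends."""
    sd = vol * math.sqrt(t)
    d1 = math.log(s / k) / sd + 0.5 * sd
    return s * norm_cdf(d1) - k * norm_cdf(d1 - sd)


def implied_vol(price, s, k, t, lo=1e-4, hi=5.0, tol=1e-8):
    """Invert bs_call by bisection; the call price is increasing in vol."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, mid, t) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def smile_from_paths(terminal_prices, s0, t, strikes):
    """Monte Carlo option prices from simulated S_T, converted to implied vols.

    Prices OTM options (puts below s0, calls at or above) for lower noise and
    maps puts to calls with put-call parity C = P + s0 - K (zero rates, forward s0).
    """
    n = len(terminal_prices)
    smile = []
    for k in strikes:
        if k < s0:
            put = sum(max(k - s_t, 0.0) for s_t in terminal_prices) / n
            call = put + s0 - k
        else:
            call = sum(max(s_t - k, 0.0) for s_t in terminal_prices) / n
        smile.append(implied_vol(call, s0, k, t))
    return smile


# Stand-in for a trained model: exact GBM terminal prices, 50% vol, zero drift.
rng = random.Random(7)
s0, t, vol = 100.0, 30 / 365, 0.5
s_T = [s0 * math.exp(-0.5 * vol ** 2 * t + vol * math.sqrt(t) * rng.gauss(0.0, 1.0))
       for _ in range(20000)]
strikes = [80.0, 90.0, 100.0, 110.0, 120.0]
smile = smile_from_paths(s_T, s0, t, strikes)
for k, iv in zip(strikes, smile):
    print(f"K = {k:5.1f}  IV = {iv:.2%}")
```

Repeating this for several maturities gives the full surface, slice by slice.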

ℹ️
Captures dynamics parametric models miss

Neural SDEs can capture vol dynamics that parametric models struggle with: regime switches, path-dependent effects, and cross-asset spillovers. Deep Hedging accounts for costs that classical delta hedging ignores. Both are data-hungry and computationally expensive, but this is where quant finance is heading.

Strengths and Limitations

Strengths, and what each means for you:
No shape assumption: The network discovers vol dynamics from data. No structural bias from choosing SABR vs Heston vs SVI.
Friction-aware hedging: Deep Hedging accounts for transaction costs, spreads, and discrete rebalancing, realities that classical models ignore.
Adapts to regime changes: Retrained on recent data, the network adapts to new market behavior without manual model selection.
Captures cross-asset effects: Can learn how BTC vol responds to ETH moves, or how macro events propagate; multi-input by design.

Limitations, and what each means for you:
Black box: You cannot inspect why the network produces a given smile shape. Hard to debug when something looks wrong.
Data hungry: Needs large, high-quality historical datasets. Crypto markets may not have enough history for reliable training.
Computationally expensive: Training involves Monte Carlo simulation through a neural network. Not a spreadsheet exercise.
No arbitrage guarantee: Unlike SANOS, the output surface may contain arbitrage unless explicitly constrained during training.
Bleeding edge (2019+): Active research area. No standardized implementations. Few production deployments outside large quant funds.
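For the "No arbitrage guarantee" limitation, one simple post-hoc check targets calendar spreads: at fixed log-moneyness, total variance w = σ² × T must not decrease as maturity increases. A minimal sketch, assuming log-moneyness is measured against the forward and the surface is sampled on a grid; it covers calendar arbitrage only, not butterfly arbitrage across strikes, and the sample IVs are made up to contain one violation.

```python
def calendar_violations(maturities, log_moneyness, total_variance, tol=0.0):
    """Return (T_prev, T_next, k, drop) wherever w(T_next, k) < w(T_prev, k) - tol.

    total_variance[i][j] is w at maturities[i] (ascending, in years) and
    log_moneyness[j]; every row must use the same log-moneyness grid.
    """
    violations = []
    for i in range(1, len(maturities)):
        for j, k in enumerate(log_moneyness):
            drop = total_variance[i - 1][j] - total_variance[i][j]
            if drop > tol:
                violations.append((maturities[i - 1], maturities[i], k, drop))
    return violations


# Made-up implied vols: a 30-day slice and a 60-day slice at three log-moneyness points.
maturities = [30 / 365, 60 / 365]
log_moneyness = [-0.1, 0.0, 0.1]
ivs = [[0.60, 0.55, 0.58],
       [0.50, 0.38, 0.45]]
w = [[iv ** 2 * t for iv in row] for row, t in zip(ivs, maturities)]
for t0, t1, k, drop in calendar_violations(maturities, log_moneyness, w):
    print(f"calendar arbitrage at k = {k:+.2f}: w falls by {drop:.5f} from T = {t0:.3f} to {t1:.3f}")
```

The same inequality can also be added to a training loss as a penalty, which is one way to constrain a learned surface explicitly.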

Relevance to Crypto

Crypto markets are a natural fit for Neural SDEs because the vol dynamics are poorly understood and change rapidly. There is no consensus on whether BTC vol is better modeled by SABR, Heston, rough vol, or something entirely different. A Neural SDE sidesteps this debate by learning whatever dynamics the data contains -- including Black-Scholes-violating patterns like regime switches. The main obstacle is data: crypto options markets are young and the training set is small compared to equity or rates.

💡
Learned models, learned hedges

Neural SDEs replace hand-picked vol models with learned ones. Deep Hedging replaces theoretical hedge ratios with friction-aware ones. The tradeoff: interpretability, data requirements, and compute cost. For now they are research tools, but they define the frontier.

Equation Explorer

Convert between implied vol, total variance, log-moneyness, and option prices.


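w = σ² × T (total variance = IV² × time)

Inputs: the implied volatility (%) and the calendar days to expiration. Outputs: total variance (w), annualized variance (σ²), and the round-trip IV recovered from w.

Example: an IV of 52% over 30 days gives an annualized variance of 0.2704, a total variance of 0.022225, and a round-trip IV of 52.00%.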
Total variance is what SVI and other models fit. It scales with time, so a 50% vol for 30 days has less total variance than a 50% vol for 90 days.
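The explorer's conversion is short enough to write down directly. A minimal sketch, assuming a 365-day year; the explorer does not state its day-count convention, but 365 days reproduces its total variance of 0.022225 for a 52% IV over 30 days.

```python
import math

DAYS_PER_YEAR = 365.0  # assumption: calendar-day year, matching "calendar days to expiration"


def total_variance(iv, days):
    """Total variance w = sigma^2 * T, with T = days / DAYS_PER_YEAR."""
    return iv ** 2 * (days / DAYS_PER_YEAR)


def iv_from_total_variance(w, days):
    """Round trip back to annualized implied vol: sigma = sqrt(w / T)."""
    return math.sqrt(w / (days / DAYS_PER_YEAR))


w = total_variance(0.52, 30)
print(round(w, 6))  # 0.022225
print(f"{iv_from_total_variance(w, 30):.2%}")  # 52.00%
print(total_variance(0.50, 30) < total_variance(0.50, 90))  # True
```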

Test your understanding before moving on.

Q: What does the neural network in a Neural SDE actually learn?
Q: Why does Deep Hedging produce different hedge ratios than classical delta hedging?
Q: A Neural SDE produces a vol surface that contains a calendar spread arbitrage. What went wrong?


Building mathematical intuition

Learn Neural SDEs from scratch (interactive lesson, no prerequisites)

This lesson explains the "learn the equation" idea in plain English, then walks through how the network learns drift and diffusion functions and where deep hedging fits into the picture.


See also: