Neural SDE / Deep Hedging

Q: What does the neural network in a Neural SDE actually learn?

It learns the drift and diffusion functions of the stochastic differential equation governing the asset price and/or volatility. These are the functions that classical models specify by hand (e.g., the CEV backbone in SABR, the mean-reverting variance in Heston).

Q: Why does Deep Hedging produce different hedge ratios than classical delta hedging?

Classical delta hedging assumes continuous rebalancing in a frictionless market. Deep Hedging trains on realistic conditions: discrete rebalancing, transaction costs, bid-ask spreads, and liquidity constraints. The optimal hedge under friction is generally not the Black-Scholes delta.

Q: A Neural SDE produces a vol surface that contains a calendar spread arbitrage. What went wrong?

Neural SDEs do not guarantee arbitrage freedom by construction (unlike SANOS). The training may not have penalized this violation, or the training data was insufficient. Explicit no-arbitrage constraints must be added to the loss function or enforced post-hoc.
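
A standard way to detect this violation is to check that total variance w = σ² × T does not decrease with expiry at a fixed log-moneyness. The sketch below is a minimal illustration under stated assumptions: the function names, the shared log-moneyness grid, and the sample vols are hypothetical, and the squared-decrease penalty is only one possible choice of loss term.

```python
import numpy as np

def calendar_violations(iv, expiries, tol=1e-12):
    """Return (expiry_index, strike_index) pairs where total variance decreases.

    iv       : implied vols, shape (n_expiries, n_strikes), on a shared
               log-moneyness grid (an assumption of this sketch)
    expiries : increasing times to expiry in years, shape (n_expiries,)
    """
    iv = np.asarray(iv, dtype=float)
    expiries = np.asarray(expiries, dtype=float)
    w = iv**2 * expiries[:, None]        # total variance w = sigma^2 * T
    dw = np.diff(w, axis=0)              # change in w between adjacent expiries
    rows, cols = np.where(dw < -tol)
    return [(int(r) + 1, int(c)) for r, c in zip(rows, cols)]

def calendar_penalty(iv, expiries):
    """Soft penalty for a training loss: sum of squared decreases in w."""
    iv = np.asarray(iv, dtype=float)
    expiries = np.asarray(expiries, dtype=float)
    w = iv**2 * expiries[:, None]
    return float(np.sum(np.minimum(np.diff(w, axis=0), 0.0) ** 2))

expiries = [30 / 365, 90 / 365]
clean = [[0.55, 0.50, 0.52], [0.52, 0.48, 0.50]]  # w rises with T at every strike
bad = [[0.90, 0.50, 0.52], [0.50, 0.48, 0.50]]    # first strike: w falls from 30d to 90d

print(calendar_violations(clean, expiries))  # []
print(calendar_violations(bad, expiries))    # [(1, 0)]
```

The hard check suits post-hoc validation of a generated surface; the soft penalty is differentiable almost everywhere, so it could be added to a training loss.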

Every model on this site -- SABR, SVI, Heston -- starts by choosing a formula and then fitting its parameters to data. A Neural SDE flips this: it uses a neural network to learn the formula itself directly from market data. The network discovers the drift and diffusion functions that best explain observed prices, and the vol surface falls out as a byproduct.

💡

The network learns the equation

Classical models say "vol follows this equation" and fit parameters. A Neural SDE says "vol follows some equation" and the network figures out what it is. The implied vol surface is an output of the learned model, not a shape assumed in advance.

See It in Action

Compare how classical, parametric, and neural approaches handle the same market data under different conditions.

Neural SDE vs. Classical Models

Liquid market, well-behaved smile. All three approaches produce similar results.

Classical (SABR), hand-picked formula: Market data → Pick a model (SABR) → Fit 4 params → Smile

Parametric (SVI), 5-param formula: Market data → Pick a formula (SVI) → Fit 5 params → Smile

Neural SDE, learned from data: Market data → Neural network → Learn drift + diffusion → Smile

Under stressed or sparse-data market conditions, the Neural SDE can adapt where parametric models are constrained by their assumed shape.

How It Works

1. Learn the dynamics, not the shape

A standard SDE for price and vol looks like: dS = ... dt + ... dW. Classical models fill in the "..." with specific formulas (SABR, for example, uses a CEV backbone for the forward and a driftless lognormal process for volatility, with vol-of-vol as a constant parameter). A Neural SDE replaces those formulas with neural networks trained on market data. The network learns both the average behavior (drift) and the randomness (diffusion) from scratch. It can discover skew patterns and term structure shapes that a fixed parametric form cannot represent.
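
To make the idea concrete, here is a minimal sketch of simulating an SDE whose drift and diffusion are small neural networks, using an Euler-Maruyama scheme. It is an illustration under assumptions, not a reference implementation: the networks are untrained (random weights), the state is a single log-price rather than a joint price and vol system, and all names (`init_mlp`, `simulate`, the layer sizes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Random weights for a one-hidden-layer network (untrained placeholder)."""
    return {
        "W1": rng.normal(0.0, 0.3, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.3, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def mlp(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def softplus(z):
    return np.logaddexp(0.0, z)  # smooth and strictly positive

def simulate(drift_net, diff_net, x0, T, n_steps, n_paths):
    """Euler-Maruyama for dX = mu(t, X) dt + sigma(t, X) dW, with X = log price."""
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    paths = [x.copy()]
    for i in range(n_steps):
        t = np.full(n_paths, i * dt)
        inp = np.stack([t, x], axis=1)               # network input: (time, state)
        mu = mlp(drift_net, inp)[:, 0]               # learned drift
        sigma = softplus(mlp(diff_net, inp)[:, 0])   # learned diffusion, kept positive
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)   # Brownian increments
        x = x + mu * dt + sigma * dW
        paths.append(x.copy())
    return np.array(paths)                           # shape (n_steps + 1, n_paths)

drift_net = init_mlp(2, 16, 1)
diff_net = init_mlp(2, 16, 1)
paths = simulate(drift_net, diff_net, x0=np.log(100.0), T=30 / 365,
                 n_steps=30, n_paths=1000)
```

Training would adjust the weights so that prices computed from these paths match observed market prices; that step requires automatic differentiation and is left out here.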

2. Deep Hedging: learn the hedge, not just the price

Deep Hedging (Buehler, Gonon, Teichmann & Wood, 2019) extends this idea. Instead of pricing an option and then computing a hedge ratio from a model, you train a network to directly output the optimal hedge position at each timestep. The network learns delta and vega exposures jointly. The training objective is to minimize a risk measure of the hedging P&L (its variance in the simplest case; the original paper uses convex risk measures) under real market conditions, including transaction costs, bid-ask spreads, discrete rebalancing, and liquidity constraints. No frictionless-market assumptions are needed.
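
The objective can be sketched without any network: compute the terminal P&L of a short call hedged at discrete steps with proportional transaction costs, then take its variance as the loss. Everything here is an assumption made for illustration: the constant-vol lognormal paths, the 10 bp cost rate, zero rates, and the function names. The option premium is left out because a constant does not change the variance. A Deep Hedging network would take the place of the `delta_hedge` callable and be trained to lower this loss.

```python
import numpy as np
from math import erf, sqrt

def bs_delta(S, K, sigma, tau):
    """Black-Scholes call delta with zero rates (vectorised over S)."""
    d1 = (np.log(S / K) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    return 0.5 * (1.0 + np.vectorize(erf)(d1 / sqrt(2.0)))

def hedged_pnl(paths, K, strategy, cost_rate):
    """Terminal P&L of a short call hedged at each step, net of proportional costs.

    paths    : array (n_steps + 1, n_paths) of simulated prices
    strategy : callable (step_index, S) -> hedge position in the underlying
    """
    n_steps = paths.shape[0] - 1
    pnl = np.zeros(paths.shape[1])
    prev = np.zeros(paths.shape[1])
    for i in range(n_steps):
        pos = strategy(i, paths[i])
        pnl -= cost_rate * np.abs(pos - prev) * paths[i]  # cost of rebalancing
        pnl += pos * (paths[i + 1] - paths[i])            # gain on the hedge
        prev = pos
    pnl -= cost_rate * np.abs(prev) * paths[-1]           # unwind at expiry
    pnl -= np.maximum(paths[-1] - K, 0.0)                 # pay the option payoff
    return pnl

# Stand-in market: constant-vol lognormal paths with daily rebalancing.
rng = np.random.default_rng(1)
S0, K, sigma, T, n_steps, n_paths = 100.0, 100.0, 0.5, 30 / 365, 30, 20000
dt = T / n_steps
z = rng.normal(size=(n_steps, n_paths))
log_paths = np.cumsum(-0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * z, axis=0)
paths = S0 * np.exp(np.vstack([np.zeros(n_paths), log_paths]))

def delta_hedge(i, S):
    return bs_delta(S, K, sigma, T - i * dt)

def no_hedge(i, S):
    return np.zeros_like(S)

loss_hedged = np.var(hedged_pnl(paths, K, delta_hedge, cost_rate=0.001))
loss_unhedged = np.var(hedged_pnl(paths, K, no_hedge, cost_rate=0.001))
```

Because costs enter the P&L directly, a strategy that trades less often or in smaller size can score better than the Black-Scholes delta on this loss, which is the gap Deep Hedging aims to exploit.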

3. The vol surface emerges

Once the Neural SDE is trained, you can generate the implied vol surface by pricing vanilla options through the learned model. The resulting surface is not constrained to any parametric shape -- it captures whatever patterns exist in the data, including ones that SVI or SABR would structurally miss. Both ATM and OTM regions are fitted simultaneously.
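
As a rough sketch of that last step, the snippet below prices calls by Monte Carlo from simulated terminal prices and inverts Black-Scholes by bisection to get implied vols. The terminal prices come from a constant-vol lognormal stand-in rather than a trained Neural SDE, rates are assumed to be zero, and the helper names are hypothetical.

```python
import numpy as np
from math import erf, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S0, K, sigma, T):
    """Black-Scholes call price with zero rates and dividends."""
    d1 = (log(S0 / K) + 0.5 * sigma**2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * norm_cdf(d2)

def implied_vol(price, S0, K, T, lo=1e-4, hi=5.0, n_iter=100):
    """Invert bs_call by bisection; the call price is increasing in sigma."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S0, K, mid, T) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def smile_from_terminal_prices(S_T, S0, strikes, T):
    """Monte Carlo call prices from simulated terminal prices, mapped to implied vols."""
    return [implied_vol(float(np.mean(np.maximum(S_T - K, 0.0))), S0, K, T)
            for K in strikes]

# Stand-in for the learned model: constant-vol lognormal terminal prices.
rng = np.random.default_rng(2)
S0, T, true_sigma = 100.0, 30 / 365, 0.52
z = rng.normal(size=400_000)
S_T = S0 * np.exp(-0.5 * true_sigma**2 * T + true_sigma * np.sqrt(T) * z)
smile = smile_from_terminal_prices(S_T, S0, strikes=[90.0, 100.0, 110.0], T=T)
```

With a constant-vol stand-in the recovered smile is flat at roughly the input vol, up to Monte Carlo noise. Feeding in terminal prices from a trained model would produce whatever skew and curvature the learned dynamics imply.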

ℹ️

Captures dynamics parametric models miss

Neural SDEs can capture vol dynamics that parametric models cannot: regime switches, path-dependent effects, and cross-asset spillovers. Deep Hedging accounts for costs that classical delta hedging ignores. Both are data-hungry and computationally expensive, but this is where quant finance is heading.

Strengths and Limitations

| Strength | What it means for you |
| --- | --- |
| No shape assumption | The network discovers vol dynamics from data. No structural bias from choosing SABR vs Heston vs SVI. |
| Friction-aware hedging | Deep Hedging accounts for transaction costs, spreads, and discrete rebalancing -- realities that classical models ignore. |
| Adapts to regime changes | Retrained on recent data, the network adapts to new market behavior without manual model selection. |
| Captures cross-asset effects | Can learn how BTC vol responds to ETH moves, or how macro events propagate -- multi-input by design. |

| Limitation | What it means for you |
| --- | --- |
| Black box | You cannot inspect why the network produces a given smile shape. Hard to debug when something looks wrong. |
| Data hungry | Needs large, high-quality historical datasets. Crypto markets may not have enough history for reliable training. |
| Computationally expensive | Training involves Monte Carlo simulation through a neural network. Not a spreadsheet exercise. |
| No arbitrage guarantee | Unlike SANOS, the output surface may contain arbitrage unless explicitly constrained during training. |
| Bleeding edge (2019+) | Active research area. No standardized implementations. Few production deployments outside large quant funds. |

Relevance to Crypto

Crypto markets are a natural fit for Neural SDEs because the vol dynamics are poorly understood and change rapidly. There is no consensus on whether BTC vol is better modeled by SABR, Heston, rough vol, or something entirely different. A Neural SDE sidesteps this debate by learning whatever dynamics the data contains -- including Black-Scholes-violating patterns like regime switches. The main obstacle is data: crypto options markets are young and the training set is small compared to equity or rates.

💡

Learned models, learned hedges

Neural SDEs replace hand-picked vol models with learned ones. Deep Hedging replaces theoretical hedge ratios with friction-aware ones. The tradeoff is reduced interpretability, heavier data requirements, and higher compute cost. For now they are research tools, but they define the frontier.

Equation Explorer

Convert between implied vol, total variance, log-moneyness, and option prices.

w = σ² × T (total variance = IV² × time)

Worked example: an implied vol σ of 52.00% with 30 calendar days to expiry (T = 30/365 years) gives an annualized variance σ² of 0.2704 and a total variance w of 0.022225. Converting w back gives a round-trip IV of 52.00%.

Total variance is what SVI and other models fit. It scales with time, so a 50% vol for 30 days has less total variance than 50% vol for 90 days.
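
The conversion is a one-liner in each direction. The sketch below assumes a 365-day year, which is consistent with the figures shown above; the function names are hypothetical.

```python
def total_variance(iv, days, days_per_year=365.0):
    """w = sigma^2 * T, with T in years (365-day convention assumed)."""
    return iv**2 * days / days_per_year

def iv_from_total_variance(w, days, days_per_year=365.0):
    """Invert w = sigma^2 * T back to an annualized implied vol."""
    return (w * days_per_year / days) ** 0.5

w_30 = total_variance(0.52, 30)       # 52% vol, 30 days
w_50_30 = total_variance(0.50, 30)    # 50% vol, 30 days
w_50_90 = total_variance(0.50, 90)    # 50% vol, 90 days

print(round(w_30, 6))                                  # 0.022225
print(round(iv_from_total_variance(w_30, 30), 4))      # 0.52
print(w_50_30 < w_50_90)                               # True
```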


Building mathematical intuition

Learn Neural SDEs from scratch (interactive lesson, no prerequisites)

This lesson explains the "learn the equation" idea in plain English, then walks through how the network learns drift and diffusion functions and where deep hedging fits into the picture.

See also:

SABR Model -- Classical stochastic vol model with interpretable parameters
Heston Model -- Mean-reverting stochastic vol with closed-form pricing
SANOS (Non-Parametric Surfaces) -- Non-parametric fitting with guaranteed arbitrage freedom
Path-Dependent Volatility -- Another data-driven approach that uses price path history
Rough Bergomi -- Fractional vol model that Neural SDEs can potentially replace
