Learn Heston from Scratch

Section 1

Variance is alive

Black-Scholes treats volatility like a fixed number stamped on the contract. It never changes. The world obviously does not work that way. Heston fixes this by giving variance its own stochastic differential equation.

In Black-Scholes, the spot price follows one SDE with a constant σ. Every option, every strike, every expiry uses the same vol. The model is internally consistent but wrong: the market quotes a different σ for every strike. That is the smile, and BS cannot produce it.

Mental model

Think of spot as a car and variance as the road surface. In BS, the road is perfectly smooth asphalt everywhere. In Heston, the road surface itself changes -- sometimes gravel, sometimes ice, sometimes fresh tarmac. The car reacts to whatever surface it is on. The bumpier the road, the noisier the ride.

Heston says: spot moves like BS but with a variable √v instead of constant σ. And variance follows its own mean-reverting square-root process:

Heston system

dS = √v · S · dW₁

dv = κ(θ − v)dt + σ√v · dW₂

corr(dW₁, dW₂) = ρ

First line: spot diffuses with instantaneous vol √v, not a fixed constant. The drift term is left out to keep the focus on the noise.
Second line: variance has its own drift (pulling toward θ) and its own noise (scaled by σ).
Third line: the two Brownian motions are correlated. This is the engine behind the skew.

That second equation is a CIR (Cox-Ingersoll-Ross) process -- the same process used for interest rates. It has a built-in floor: the √v diffusion term shrinks as v approaches zero, so the noise dies out and the positive drift κθ takes over, which keeps variance from going negative. Whether variance can actually touch zero depends on the Feller condition covered in Section 5.
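
The system above can be simulated in a few lines. This is a minimal sketch, assuming a log-Euler step for spot and the "full truncation" fix for variance (negative values are clipped to zero wherever √v is needed). The function name, the step count, and the choice of scheme are assumptions for illustration; the original text does not prescribe a discretization.

```python
import math
import random

def simulate_heston(S0, v0, kappa, theta, sigma, rho, T, n_steps, n_paths, seed=0):
    """Return a list of (S_T, v_T) pairs from a simple Euler-type scheme.

    Spot has zero drift, matching dS = sqrt(v) * S * dW1 in the text.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    finals = []
    for _ in range(n_paths):
        s, v = S0, v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)                        # drives spot
            z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)   # drives variance, corr = rho
            v_pos = max(v, 0.0)                             # full truncation
            vol = math.sqrt(v_pos)
            # Log-Euler step keeps spot strictly positive.
            s *= math.exp(-0.5 * v_pos * dt + vol * sqrt_dt * z1)
            v += kappa * (theta - v_pos) * dt + sigma * vol * sqrt_dt * z2
        finals.append((s, max(v, 0.0)))
    return finals

paths = simulate_heston(100.0, 0.04, 2.0, 0.04, 0.5, -0.7, 1.0, 100, 2000, seed=1)
mean_spot = sum(s for s, _ in paths) / len(paths)
mean_var = sum(v for _, v in paths) / len(paths)
print(f"mean S_T = {mean_spot:.2f}, mean v_T = {mean_var:.4f}")
```

With zero drift the average terminal spot should sit near S0, and with v₀ = θ the average terminal variance should sit near θ. Both are quick sanity checks on any discretization.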

The result: vol can spike, fade, cluster, and co-move with spot. All of those patterns are visible in real markets. BS cannot reproduce any of them. Heston can.

Section 2

The five parameters

Heston has exactly five free parameters. Each one tells a separate story about market behavior. Learn to read them like a dashboard.

κ (kappa) -- mean reversion speed. How hard variance gets pulled back to its long-run level. High κ means vol spikes are short-lived: the process snaps back fast. Low κ means vol regimes persist. In crypto, κ tends to be low -- vol stays elevated after a shock.

θ (theta) -- long-run variance. The level variance gravitates toward over time. If you take √θ, you get roughly the long-dated ATM vol. For BTC, that is typically somewhere around 50-70% annualized.

σ (sigma) -- vol-of-vol. How erratic the variance process itself is. When σ = 0, there is no smile at all -- you are back to a deterministic vol world. As σ increases, both wings of the smile lift. Think of it as: more randomness in variance = fatter tails = more expensive OTM options.

ρ (rho) -- spot-vol correlation. The direction link between spot moves and vol moves. Negative ρ means spot down, vol up. This is the single most important parameter for skew. We cover it in depth in the next section.

v₀ -- initial variance. Where variance is right now. If v₀ is above θ, short-dated options price current stress while long-dated options lean back toward normal. After a vol spike, v₀ > θ and the term structure inverts.
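
The interplay of κ, θ and v₀ can be made concrete with the standard mean of a CIR process, E[v_t] = θ + (v₀ − θ)·e^(−κt). The sketch below averages that expectation over the option's life and takes a square root. Treat the result as a rough proxy for the vol level at each expiry, not as an implied vol; the function names and the stressed value v₀ = 0.09 are assumptions for illustration.

```python
import math

def expected_variance(t, v0, theta, kappa):
    """E[v_t] for the CIR variance process."""
    return theta + (v0 - theta) * math.exp(-kappa * t)

def average_expected_vol(T, v0, theta, kappa):
    """Square root of the time-average of E[v_t] over [0, T]."""
    avg_var = theta + (v0 - theta) * (1.0 - math.exp(-kappa * T)) / (kappa * T)
    return math.sqrt(avg_var)

# Hypothetical stressed state: v0 above theta, so short expiries sit higher.
for T in (0.1, 0.5, 2.0):
    print(T, round(average_expected_vol(T, v0=0.09, theta=0.04, kappa=2.0), 4))
```

A higher κ makes the curve collapse toward √θ at shorter expiries; a lower κ keeps the stress visible further out.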

Heston Parameter Explorer (interactive widget, default settings)

κ (mean reversion) = 2.0: how fast variance reverts to θ
θ (long-run var) = 0.040: equilibrium variance level
σ (vol-of-vol) = 0.50: controls smile curvature
ρ (spot-vol corr) = -0.70: negative = put skew
v₀ (initial var) = 0.040: current variance level

Readouts at these settings: ATM IV 20.0%, 90/100 put skew +2.8%, 110/100 call skew -1.3%, Feller check 2κθ vs σ² = 0.160 vs 0.250.

Change one parameter at a time. The biggest insight: ρ tilts the smile left or right. σ widens it. κ/θ/v₀ set the level and term structure.

Section 3

How correlation creates skew

This is the core math insight of Heston. Negative ρ means that when spot drops, variance tends to rise. That one relationship produces the entire left-skewed smile you see in equity and crypto markets.

Here is the mechanism, step by step:

1. Spot drops (dW₁ is negative).
2. Because ρ < 0, dW₂ tends to be positive.
3. Positive dW₂ pushes variance up.
4. Higher variance means the underlying is now more volatile.
5. OTM puts (low strikes) become more likely to end in the money.
6. The market prices them higher. Left wing of the smile rises.

The reverse also holds: spot up, vol down. Call-side options lose some volatility premium. That is why the right wing is typically flatter than the left.
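
Steps 1 to 3 of the mechanism can be checked numerically. The sketch below draws correlated standard normal shocks (standing in for dW₁ and dW₂ over one time step) and averages the variance shock over the draws where the spot shock was negative. The function names and sample size are assumptions for illustration.

```python
import math
import random

def correlated_normals(rho, n, seed=0):
    """Draw n pairs (z1, z2) of standard normals with correlation rho."""
    rng = random.Random(seed)
    rho_bar = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)
        pairs.append((z1, z2))
    return pairs

def mean_var_shock_when_spot_down(pairs):
    """Average variance shock over the draws where the spot shock is negative."""
    down = [z2 for z1, z2 in pairs if z1 < 0.0]
    return sum(down) / len(down)

for rho in (-0.7, 0.0, 0.3):
    avg = mean_var_shock_when_spot_down(correlated_normals(rho, 20000, seed=42))
    print(f"rho = {rho:+.1f}: average variance shock on down moves = {avg:+.3f}")
```

With negative ρ the average is positive (variance rises on down moves), with ρ = 0 it is close to zero, and with positive ρ it flips sign.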

Compare three values of ρ. The difference is dramatic:

ρ = −0.7: Strong left skew. This is what equity and crypto markets look like. Downside protection is expensive because vol spikes when the market falls.

ρ = 0: Symmetric smile. No directional preference between spot and vol. You get a pure curvature from vol-of-vol, but no tilt.

ρ = +0.3: Right skew. Upside options are relatively expensive. This is rare in practice but can occur in commodity markets where supply shocks drive both price and uncertainty up together.

The Greek connection

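ρ maps directly to vanna exposure. Vanna is the sensitivity of delta to changes in vol. OTM puts have negative vanna: their delta becomes more negative as vol rises. When ρ is strongly negative, that vol rise arrives together with a falling spot, so both effects push the put's delta the same way. This is why short put positions get more dangerous in a selloff -- the delta they need to hedge grows from the spot move and from the vol move at the same time.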
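
The delta sensitivity described above can be checked with a Black-Scholes put and a finite difference in vol. This is a sketch under stated assumptions: Black-Scholes delta is used as a stand-in for the Heston delta, and the strike, expiry and vol levels are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_put_delta(S, K, T, vol, r=0.0):
    """Black-Scholes delta of a European put (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * vol * vol) * T) / (vol * math.sqrt(T))
    return norm_cdf(d1) - 1.0

def vanna_fd(S, K, T, vol, h=1e-4):
    """Central finite-difference estimate of d(delta)/d(vol)."""
    return (bs_put_delta(S, K, T, vol + h) - bs_put_delta(S, K, T, vol - h)) / (2.0 * h)

# Hypothetical 90-strike put, spot 100, three months to expiry.
print("delta at 20% vol:", round(bs_put_delta(100.0, 90.0, 0.25, 0.20), 4))
print("delta at 30% vol:", round(bs_put_delta(100.0, 90.0, 0.25, 0.30), 4))
print("vanna estimate  :", round(vanna_fd(100.0, 90.0, 0.25, 0.20), 4))
```

With these inputs the put's delta is more negative at the higher vol, which is the effect that compounds with a falling spot when ρ is negative.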

Section 4

The characteristic function

Most stochastic vol models require Monte Carlo simulation for pricing. Heston has a trick: you can price options via Fourier inversion of a known characteristic function. No simulation needed.

The standard Black-Scholes call price formula has the form C = S·N(d₁) − K·e^(−rT)·N(d₂). Heston has an analogous structure:

Heston call price

C = S·P₁ − K·e^(−rT)·P₂

Same structure as BS, but P₁ and P₂ are computed via Fourier inversion instead of the normal CDF.

The key object is the characteristic function φ(u). It encodes everything about the probability distribution of the log-spot price at expiry. Think of it as the distribution's fingerprint in frequency space.

Fourier inversion

Pⱼ = ½ + (1/π) ∫₀^∞ Re[e^(−iu·ln K) · φⱼ(u) / (iu)] du

One-dimensional integral. Converges fast. The characteristic functions φ₁(u) and φ₂(u) have closed-form expressions in terms of the five Heston parameters.

Why does this work? Three steps:

1. Moment generating function. Because the Heston model is affine (written in log-spot and v, every drift and covariance term is a linear function of v plus a constant), its moment generating function can be solved in closed form. This is the mathematical accident that makes Heston special.

2. Characteristic function = MGF on the imaginary axis. The characteristic function is φ(u) = E[e^(iu·X)] where X = ln(S_T). Once you have the MGF, you have φ.

3. Invert for density, integrate for price. Standard Fourier inversion recovers the risk-neutral density from φ. Integrating that density against the payoff gives you the option price. The integral is one-dimensional and cheap to evaluate numerically.

The result: a full smile computed in milliseconds, not minutes. That makes calibration feasible. You can fit five parameters to an observed smile by evaluating this integral thousands of times inside an optimizer.
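
A minimal sketch of this pricing recipe in plain Python. It uses one common way of writing the characteristic function of ln(S_T) (often called the "little Heston trap" form) and obtains P₁ from the same function evaluated at u − i. The function names, the zero dividend yield, the midpoint rule, and the cutoff u_max and step count n are assumptions for illustration; a production pricer would choose its integration scheme more carefully.

```python
import cmath
import math

def heston_cf(u, S0, r, T, kappa, theta, sigma, rho, v0):
    """E[exp(i*u*ln S_T)] under Heston; u may be complex."""
    iu = 1j * u
    b = kappa - rho * sigma * iu
    d = cmath.sqrt(b * b + sigma ** 2 * (iu + u * u))
    g = (b - d) / (b + d)
    e = cmath.exp(-d * T)
    C = iu * (math.log(S0) + r * T) + kappa * theta / sigma ** 2 * (
        (b - d) * T - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g)))
    D = (b - d) / sigma ** 2 * (1.0 - e) / (1.0 - g * e)
    return cmath.exp(C + D * v0)

def heston_call(S0, K, r, T, kappa, theta, sigma, rho, v0, u_max=200.0, n=4000):
    """European call: C = S0*P1 - K*exp(-r*T)*P2, midpoint rule on (0, u_max)."""
    du = u_max / n
    ln_k = math.log(K)
    fwd = S0 * math.exp(r * T)          # equals the characteristic function at u = -i
    args = (S0, r, T, kappa, theta, sigma, rho, v0)
    sum1 = sum2 = 0.0
    for k in range(n):
        u = (k + 0.5) * du              # midpoints avoid the removable point u = 0
        w = cmath.exp(-1j * u * ln_k) / (1j * u)
        sum2 += (w * heston_cf(u, *args)).real
        sum1 += (w * heston_cf(u - 1j, *args) / fwd).real
    p1 = 0.5 + sum1 * du / math.pi
    p2 = 0.5 + sum2 * du / math.pi
    return S0 * p1 - K * math.exp(-r * T) * p2

# Example inputs: kappa=2, theta=0.04, sigma=0.5, rho=-0.7, v0=0.04, one-year expiry.
for strike in (90.0, 100.0, 110.0):
    print(strike, round(heston_call(100.0, strike, 0.0, 1.0, 2.0, 0.04, 0.5, -0.7, 0.04), 4))
```

A useful sanity check: with a very small σ and v₀ = θ, the price should approach the Black-Scholes price at vol √θ.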

Why this matters

Before Heston (1993), stochastic vol models existed but were impractical -- you had to simulate paths to price a single option. Heston's characteristic function made stochastic vol usable on a trading desk. Descendant models such as Bates and double Heston preserve this Fourier pricing structure; rough Bergomi gives it up, and that loss is its main practical cost.

Section 5

When Heston breaks down

Heston is elegant, but it has real limits. The variance process can touch zero, the smile shape is too rigid for crypto, and the five-parameter fitting problem is a minefield of local optima.

The Feller condition. For variance to stay strictly positive, you need:

Feller condition

2κθ > σ²

Left side: mean reversion strength. Right side: variance noise squared. If the noise overwhelms the pull-back, variance can hit zero.

In practice, fitted Heston parameters frequently violate the Feller condition. The market wants more vol-of-vol (σ) than the Feller condition allows. When violated, the variance process can touch zero and must be "reflected" or "absorbed" -- which creates numerical headaches and makes the model less trustworthy in the wings.
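
The check itself is one line of arithmetic. The sketch below also reports the largest σ that a given κ and θ can support, which is √(2κθ). The function name and the returned dictionary layout are assumptions for illustration.

```python
import math

def feller_check(kappa, theta, sigma):
    """Compare 2*kappa*theta against sigma^2 and report the sigma ceiling."""
    lhs = 2.0 * kappa * theta
    rhs = sigma ** 2
    return {
        "2*kappa*theta": lhs,
        "sigma^2": rhs,
        "satisfied": lhs > rhs,
        "max_sigma": math.sqrt(lhs),   # sigma must stay below this level
    }

# Example inputs: kappa = 2.0, theta = 0.04, sigma = 0.5.
print(feller_check(2.0, 0.04, 0.5))
```

For κ = 2.0 and θ = 0.04 the ceiling is σ = 0.4, so a fitted σ of 0.5 violates the condition even though none of the three values looks extreme on its own.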

Feller Condition Checker (interactive widget, default settings)

κ = 2.0, θ = 0.040, σ = 0.50

✗ Feller violated: 2κθ = 0.160 ≤ σ² = 0.250. Variance can touch zero. Paths may get absorbed, causing numerical issues.

Raising σ pushes the condition further into violation, and more simulated variance paths hit zero. In a real pricing engine, those zero-touches require special handling that slows things down and introduces subtle errors.

Crypto smiles are too steep. Short-dated crypto options often have extremely steep skews and wide wings. Heston's CIR variance process is too smooth to capture this. The model's wing behavior approaches a constant slope, but real crypto wings are steeper than that. This is why crypto desks use SVI or SSVI for surface fitting and treat Heston as a conceptual tool, not a production fitting engine.

Five-parameter fitting is unstable. Different parameter combinations can produce nearly identical smiles. The optimizer has multiple local minima. Day-to-day calibrations can jump between wildly different parameter sets while producing similar prices. This makes hedging unreliable because the Greeks depend on which parameter set you landed in.

Extensions that fix these problems:

Bates = Heston + jumps. Adding a jump component to the spot process gives you fatter short-dated wings without needing unreasonable σ values. The jump intensity and size add extra parameters, but the characteristic function still has a semi-closed form.

Stochastic local vol (SLV). Combines Heston-style stochastic variance with a local vol overlay. You get exact calibration to the observed surface (from local vol) plus realistic dynamics (from the stochastic component). This is what many production desks actually run.

Rough Bergomi. Replaces the smooth CIR variance process with a variance process driven by fractional Brownian motion (Hurst parameter H near 0.1). Variance paths become rough and jagged, matching observed vol behavior much better. The cost: no closed-form characteristic function.

Where to go next:

SVI Parameterization -- the smile fitting standard for crypto vol surfaces

SABR Model -- stochastic vol without mean reversion, simpler fitting

Rough Bergomi -- fractional stochastic vol, rough paths

Interpolation Methods -- all methods compared