Bates from zero
Heston + jumps = Bates
Heston explains the long-dated smile: stochastic variance creates smooth skew and term structure. Merton explains the short-dated smile: jumps in the price process create steep wings at short expiries. Bates combines both into one model.
The core problem is simple. Heston moves continuously -- the spot price never teleports. That means Heston alone cannot explain why a 1-week 25-delta put can trade at 80% vol while the 1-year version trades at 55%. The short-dated wing steepness requires something that continuous diffusion cannot deliver: instantaneous gaps.
Merton (1976) solved the gap problem by adding a Poisson jump process to geometric Brownian motion. But Merton has no stochastic variance, so it cannot reproduce term structure dynamics. It prices one expiry well, then falls apart across the curve.
Bates (1996) glued the two together. The result is the workhorse model for exotic desks that need both realistic dynamics and tractable pricing.
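For reference, the risk-neutral Bates dynamics are commonly written as the three-line system below. The notation is the usual textbook form and is an assumption here (in particular the symbols W^S, W^v, N_t, J and the compensator k̄), chosen to match the parameter names used in the rest of this page.

```latex
\begin{aligned}
\frac{dS_t}{S_{t^-}} &= (r - \lambda \bar{k})\,dt + \sqrt{v_t}\,dW_t^{S} + \left(e^{J} - 1\right)dN_t \\
dv_t &= \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW_t^{v} \\
d\langle W^{S}, W^{v}\rangle_t &= \rho\,dt
\end{aligned}
\qquad
J \sim \mathcal{N}(\mu_J, \sigma_J^2),\quad
N_t \sim \text{Poisson}(\lambda t),\quad
\bar{k} = e^{\mu_J + \sigma_J^2/2} - 1
```

The first line is the Heston price equation plus a jump term: when the Poisson counter N ticks, the spot is multiplied by e^J. The −λk̄ drift adjustment compensates for the average jump so that the discounted price stays a martingale.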
Second line: variance follows the same CIR process as Heston. Nothing changes here.
Third line: same correlation structure. ρ still drives the smooth skew.
Think of a car on a bumpy road (Heston: the road surface quality changes stochastically). Now add potholes that appear at random (Merton jumps: the car drops suddenly). You need suspension for the bumps and airbags for the potholes. Bates gives you both.
The key mathematical insight: because the jump component is independent of the variance process, the characteristic function of Bates is just the Heston characteristic function multiplied by the Merton jump factor. That means pricing remains semi-analytic -- Fourier inversion still works. No Monte Carlo needed for vanillas.
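The factorisation above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pricer: the function names are made up for this example, the Heston part uses the commonly quoted numerically stable form of the characteristic function, the jump factor assumes normally distributed log-jumps with the usual drift compensator, and the integral is a simple midpoint rule truncated at a hypothetical `u_max`.

```python
import cmath
import math


def heston_cf(u, T, S0, r, kappa, theta, sigma, rho, v0):
    """E[exp(i*u*ln S_T)] under Heston, in the stable exp(-d*T) form."""
    iu = 1j * u
    beta = kappa - rho * sigma * iu
    d = cmath.sqrt(beta * beta + sigma**2 * (iu + u * u))
    g = (beta - d) / (beta + d)
    e = cmath.exp(-d * T)
    C = kappa * theta / sigma**2 * (
        (beta - d) * T - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g))
    )
    D = (beta - d) / sigma**2 * (1.0 - e) / (1.0 - g * e)
    return cmath.exp(iu * (math.log(S0) + r * T) + C + D * v0)


def merton_jump_factor(u, T, lam, mu_j, sigma_j):
    """Multiplicative jump term for N(mu_j, sigma_j^2) log-jumps at rate lam."""
    iu = 1j * u
    k_bar = math.exp(mu_j + 0.5 * sigma_j**2) - 1.0  # mean relative jump size
    jump_cf = cmath.exp(iu * mu_j - 0.5 * sigma_j**2 * u * u)
    return cmath.exp(lam * T * (jump_cf - 1.0 - iu * k_bar))


def bates_cf(u, T, S0, r, heston, jumps):
    """Bates = Heston characteristic function times the jump factor."""
    return heston_cf(u, T, S0, r, *heston) * merton_jump_factor(u, T, *jumps)


def bates_call(K, T, S0, r, heston, jumps, u_max=200.0, n=4000):
    """European call via Fourier inversion: C = S0*P1 - K*exp(-r*T)*P2."""
    k = math.log(K)
    fwd = bates_cf(-1j, T, S0, r, heston, jumps)  # equals S0*exp(r*T)
    du = u_max / n
    p1 = p2 = 0.0
    for j in range(n):
        u = (j + 0.5) * du  # midpoint rule, which also avoids u = 0
        w = cmath.exp(-1j * u * k) / (1j * u)
        p1 += (w * bates_cf(u - 1j, T, S0, r, heston, jumps) / fwd).real
        p2 += (w * bates_cf(u, T, S0, r, heston, jumps)).real
    p1 = 0.5 + p1 * du / math.pi
    p2 = 0.5 + p2 * du / math.pi
    return S0 * p1 - K * math.exp(-r * T) * p2
```

Here `heston` is the tuple (κ, θ, σ, ρ, v₀) and `jumps` is (λ, μⱼ, σⱼ). Setting λ = 0 makes the jump factor equal to 1, so `bates_cf` collapses to `heston_cf`.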
What the extra parameters do
Bates inherits Heston's five parameters (κ, θ, σ, ρ, v₀) and adds three jump parameters: λ (jump frequency), μⱼ (mean jump size), and σⱼ (jump volatility). Eight knobs total.
λ (lambda) -- jump intensity. Expected number of jumps per year. λ = 0 recovers pure Heston. λ = 2 means roughly two jumps per year on average. Higher λ makes the wings lift further because the market prices more gap events into the options.
μⱼ (mu-J) -- mean jump size. The average log-return of a jump. Negative μⱼ means jumps are biased downward (crash jumps). This creates asymmetry: the put wing steepens more than the call wing. In crypto, μⱼ is typically between −0.05 and −0.15, reflecting liquidation cascades and flash crashes.
σⱼ (sigma-J) -- jump volatility. The standard deviation of jump sizes. Even if the mean jump is zero, nonzero σⱼ creates symmetric wing lift. This is pure excess kurtosis from random-sized jumps. Larger σⱼ means fatter tails.
Toggle jumps on and off above. When jumps are off, you see pure Heston (dashed blue). Turn them on and the wings lift -- especially the left wing, because μⱼ < 0 biases jumps downward. Crank λ to 3 or 4 and the effect is dramatic. Set μⱼ = 0 and notice the lift becomes symmetric.
The crucial insight: ρ (Heston) and μⱼ (jumps) both create skew, but through completely different mechanisms. ρ creates skew via spot-vol correlation, which builds gradually over time. μⱼ creates skew via directional jumps, which appear instantly. This is why Bates can fit both the short end and the long end simultaneously.
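The roles of the three jump parameters can be read straight off the cumulants of the jump component. The sketch below assumes normally distributed log-jumps arriving at Poisson rate λ, in which case the n-th cumulant per year is λ times the n-th raw moment of a single jump. The function name is made up for this illustration.

```python
def jump_cumulant_rates(lam, mu_j, sigma_j):
    """Per-year variance, third and fourth cumulants added by the jumps.

    Assumes log-jump sizes are N(mu_j, sigma_j^2) and arrive at rate lam.
    """
    m2 = mu_j**2 + sigma_j**2                                # E[J^2]
    m3 = mu_j**3 + 3.0 * mu_j * sigma_j**2                   # E[J^3]
    m4 = mu_j**4 + 6.0 * mu_j**2 * sigma_j**2 + 3.0 * sigma_j**4  # E[J^4]
    return lam * m2, lam * m3, lam * m4


# Downward-biased jumps: negative third cumulant, so the put wing steepens.
var_rate, skew_rate, kurt_rate = jump_cumulant_rates(2.0, -0.10, 0.15)

# Zero-mean jumps: no skew contribution, but the tails still fatten.
_, skew_sym, kurt_sym = jump_cumulant_rates(2.0, 0.0, 0.15)
```

With μⱼ = 0 the third cumulant vanishes while the fourth stays positive, which is the symmetric wing lift described above. Every term scales linearly with λ.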
Term structure decomposition
The short-dated smile is mostly jumps. The long-dated smile is mostly stochastic vol. This separation is why Bates exists -- neither component alone fits the full term structure.
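The mechanism is how each component scales with horizon. Diffusive variance accumulates roughly in proportion to T, so a typical diffusive move grows like √T. Expected jump variance also grows with T (λ · T expected jumps), but each individual jump is the same size regardless of horizon. Measured against the typical diffusive move, a jump therefore looms larger the shorter the expiry: the skewness that jumps contribute decays roughly like 1/√T and the excess kurtosis like 1/T.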
At T = 7 days, you have had barely any time for diffusive variance to accumulate, but a single jump can still hit you at full size. One −10% crash in a week has the same payoff impact as a −10% crash in a year -- but the crash represents a much larger fraction of the total expected move over 7 days than over 365 days.
At T = 1 year, stochastic vol has had time to explore the full distribution of variance paths. Mean reversion, vol clustering, and spot-vol correlation all play out. The jump component is still there, but its effect on the shape of the distribution is diluted: a single jump is small next to a year of accumulated diffusive moves.
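To put rough numbers on the 7-day versus 1-year comparison, here is a small sketch. It makes a loud simplifying assumption: the diffusive variance is held constant at v (no stochastic vol), so the only thing being shown is how the jump contribution to the shape of the return distribution scales with T. The function name and the parameter values are illustrative, not calibrated.

```python
import math


def jump_shape_by_horizon(T, v, lam, mu_j, sigma_j):
    """Skewness, excess kurtosis and relative jump size of the log-return over T.

    Simplification: constant diffusive variance v, N(mu_j, sigma_j^2) log-jumps.
    """
    m2 = mu_j**2 + sigma_j**2
    m3 = mu_j**3 + 3.0 * mu_j * sigma_j**2
    m4 = mu_j**4 + 6.0 * mu_j**2 * sigma_j**2 + 3.0 * sigma_j**4
    total_var = (v + lam * m2) * T
    skew = lam * m3 * T / total_var**1.5        # decays like 1/sqrt(T)
    ex_kurt = lam * m4 * T / total_var**2       # decays like 1/T
    jump_in_sds = abs(mu_j) / math.sqrt(v * T)  # mean jump vs diffusive std dev
    return skew, ex_kurt, jump_in_sds


week = jump_shape_by_horizon(7.0 / 365.0, 0.55**2, 2.0, -0.10, 0.15)
year = jump_shape_by_horizon(1.0, 0.55**2, 2.0, -0.10, 0.15)
```

With a 55% diffusive vol, a −10% jump is roughly 1.3 diffusive standard deviations over 7 days but only about 0.18 over a year. The skewness at 7 days is √(365/7) ≈ 7.2 times the 1-year figure, even though the jump share of variance is the same at both horizons under this simplification.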
Look at the four charts above. At T = 7d, the red region (jump contribution) dominates the wings. At T = 1y, it is a thin sliver. Increase λ and watch the crossover point shift -- more frequent jumps push the jump contribution further out the curve.
This decomposition has direct trading implications. If you think jump risk is mispriced, you trade the short end. If you think variance dynamics are mispriced, you trade the long end. Bates gives you a framework to separate these bets.
Calibrating Bates
Eight parameters is a lot. Different combinations can produce similar smiles, and the optimizer can wander into unstable territory. Practical calibration requires discipline.
The standard approach is a two-stage strategy:
Stage 1: fix what you can observe. v₀ is pinned from the current ATM implied variance. The drift rate r is known. That leaves seven free parameters.
Stage 2: calibrate in groups. First fit κ, θ, σ, ρ to the long-dated smile (where jumps contribute little). Then fit λ, μⱼ, σⱼ to the short-dated residuals. Iterate a few times to refine.
This approach works because the two parameter groups control different parts of the surface. Heston parameters shape the back end; jump parameters shape the front end. Fitting them sequentially reduces the dimensionality of each optimization step.
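The alternating scheme can be sketched as follows. Everything here is hypothetical scaffolding: `loss_long` and `loss_short` stand for whatever functions score a parameter set against the long-dated and short-dated quotes, and the crude compass search stands in for a real least-squares optimizer. It has no parameter bounds, so a real calibration would still need to enforce things like −1 < ρ < 1 and positive κ, θ, σ, λ, σⱼ.

```python
HESTON = ("kappa", "theta", "sigma", "rho")
JUMPS = ("lam", "mu_j", "sigma_j")


def compass_search(loss, x, free, step=0.1, tol=1e-6, max_iter=10_000):
    """Derivative-free minimiser that only moves the coordinates in `free`."""
    x = dict(x)
    best = loss(x)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for name in free:
            for delta in (step, -step):
                trial = dict(x)
                trial[name] = x[name] + delta
                value = loss(trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break
        if not improved:
            step *= 0.5  # shrink the pattern once no single move helps
    return x, best


def calibrate_two_stage(loss_long, loss_short, start, atm_iv, n_rounds=3):
    """Pin v0, then alternate between the Heston block and the jump block."""
    params = dict(start)
    params["v0"] = atm_iv**2  # Stage 1: v0 from the ATM implied variance
    for _ in range(n_rounds):
        # Stage 2a: Heston parameters against the long-dated smile
        params, _ = compass_search(loss_long, params, HESTON)
        # Stage 2b: jump parameters against the short-dated residuals
        params, _ = compass_search(loss_short, params, JUMPS)
    return params
```

Each inner search only ever moves four or three parameters, which is the dimensionality reduction the sequential approach buys you. In practice both loss functions would depend on all eight parameters through the model pricer, which is why a few rounds of iteration are needed.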
The overfitting trap. More parameters always improve in-sample fit. But if you let all eight float freely, you risk fitting noise. The telltale sign: parameters that change dramatically day-to-day while producing similar smiles. If λ oscillates between 0.5 and 3.0 across consecutive calibrations, your fit is unstable.
The chart above shows a realistic comparison. Heston (orange, 5 params) fits the ATM region well but systematically misses the deep OTM puts. Bates (green, 8 params) nails the wings because the jump component captures the steep short-dated skew that Heston cannot reach.
Look at the residual chart below the main plot. Heston residuals are large and systematic in the wings -- the model is biased, not just noisy. Bates residuals are smaller and more random. That is the signature of a genuine improvement, not just overfitting.
Rule of thumb: if adding 3 parameters reduces SSE by more than 50%, the extra complexity is earning its keep. If the reduction is only 10-20%, you might be better off sticking with Heston and accepting the wing error.
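The rule of thumb is simple arithmetic on the two fits. A minimal sketch, with made-up implied vols purely for illustration; the function names are hypothetical, and the treatment of the 20-50% band as a judgement call is an assumption, since the rule above only speaks to the two ends.

```python
def sse(market, model):
    """Sum of squared errors between market and model implied vols."""
    return sum((m - f) ** 2 for m, f in zip(market, model))


def sse_reduction(market_iv, heston_iv, bates_iv):
    """Fraction of Heston's SSE removed by moving to Bates."""
    return 1.0 - sse(market_iv, bates_iv) / sse(market_iv, heston_iv)


def verdict(reduction):
    if reduction > 0.5:
        return "Bates is earning its keep"
    if reduction <= 0.2:
        return "Heston may be enough"
    return "judgement call"  # assumption: the rule of thumb is silent here


# Illustrative numbers only, not market data.
market_iv = [0.80, 0.65, 0.55, 0.60, 0.70]
heston_iv = [0.70, 0.62, 0.55, 0.58, 0.64]
bates_iv = [0.79, 0.65, 0.55, 0.60, 0.69]
reduction = sse_reduction(market_iv, heston_iv, bates_iv)
```

Because Bates nests Heston, its in-sample SSE should never be worse, so the reduction is about how much improvement you get for the three extra parameters rather than whether there is any.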
The crypto workhorse
Bates is the standard model for crypto exotic desks because crypto markets exhibit both stochastic volatility and frequent jumps. Liquidation cascades, depegs, and exchange outages create real gap risk that Heston alone cannot price.
Crypto vol surfaces have distinctive features that Bates handles well:
Persistent vol regimes. BTC can stay at 30% IV for weeks, then snap to 80% on a single liquidation cascade. Low κ (slow mean reversion) combined with high v₀ captures the post-shock environment. This is the Heston component doing its job.
Frequent gap moves. A 10% intraday crash is uncommon in equities but happens multiple times per year in crypto. These are genuine jumps, not just large diffusive moves. They show up as extremely steep short-dated put wings that no amount of σ (vol-of-vol) can match. The jump component handles this.
Both directions. Unlike equity markets where jumps are almost always down, crypto has significant upside gap risk too (short squeezes, surprise ETF approvals, exchange listings). Setting μⱼ closer to zero (or even slightly positive for some coins) lets the model capture symmetric gap risk.
The variance decomposition above shows how total ATM variance splits between the diffusive and jump components. For typical crypto parameters, jumps can account for 20-40% of total variance. That is not a correction term -- it is a first-order effect.
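The split itself is a two-line calculation. The sketch below takes "total variance" to mean the expected quadratic variation of the log-price over [0, T]: the diffusive part is the expected integrated Heston variance, and the jump part is λT times the second moment of a jump. That reading, the function name, and the parameter values are assumptions for illustration.

```python
import math


def variance_split(T, kappa, theta, v0, lam, mu_j, sigma_j):
    """Expected diffusive variance, jump variance, and jump share over [0, T]."""
    # Integral of E[v_t] for the CIR variance process
    diffusive = theta * T + (v0 - theta) * (1.0 - math.exp(-kappa * T)) / kappa
    # Poisson arrivals times E[J^2] for N(mu_j, sigma_j^2) log-jumps
    jump = lam * T * (mu_j**2 + sigma_j**2)
    return diffusive, jump, jump / (diffusive + jump)


# Illustrative crypto-like parameters: 60% long-run vol, three jumps a year.
diffusive, jump, share = variance_split(
    T=30.0 / 365.0, kappa=2.0, theta=0.36, v0=0.36, lam=3.0, mu_j=-0.10, sigma_j=0.20
)
```

With these inputs the jump share comes out at about 29%, inside the 20-40% range quoted above. When v₀ = θ the share does not depend on T; a gap between v₀ and θ makes it drift with horizon as the variance mean-reverts.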
Beyond Bates: SLV. Bates fits the observed surface better than Heston, but it still cannot fit every strike and expiry exactly. For production exotic pricing, most desks layer a local volatility overlay on top, creating a stochastic-local-vol (SLV) model. Bates provides the dynamics engine; local vol provides the exact calibration. See the SLV reference for details.
When Bates is overkill: if you only need to interpolate a single smile for a single expiry, use SVI. If you need a full surface without dynamics, SSVI is faster and more stable. Bates earns its complexity when you need the underlying dynamics -- for exotic pricing, hedging path-dependent products, or decomposing the smile into economic components.
Black-Scholes: no smile. One vol fits nothing.
Heston: smooth smile dynamics. Handles the long end.
Bates: smooth + jumpy. Handles both ends.
SLV: exact calibration + dynamics. The production standard.
Each step adds complexity and calibration cost. The art is knowing when the extra machinery is worth the overhead for your specific use case.
Where to go next:
Heston from Scratch -- deep dive into the five Heston parameters
SVI Parameterization -- the smile fitting standard for crypto vol surfaces
SSVI -- arbitrage-free full surface parameterization
Interpolation Methods -- all methods compared