Rough Bergomi from zero
Volatility has rough paths
When researchers measured how realized volatility behaves at high frequency, they found something that breaks every classical model: the autocorrelation of vol increments decays as a power law, not exponentially. Vol paths are far more jagged than anyone assumed.
In Heston, SABR, or any diffusion-based model, the variance process is driven by standard Brownian motion. BM has a Hurst exponent of H = 0.5, meaning its increments are uncorrelated. Its paths are continuous and nowhere differentiable, yet they still look far smoother than real volatility paths turn out to be.
Gatheral, Jaisson, and Rosenbaum (2018) measured the realized vol of equity indices and individual stocks. They looked at how the autocorrelation of log-volatility increments decays with lag. The result: it decays as a power law, γ(k) ∼ k^(2H−1), with H ≈ 0.1. Not H = 0.5. Not H = 0.3. H is close to zero.
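The measurement itself is essentially a log-log regression: compute the mean absolute increment of log-vol at several lags and regress its log on the log of the lag; the slope estimates H. A minimal sketch of the idea (the helper names and the Cholesky-based fBM sampler are illustrative, not taken from the paper):

```python
import numpy as np

def fbm_cholesky(H, n, seed=0):
    """Sample one fractional Brownian motion path on n grid points of [0, 1]
    via Cholesky factorization of the fBM covariance (illustrative helper)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1) / n
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # tiny jitter for stability
    return L @ rng.standard_normal(n)

def estimate_hurst(x, max_lag=9):
    """Regress log mean |increment| on log lag; the slope estimates H."""
    lags = np.arange(1, max_lag + 1)
    m = np.array([np.mean(np.abs(x[k:] - x[:-k])) for k in lags])
    slope, _ = np.polyfit(np.log(lags), np.log(m), 1)
    return slope

print(estimate_hurst(fbm_cholesky(H=0.1, n=1000)))   # close to 0.1
print(estimate_hurst(fbm_cholesky(H=0.5, n=1000)))   # close to 0.5
```

Run on real high-frequency realized-vol series, this kind of regression is what produces the H ≈ 0.1 estimates.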
Think of drawing a line with a ruler vs. scribbling with a pen while someone bumps your elbow. Classical models use the ruler. Rough vol says the scribble is closer to reality. The pen changes direction constantly at every timescale, not just at the frequency of some mean-reverting OU process.
What does H ≈ 0.1 mean in practice? Vol increments are strongly anti-correlated. If vol ticked up over the last five minutes, it is more likely to tick down over the next five. This constant reversal at every timescale is what makes the path look rough -- jagged and fractal, like a coastline rather than a highway.
This is not a modeling choice. It is an empirical fact observed across equities, indices, FX, and crypto. The universality of H ≈ 0.1 is one of the most striking findings in modern financial econometrics.
Drag the slider above. At H = 0.5, the autocorrelation is zero at all lags -- standard BM, no memory. As you lower H toward 0.1, the autocorrelation becomes strongly negative. Increments are anti-correlated. That is roughness.
What H controls
H is the Hurst exponent. It is the single number that governs how rough or smooth a stochastic process looks. Everything in rough vol theory flows from H being much less than 0.5.
H = 0.5: Standard Brownian motion. This is what Heston uses. Increments are uncorrelated. Paths are continuous but not differentiable. The "default" roughness that classical finance assumes.
H < 0.5: Rough. Increments are anti-correlated. The lower H is, the rougher the path. At H = 0.1, paths look like they were drawn by an earthquake seismograph. Every upward wiggle is likely followed by a downward wiggle, at every timescale.
H → 0: Extremely rough. In the limit, the path becomes so jagged it is barely continuous. For practical purposes, H ≈ 0.1 is rough enough to match real markets.
H > 0.5: Smooth (persistent). Increments are positively correlated. Paths trend. This regime is not relevant for volatility but appears in some hydrology and network traffic models.
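The anti-correlation that separates these regimes has a closed form: for unit-spaced fBM increments (fractional Gaussian noise), the lag-k autocorrelation is ½((k+1)^(2H) − 2k^(2H) + |k−1|^(2H)). A quick check of the lag-1 values:

```python
def fgn_autocorr(H, k):
    """Lag-k autocorrelation of unit-spaced fBM increments (fractional Gaussian noise)."""
    return 0.5 * ((k + 1)**(2 * H) - 2 * k**(2 * H) + abs(k - 1)**(2 * H))

for H in (0.1, 0.3, 0.5):
    print(f"H={H}: lag-1 autocorr = {fgn_autocorr(H, 1):+.3f}")
# H=0.1: -0.426   H=0.3: -0.242   H=0.5: +0.000
```

At H = 0.5 the memory vanishes exactly; at H = 0.1 nearly half of every move is immediately fought by the next one, which is the quantitative content of "roughness."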
The top panel shows three variance paths side by side at H = 0.1, 0.3, and 0.5. The visual difference is dramatic. At H = 0.5 the path meanders smoothly. At H = 0.1 it looks like static on a TV screen -- constant reversals, jagged peaks.
Use the slider in the bottom panel to sweep H continuously. Watch how the path transforms from smooth to rough as you lower H. This is not a parameter of a particular model -- it is a measurable property of real volatility data.
The rough Bergomi model
Bayer, Friz, and Gatheral (2016) took the empirical rough vol finding and built a pricing model around it. The variance process is driven by fractional Brownian motion instead of standard BM. The result is elegant, parsimonious, and non-Markovian.
η (eta): vol-of-vol. Controls how much the variance deviates from the forward curve. Higher η = wider smile.
W_H(t): fractional Brownian motion with Hurst exponent H. This is the rough driver.
−½η²t^(2H): convexity correction ensuring E[v(t)] = ξ₀(t). The model is automatically calibrated to the variance term structure.
The spot price follows the usual log-normal diffusion with the instantaneous variance v(t):
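Putting the pieces above together, the dynamics read (reconstructed here from the parameter list, in this section's notation, assuming Var W_H(t) = t^(2H) so the convexity correction works out):

```latex
v(t) = \xi_0(t)\,\exp\!\left(\eta\,W_H(t) - \tfrac{1}{2}\eta^{2}\,t^{2H}\right),
\qquad
\frac{dS_t}{S_t} = \sqrt{v(t)}\,dZ_t,
```

where Z is a standard Brownian motion whose correlation with the Brownian motion driving W_H is ρ.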
Count the free parameters: H (Hurst exponent), η (vol-of-vol), and ρ (spot-vol correlation). That is three parameters total, plus the forward variance curve ξ₀(t) which is read from the market. Compare to Heston's five free parameters. The model is more parsimonious.
The critical difference from Heston: this model is not Markovian. In Heston, the future of variance depends only on the current level of variance. In rough Bergomi, the future depends on the entire history of the path. The fractional BM has long-range dependence baked in. You cannot summarize the state in a single number.
Toggle between Markov and rough above. Two variance paths arrive at the same level at time "NOW," but they got there by different routes. In Heston (Markov), their future distributions are identical -- the model has no memory. In rough Bergomi, the path that was rising has a different future cone than the path that was falling. History is baked into the dynamics.
If you are a vol trader and you see 30-day realized vol at 45%, you want to know: did it get there by spiking from 20% (likely to mean-revert fast), or by slowly grinding from 40% (likely to persist)? Heston cannot distinguish these two scenarios. Rough Bergomi can. The path history contains information about the future.
Why rough vol explains short-dated smiles
The killer application of rough vol theory: it predicts that ATM skew scales as T^(H−1/2). At H = 0.1, that means skew blows up for short maturities -- exactly what crypto and equity markets show.
The ATM skew is the slope of implied volatility as a function of log-moneyness, evaluated at the money. Every stochastic vol model predicts a specific relationship between this skew and maturity T:
H = 0.5 (Heston-like): skew ∝ T^0. The term structure is flat -- no steepening as T → 0.
H = 0.3: skew ∝ T^(−0.2). Mild steepening, still too flat to match data.
H = 0.1 (rough): skew ∝ T^(−0.4). Skew explodes as T → 0. Matches real data.
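The scaling law is one line of arithmetic. A sketch comparing the 1-week-to-3-month skew ratio across regimes (the proportionality constant is arbitrary and cancels in the ratio):

```python
def atm_skew(T, H, c=1.0):
    """ATM skew magnitude ~ c * T**(H - 0.5); c is an arbitrary illustrative constant."""
    return c * T**(H - 0.5)

week, quarter = 7 / 365, 90 / 365
for H in (0.1, 0.3, 0.5):
    print(f"H={H}: 1w/3m skew ratio = {atm_skew(week, H) / atm_skew(quarter, H):.2f}")
# H=0.1: 2.78   H=0.3: 1.67   H=0.5: 1.00
```

At H = 0.1 the 1-week skew is nearly three times the 3-month skew; at H = 0.5 the two are identical.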
This is the punchline of the entire rough vol program. Classical models predict a skew term structure that is too flat at the front end. They can match 3-month skew but struggle with 1-week or 1-day skew. Traders have known for years that short-dated smiles are steeper than Heston predicts. Rough vol explains why: the roughness of the underlying variance process directly controls how fast skew grows as maturity shrinks.
The chart above shows the three regimes on a log scale. At H = 0.1 (green), the skew curve is steep -- short-dated skew is much larger than long-dated. At H = 0.5 (red, Heston-like), the curve is nearly flat. The yellow dots are empirical BTC data, and they track the H = 0.1 curve closely.
This is not a coincidence. When you measure H from BTC realized vol data, you get H ≈ 0.1. When you look at the skew term structure implied by BTC options, it scales like T−0.4. The theory and the data agree.
Why Heston gets this wrong: Heston's CIR variance process is driven by standard BM (H = 0.5). It cannot produce power-law skew decay with an exponent below zero. You can make Heston's skew steep by cranking up σ (vol-of-vol), but that violates the Feller condition and creates numerical problems. Rough Bergomi achieves steep short-dated skew naturally, without any parameter contortion.
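The Feller condition mentioned above is a one-line check: the CIR variance process dv = κ(θ − v)dt + σ√v dW stays strictly positive only when 2κθ ≥ σ². A quick illustration of how cranking up σ breaks it (parameter values are arbitrary):

```python
def feller_ok(kappa, theta, sigma):
    """Heston/CIR Feller condition: variance stays strictly positive iff 2*kappa*theta >= sigma**2."""
    return 2 * kappa * theta >= sigma**2

# Arbitrary illustrative parameters: kappa = 2, theta = 0.04.
print(feller_ok(2.0, 0.04, 0.3))   # True:  2*2*0.04 = 0.16 >= 0.09
print(feller_ok(2.0, 0.04, 0.9))   # False: sigma**2 = 0.81 > 0.16
```

Once the condition fails, the variance process hits zero, and discretization schemes need ad hoc fixes (truncation, reflection) that distort prices.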
Pricing challenges
Rough Bergomi is theoretically beautiful and empirically grounded. But it is expensive to use. No closed-form prices, no PDE, no fast Fourier trick. Monte Carlo only, and even that is slow because of the non-Markov structure.
No closed-form characteristic function. Heston's killer feature is its semi-analytic pricing via Fourier inversion. Rough Bergomi does not have this. The fractional BM driver breaks the affine structure that makes Heston's characteristic function solvable.
Monte Carlo only. To price a vanilla option under rough Bergomi, you simulate paths of the variance process, compute terminal spot prices, and average payoffs. Standard Monte Carlo convergence: 1/√N. To get a price accurate to 1 basis point, you need on the order of hundreds of thousands of paths.
Simulating fBM is expensive. Standard BM is Markov: to simulate the next step, you only need the current value. fBM is non-Markov: to simulate the next step correctly, you need the entire history of the path. A naive Cholesky approach costs O(N³) once for the factorization, then O(N²) in time and memory per path, where N is the number of time steps. That is brutal for long paths.
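A compressed sketch of the whole Monte Carlo pipeline, Cholesky fBM included. This is illustrative, not a production pricer: it assumes S₀ = 1, zero rates, a flat forward variance curve ξ₀, a left-endpoint Euler step, and it correlates the spot driver with the fBM's Gaussian innovations, which simplifies the exact rough Bergomi correlation structure:

```python
import numpy as np

def rbergomi_mc_price(strike=1.0, T=0.25, H=0.1, eta=1.9, rho=-0.9,
                      xi0=0.04, n_steps=50, n_paths=20000, seed=0):
    """Illustrative rough Bergomi call pricer: Cholesky fBM + Euler Monte Carlo."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = dt * np.arange(1, n_steps + 1)
    # fBM covariance with Var W_H(t) = t^(2H); factorize once: O(N^3).
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))
    Z = rng.standard_normal((n_paths, n_steps))
    WH = Z @ L.T                                    # fBM paths: O(N^2) per path
    v = xi0 * np.exp(eta * WH - 0.5 * eta**2 * t**(2 * H))
    # Left-endpoint variance for the Euler step, with v(0) = xi0.
    vL = np.hstack([np.full((n_paths, 1), xi0), v[:, :-1]])
    # Spot shocks correlated with the fBM innovations (a simplification).
    Zs = rho * Z + np.sqrt(1 - rho**2) * rng.standard_normal((n_paths, n_steps))
    logS = np.sum(np.sqrt(vL * dt) * Zs - 0.5 * vL * dt, axis=1)
    return float(np.mean(np.maximum(np.exp(logS) - strike, 0.0)))

print(rbergomi_mc_price())   # ATM call, ~20% vol, T = 0.25
```

The O(N²)-per-path cost is visible in the `Z @ L.T` line; faster schemes attack exactly that step.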
Hybrid schemes. Bennedsen, Lunde, and Pacchiarotti proposed a hybrid scheme that splits the fBM kernel into a "near" part, handled exactly near the singularity, and a "far" part, approximated by a Riemann sum that can be evaluated with the FFT. This reduces the cost to roughly O(N · log N) per path, which makes calibration feasible but still not fast enough for real-time pricing on a trading desk.
No PDE. Markov models like Heston can be priced via PDEs (finite differences). This gives fast, grid-based pricing. Non-Markov models have no finite-dimensional state space, so you cannot write a PDE. The "curse of non-Markovianity" is that the state is infinite-dimensional (the entire path history).
Where rough Bergomi fits in practice:
1. Research and calibration studies. Academics and quant researchers use it to validate the rough vol hypothesis and to benchmark other models. If your fast model (SVI, SABR) gives different skew than rough Bergomi predicts, you know something is off.
2. Overnight calibration. Some desks run rough Bergomi calibration overnight as a diagnostic. It tells them whether their fast daytime model is missing skew dynamics.
3. Informing intuition. Even if you never run the model live, understanding rough vol changes how you think about short-dated options. When the 1-day skew looks steeper than your model predicts, rough vol tells you that is normal -- it is the market's rough variance paths showing through.
4. Neural network proxies. Recent work trains neural networks to approximate rough Bergomi prices. The network learns the mapping from parameters to prices offline (using slow Monte Carlo), then evaluates in milliseconds at runtime. This may eventually make rough vol usable in production.
Rough Bergomi sits at the intersection of mathematical finance and econometrics. It is one of the rare cases where a measurement (H ≈ 0.1) directly dictated a model. Most models are invented first and fit later. Rough vol was discovered in the data first and formalized second. That empirical grounding is why the community takes it seriously, despite the computational cost.
Where to go next:
Heston Model -- the Markov stochastic vol workhorse, with Fourier pricing
SVI Parameterization -- the fast smile fitting standard for crypto vol surfaces
SABR Model -- stochastic vol without mean reversion
Interpolation Methods -- all surface construction methods compared