Deep Hedging under Rough Volatility

We investigate the performance of the Deep Hedging framework under training paths beyond the (finite-dimensional) Markovian setup. In particular, we analyse the hedging performance of the original architecture under rough volatility models, with a view to existing theoretical results for those models. Furthermore, we suggest parsimonious but suitable network architectures capable of capturing the non-Markovianity of time series. We then analyse the hedging behaviour in these models in terms of P&L distributions and draw comparisons to jump diffusion models when the rebalancing frequency is realistically small.


Introduction
Deep learning has undoubtedly had a major impact on financial modelling in recent years and has pushed the boundaries of the challenges that can be tackled: not only can existing problems be solved faster and more efficiently [1,2,3,4,5,6,7,8], but deep learning also allows us to derive (approximate) solutions to optimisation problems [9] where classical solutions had so far been limited in scope and generality. Additionally, these approaches are fundamentally data-driven, which makes them particularly attractive from a business perspective.
It comes as no surprise that the more similar (or "representative") the data presented to the network in the training phase is to the (unseen) test data to which the network is later applied, the better the performance of the hedging network on real data (in terms of P&L). It is also unsurprising that, as markets shift sufficiently far away from a presented regime into new, previously unseen territories, the hedging networks may have to be retrained to adapt to the new environment.
In the current paper we go a step further than presenting an ad hoc, well-chosen market simulator (see [10,11,12,13,14,15,16,17]): we investigate a situation where the relevant test data is structurally so different from the original modelling setup that it calls for an adjustment of the model architecture itself. In a well-controlled synthetic data environment we study the behaviour of the hedging engine as relevant properties of the data change.
More specifically, we use synthetic data generated from a rough volatility model with varying levels of the Hurst parameter. In the initial setup we set the Hurst parameter to H = 1/2, which reflects the classical (finite-dimensional) Markovian case and is well aligned with the majority of the most popular classical financial market models, such as, e.g., the Heston model, on which the initial deep hedging results were demonstrated. We then gradually lower the Hurst parameter to (rough) levels around H ≈ 0.1, which more realistically reflects market reality as observed in [18,19,20,21,22], thereby introducing a non-Markovian memory into the volatility process.
Since rough volatility models are known to reflect the reality of financial markets (as well as the stylised statistical facts) better than classical, finite-dimensional Markovian models do, our findings also give an indication of how a naive application of model architectures to real data could lead to substantial errors. With this, our study allows us to make a number of interesting observations about deep hedging and the data that it is applied to: apart from drawing parallels between discretely observed rough volatility models and jump processes, our findings highlight the need to rethink (or carefully design) risk management frameworks of deep learning models as significant structural shifts in the data occur.
The paper is organised as follows: Section 2 recalls the setup of the original deep hedging framework used in [9]. Section 3 gives a brief reminder on hedging under rough volatility models and compares the performance of a (feed-forward) hedging network on a rough Bergomi model to a theoretically derived model hedge. In Sections 3.3 and 3.4 we draw conclusions with respect to the model architecture, and in Section 3.5 we propose a new architecture that is better suited to the data. Section 4 lays out hedging under the new architecture and draws connections to existing literature, which outlines some parallels between (continuous) rough volatility models and jump processes in this setting, while Section 5 summarizes our conclusions.

Setup and Notation
We adopt the setting in [9] and consider a discrete finite-time financial market with time horizon [0, T] for some T ∈ (0, ∞) and a finite number of trading dates 0 = t_0 < t_1 < · · · < t_n = T, n ∈ N. We work on a discrete probability space (Ω, F, P), with Ω = {ω_1, . . . , ω_N} and a probability measure P for which P[{ω_i}] > 0 for all i ∈ {1, . . . , N} and N ∈ N. Additionally, we fix the notation X := {X : Ω → R} for the set of all R-valued random variables on Ω. The filtration F = (F_k)_{k=0,...,n} is generated by the R^r-valued information process (I_k)_{k=0,...,n} for some r ∈ N. For any k ∈ {0, . . . , n}, the variable I_k denotes all new market information available at time t_k, and F_k represents all market information available up to time t_k. The market contains d ∈ N financial instruments which can be used for hedging, with mid-prices given by an R^d-valued F-adapted stochastic process S = (S_k)_{k=0,...,n}. In order to hedge a claim Z : Ω → R we may trade in S according to R^d-valued F-adapted processes (strategies), which we denote by δ := (δ_k)_{k=1,...,n}, where δ_k = (δ^1_k, . . . , δ^d_k). Here, δ^i_k denotes the agent's holdings of the i-th asset at time t_k. We denote the initial cash injected at time t_0 by p_0 > 0. Furthermore, in order to allow for proportional trading costs, for every time t_k and change in position s ∈ R^d we consider costs c_k : s ↦ c_k(s) ∈ [0, ∞), where c_k is F_k-adapted, upper semi-continuous and satisfies c_k(0) = 0 for all k ∈ {0, . . . , n}. The total costs up to time T, when trading according to a trading strategy δ, are denoted by C_T(δ) := Σ_{k=0}^{n} c_k(δ_k − δ_{k−1}) (with the convention δ_0 := δ_{−1} := 0). Finally, we denote by H the set of all trading strategies.
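On simulated data the cost functional C_T(δ) is straightforward to evaluate. The numpy sketch below uses one simple admissible choice, c_k(s) = κ Σ_i S^i_k |s^i|; the cost rate κ, the array layout and the convention that the position entered at t_k is held over (t_k, t_{k+1}] (with a final unwind at t_n) are our own illustrative choices, not prescribed by the text.

```python
import numpy as np

def total_costs(deltas, prices, kappa=1e-3):
    """Total proportional trading costs C_T(delta).

    deltas : (n, d) holdings delta_1, ..., delta_n
    prices : (n+1, d) mid-prices S_0, ..., S_n
    kappa  : proportional cost rate (illustrative value)
    """
    d = deltas.shape[1]
    # convention delta_0 = delta_{n+1} = 0: the first trade enters the
    # full position, and everything is unwound at maturity
    padded = np.vstack([np.zeros((1, d)), deltas, np.zeros((1, d))])
    changes = np.abs(np.diff(padded, axis=0))      # (n+1, d) position changes
    # c_k(s) = kappa * sum_i S^i_k |s^i|, summed over all trading dates
    return kappa * float(np.sum(prices * changes))
```

Note that even a constant strategy incurs costs twice, once when entering and once when unwinding the position.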
We consider optimality of hedging under convex risk measures as in [9,23] and [24]; for a reminder on convex risk measures see e.g. [25]. Now let ρ : X → R be a cash-invariant convex risk measure on the set X. As in [9], we consider for random variables X ∈ X the original optimization problem

−π(X) := inf_{δ∈H} ρ( −X + (δ · S)_T − C_T(δ) ).    (2.1)

An optimal hedging strategy for X is a minimizer δ ∈ H of (2.1), and the corresponding premium is π(X).
In the case of no trading costs there is an alternative viewpoint, which will be taken in this paper: given an equivalent pricing measure Q of our financial market, we can also minimize the variance (with respect to the pricing measure Q) of the hedged position,

inf_{δ∈H} E_Q[ ( X − p_0 − (δ · S)_T )^2 ],    (2.2)

where p_0 denotes the expectation of X with respect to Q, i.e. the risk-neutral price. In other words, the expected quadratic hedging loss of the payoff should be minimal.
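On simulated paths, the quadratic hedging criterion just described can be estimated by plain Monte-Carlo. The following is a minimal numpy sketch; the array shapes and names are our own convention, not from [9].

```python
import numpy as np

def quadratic_hedging_loss(payoff, p0, deltas, prices):
    """Monte-Carlo estimate of E_Q[(X - p0 - (delta . S)_T)^2].

    payoff : (N,)        claim X on each simulated path
    p0     : float       risk-neutral price E_Q[X]
    deltas : (N, n, d)   holdings delta_k, held over (t_k, t_{k+1}]
    prices : (N, n+1, d) mid-prices S_0, ..., S_n of the d instruments
    """
    increments = np.diff(prices, axis=1)              # S_{k+1} - S_k
    gains = np.sum(deltas * increments, axis=(1, 2))  # (delta . S)_T per path
    return float(np.mean((payoff - p0 - gains) ** 2))
```

A replicating strategy drives this loss to zero; for instance, holding one unit of an asset whose terminal value is the claim itself.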
In the rest of this paper, the above optimisation problem (2.2) and the corresponding optimisers are considered in terms of their numerical approximation, in the framework of hedging in a neural network setting as formulated in [9]. In the remainder of this section we recall the notation and definitions needed to formulate this approximation property and the conditions that ensure its validity.
where Θ_M = ⨉_{k=0}^{n−1} Θ_{M, r(k+1)+d, d} denotes the set of network parameters from Definition 2.2. With (2.3), (2.4) and Remark 2.1, the infinite-dimensional problem of finding an optimal hedging strategy is reduced to the finite-dimensional problem of finding the optimal NN parameters in (2.4).
Remark 2.1. Note that if, in the above, we assume that S is an (F, P)-Markov process and that the contingent claim is of the form Z := g(S_T) for a payoff function g : R^d → R, then the optimal strategy can be written as δ_k = f_k(I_k, δ_{k−1}) for some functions f_k.

The next proposition recalls the central approximation property, which states that the optimal trading strategy in (2.1) can be approximated by a semi-recurrent neural network of the form shown in Figure 2.1, in the sense that the functional π_M(X) converges to π(X) as M becomes large, where π(X) denotes the optimal solution of the original optimisation problem (2.1).

Remark 2.2.
Of course there is a completely analogous formulation of this proposition for the optimal trading strategy (2.2).
[Figure 2.1: the semi-recurrent architecture; each cell maps an input layer in ℝ⁴ through two hidden layers in ℝ¹² to an output layer in ℝ².]

In [9] this approximation property is demonstrated for Black-Scholes and Heston models, both in their original form and in variants including market frictions such as transaction costs. These results demonstrate how deep hedging allows us to take a leap beyond classical results in scenarios where the Markovian structure is preserved.
A natural question to ask is how the approximation property of the neural network is affected if the assumption of a Markovian structure of the underlying process is no longer satisfied. Rough volatility models [18,19,20,26] represent such a class of non-Markovian models. It is also well established in a series of recent articles, including the aforementioned works, that rough volatility dynamics are superior to standard Markovian models (such as Black-Scholes and Heston) in terms of reflecting market reality, and that rough volatility models allow close fits to market data.
By putting hedging behaviour under rough volatility models under the microscope, we gain insight into non-Markovian aspects of markets in a controlled numerical setting: varying the Hurst parameter H ∈ (0, 1) of the process (see [20]), which governs the deviation from the Markovian setting in a general fractional (or rough) volatility framework, enables us to control for the influence of the Markovianity assumption on the hedging performance of the deep neural network. Therefore, in this work we investigate the effect of the loss of the Markovianity of the underlying stochastic process by considering market dynamics governed by a rough volatility model. With this in mind, by applying the original feedforward network architecture to a more realistic model class (represented by rough volatility models), we demonstrate in particular how the choice of the network architecture may affect the performance of the Deep Hedging framework, which could potentially break down on real-life data. We also note in passing that the approach we take can be applied as a simple routine sanity check for model governance of deep learning models on real data:
• Take a well-understood model class that generalises the modelling to more realistic market scenarios, but where the generalisation no longer satisfies assumptions made in the original architecture.
• Test the robustness of the method if the assumption is violated by controlling for the error as the deviation from the assumption increases.
• Modify the network architecture accordingly if necessary.

Hedging under Rough Volatility
Let us now consider the problem of hedging under rough volatility models in general. For this we now consider a continuous filtration {F_t}_{0≤t≤T}. We know that for a Markovian process of the form

dX_t = b(t, X_t) dt + σ(t, X_t) dW_t,

where b and σ satisfy suitable conditions, the price of a contingent claim Z̄_t := E[g(X_T) | F_t] can be written as Z̄_t = u(t, X_t), where u solves a parabolic PDE by the Feynman-Kac formula [27]. However, it was shown in [26] that rough volatility models are not finite-dimensional Markovian, and we therefore have to consider a more general process X and assume it to be a solution to the d-dimensional Volterra SDE

X_t = X_0 + ∫_0^t b(t; r, X_·) dr + ∫_0^t σ(t; r, X_·) dW_r,    (3.1)

where W is an m-dimensional standard Brownian motion, b takes values in R^d and σ in R^{d×m}. Both are adapted in the sense that for ϕ = b, σ it holds that ϕ(t; r, X_·) = ϕ(t; r, X_{r∧·}).
In this general non-Markovian framework, the contingent claim Z_t := E[g(X_T) | F_t] will depend on the entire history of the process X := (X_t)_{t≥0} up to time t, and not just on the value of the process at that time, i.e.

Z_t = u(t, X_{·∧t}),
where u this time solves a path-dependent PDE (PPDE). The setting where X is a semi-martingale has already been explored in e.g. [28,29]. Be that as it may, we know that fBm is not a semi-martingale in general, and as a consequence the volatility process is not a semi-martingale either. Viens and Zhang [30] are able to cast the problem back into the semi-martingale framework by rewriting X_t via an orthogonal decomposition into an auxiliary process Θ_t and a process I_t which is independent of the filtration for 0 ≤ t ≤ s. By exploiting the semi-martingale property of Θ, they go on to show that the contingent claim can be expressed as a solution of a PPDE, where ⊗_t denotes concatenation at time t. Moreover, they develop an Itô-type formula for the general non-Markovian process X_t from (3.1), which we present in the Appendix.

The rough Bergomi model (rBergomi)
As an example we consider the rBergomi model with a constant initial forward variance curve ξ_0(t) = V_0. The model fits into the affine structure of our Volterra SDE (3.1) after a simple log-transformation of the volatility process. In this case we take as our auxiliary process the process Θ^t_s from the decomposition above; it is easy to check that Θ^t_s is a true martingale for fixed s. The option price dynamics are obtained by using the functional Itô formula (A.3). From this, the perfect hedge (3.7) in terms of a forward variance Θ̂^t_T with maturity T and the stock S_t follows. The path-wise derivative in (3.7) is the Gateaux derivative along the direction a_t. For more details and the discretization of the Gateaux derivative see Appendix A.1.

Performance of the deep hedging scheme (with the original feedforward architecture) compared to the model hedge under rBergomi
We choose to hedge a plain vanilla call option Z_T := max(S_T − K, 0) with K = 100 and a monthly maturity T = 30/365. The hedging portfolio consists of a stock S with S_0 = 100 and a forward variance with maturity T_Fwd = 45/365, and is rebalanced daily. For the rBergomi model the forward variance is equal to Θ̂^t_{T_Fwd}, which is well defined for t ∈ [0, T_Fwd). Therefore, choosing the maturity of the forward variance to be longer than the option maturity allows us to avoid the singularity as t → T. In practice this would correspond to hedging with a forward variance with a slightly longer maturity than that of the option.
For the simulation of the forward variance we used the Euler–Maruyama method, whereas paths of the volatility process were simulated with the "turbo-charged" version of the hybrid scheme proposed in [31,32]. The parameters were chosen such that they describe a typical market scenario with a flat forward variance: ξ_0 = 0.235 × 0.235, ν = 1.9 and ρ = −0.7. We were particularly interested in the dependence of the hedging loss on the Hurst parameter. Finally, the quadratic loss function was chosen, so the objective to be minimized was the quadratic criterion (2.2), where the price p_0 was obtained with a Monte-Carlo simulation (e.g. for H = 0.10, p_0 = 2.39). Next we implement the perfect hedge from (3.7); the details of the discretization of the Gateaux derivative are presented in Appendix A.3. For the evaluation of the option price we once again use Monte-Carlo, this time with the generating parameters; in practice we would calibrate the parameters to market data. The perfect hedge was implemented on a sample of 10^3 different paths for the same parameters as in the deep hedging case. The results of both hedges under quadratic loss for different Hurst parameters are shown in Table 4. We also take a closer look at the P&L distributions of the deep hedge as well as the model hedge for H = 0.10 in Figure 3.1. Curiously enough, the distributions are very similar to each other. The deep hedge seems to have slightly thinner tails, which is interesting considering that the semi-recurrent architecture makes a strong assumption of Markovianity of the underlying process.
Indicators that the assumption of finite-dimensional Markovianity is violated might be the heavy left tail of the P&L distribution as well as the relatively high hedging losses. This prompted us to question the semi-recurrent architecture and devise a way to relax the Markov assumption on the underlying. Note that the heavy tails of these distributions may also suggest a link to jump diffusion models; we expand on this in Section 4.3.

Implications on the network architecture
As discussed before, in [9] the authors rely heavily on Remark 2.1, where they use the Markov property of the underlying process in order to write the trading strategy at time t_k as a function of the information process at t_k and the trading strategy at the previous time step t_{k−1}. Of course, in the case of rough volatility models one would have to include the entire history of the information process up to t_k in order to get the hedge at that time. However, this would result in a numerically infeasible scheme. To illustrate this, take for example a single vanilla call option with maturity T = 30/365, hedged daily under, say, the rough Bergomi model. In the 30th time step the number of input nodes of the NN cell F_{θ_30} would be 30 · 2 + 2 = 62, or, if we hedged twice a day, 30 · 2 · 2 + 2 = 122. Obviously this scheme quickly becomes very computationally expensive, even for a single option with a short maturity.
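The input-size bookkeeping above can be made explicit. The tiny helper below (our own illustrative tally, matching the text's two hedging instruments plus the previous two asset holdings) reproduces the counts 62 and 122 and shows the linear growth in the number of time steps:

```python
def history_input_nodes(step, n_instruments=2, hedges_per_day=1):
    """Input width of the cell F_theta at a given time step when the whole
    history of the information process is fed in: one price per instrument
    per past observation, plus the previous holdings in each instrument."""
    return step * n_instruments * hedges_per_day + n_instruments

# 30th daily step: 30*2 + 2 = 62 input nodes; twice daily: 30*2*2 + 2 = 122
```

For a portfolio of options or longer maturities the width, and with it the number of trainable parameters in the first layer, keeps growing with every additional time step.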
The fBm in (3.5b) can be written as a linear functional of a Markov process, albeit an infinite-dimensional one. Therefore, if the original Markovian-based architecture could be applied to this setting, we would expect to recover the Hurst parameter also from a Markovian-based sampling procedure, justifying the continued use of the original feedforward architecture. This, however, is not the case. It is known that the fBm in (3.5b) can be rewritten in terms of an infinite-dimensional Markov process in the following way. Take the Riemann-Liouville representation of fBm,

B^H_t = (1/Γ(H + 1/2)) ∫_0^t (t − s)^{H − 1/2} dW_s,

where W is a standard Brownian motion. Using the fact that for α ∈ (0, 1) and fixed r ∈ (0, ∞)

r^{−α} = (1/Γ(α)) ∫_0^∞ e^{−r x} x^{α − 1} dx,

we obtain by the Fubini theorem (for H < 1/2, with α = 1/2 − H and r = t − s)

B^H_t = (1/(Γ(H + 1/2) Γ(1/2 − H))) ∫_0^∞ x^{−(H + 1/2)} Y^x_t dx,

where Y^x_t := ∫_0^t e^{−x(t − s)} dW_s is an Ornstein-Uhlenbeck process with mean-reversion level zero and mean-reversion speed x, i.e. a Gaussian semi-martingale Markov process with dynamics

dY^x_t = −x Y^x_t dt + dW_t, Y^x_0 = 0.

Therefore, we have shown that B^H is a linear functional of the infinite-dimensional Markov process (Y^x_t)_{x∈[0,∞)}. Being able to simulate Y^x_t would mean that we could still use the architecture in Figure 2.1, even for rough processes. A numerical simulation scheme for such a process is presented in [33]. Regrettably, the estimated Hurst parameter of the generated time series stayed around H ≈ 0.5 for any chosen input Hurst parameter to the simulation scheme. For a fixed time step ∆t the scheme does not produce the desired roughness, even if we use a number of OU terms well beyond what the authors propose. We believe this is because the scheme is only valid in the limit, i.e. when the number of terms goes to infinity and ∆t → 0. The failure to recover the Hurst parameter, together with the fact that the architecture does not allow for path-dependent contingent claims, encouraged us to change the neural network architecture itself.
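The Hurst estimate mentioned above can be obtained, for instance, from the scaling of absolute moments of the increments, E|X_{t+ℓ} − X_t|^q ∝ ℓ^{qH}. The following numpy sketch is one such estimator; it is our own choice of method, since the text does not specify which estimator was used.

```python
import numpy as np

def estimate_hurst(path, max_lag=20, q=2):
    """Regress log E|X_{t+l} - X_t|^q on log l; for a self-similar process
    the slope equals q*H, so the estimator returns slope / q."""
    lags = np.arange(1, max_lag)
    moments = [np.mean(np.abs(path[l:] - path[:-l]) ** q) for l in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(moments), 1)
    return slope / q
```

Applied to a standard Brownian path this returns a value close to 0.5, which is exactly what the OU-superposition scheme produced regardless of the target H.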

Proposed fully recurrent architecture
Guided by the above insights, we hence modify the original architecture. In this section we suggest an alternative architecture and show that it is well suited to the problem. When constructing the new architecture we would like to change the semi-recurrent structure as little as possible, since it seems to perform very well in Markovian cases. However, in order to account for non-Markovianity we propose a completely recurrent structure. To that end we now introduce a hidden state δ̃_k = (δ̃^S_k, δ̃^V_k) with δ̃_0 = 0, which is passed to the cell at time t_k along with the information process I_k. So instead of adding layers to each of the state transitions separately as in [36], we simply concatenate the input vector I_k with the hidden state vector and feed it into the neural network cell F_{θ_k}(·). For a visual representation see Figure 3.2. The output is still a trading strategy δ_k = (δ^S_k, δ^V_k), and it is evaluated on the same objective function as before in the case of quadratic hedging losses (without transaction costs), whereas the hidden state δ̃_k is passed forward to the next cell F_{θ_{k+1}}. These states can take any value and are not restricted to having any meaningful financial interpretation, as trading strategies do. We illustrate the fact that the fRNN architecture is truly recurrent by showing how the hidden states are able to encode the relevant history of the information process. Say, for example, that the information process I_k = (S^1_k, S^2_k) is simply the price of both hedging instruments. The strategies at time t_k now do not depend on the asset holdings δ^x_{k−1}, but on δ̃^x_{k−1} for x ∈ {S, V}.
For some F_{k−1}-measurable function g_{k−1}, it holds for the hidden states themselves that δ̃_{k−1} = g_{k−1}(I_0, . . . , I_{k−1}). Recursively, the hidden states are thus implicitly dependent on the entire history,

δ^x_k = g^x_NN(I_0, . . . , I_{k−1}) for x ∈ {S, V},

where g^x_NN is again F_{k−1}-measurable. Structuring the network this way, we hope that the hidden states at time t_k will be able to encode the history of the information process I_0, . . . , I_k. More precisely, we expect the network to learn the function g^x_NN : R^{2k} → R for x ∈ {S, V} by itself, and with it the path dependency inherent to the liability we are trying to hedge.
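A minimal numpy forward pass of one such cell may make the wiring concrete. The layer widths follow the 4 → 12 → 12 configuration of the figures; the random weight initialisation, the even split of the output into strategy and next hidden state, and the reuse of one parameter set across steps (the text trains a separate cell F_{θ_k} per trading date) are all our own illustrative simplifications.

```python
import numpy as np

def make_cell(rng, dims=(4, 12, 12, 4)):
    """Random parameters for one cell F_theta: input = (I_k, hidden state),
    output = (delta_k, next hidden state), two ReLU hidden layers."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]

def cell_forward(params, info, hidden):
    """One recurrent step: concatenate the new information I_k with the
    hidden state from the previous step, run the MLP, split the output."""
    x = np.concatenate([info, hidden])
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)        # ReLU on the hidden layers
    return x[:2], x[2:]                   # (delta_k, new hidden state)

# unroll over 30 daily steps with a two-dimensional information process
rng = np.random.default_rng(1)
params = make_cell(rng)
hidden = np.zeros(2)                      # hidden state at time 0 is zero
for info in rng.normal(size=(30, 2)):
    delta, hidden = cell_forward(params, info, hidden)
```

Because `hidden` is fed back at every step, the strategy at step k is a function of the whole sequence I_0, . . . , I_k, as described above.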
Remark 3.1. We remark that, in order to account for the history of the information process, one could also write the trading strategy as a function of a windowed history I_{k−n}, . . . , I_k of the information process, with window length n ∈ {1, . . . , k − 1}. However, in this case we would have to optimize the window length and would inevitably face a trade-off between accuracy and computational efficiency. We would rather outsource this task to the neural network.
Remark 3.2. While we do think an LSTM architecture [37] would be more appropriate for capturing the non-Markovian aspect of our process, we find that our architecture is adequate in that regard as well. Our architecture has the advantage of being tractable (we can still appeal to Proposition 2.1), while being much simpler and easier to train.

Deep hedge under Rough Bergomi
Since the fRNN should perform just as well in the Markovian case as the original architecture does, we first convinced ourselves that our architecture produces comparable results in the classical case: quadratic losses as well as training times for the Heston model were very similar for both. We were then ready to test it on the rough Bergomi model. We hedge the ATM call from Section 3.3; the parameters were again ξ_0 = 0.235 × 0.235, ν = 1.9 and ρ = −0.7, and we investigate the dependence of the hedging loss on the Hurst parameter. The results are shown in Table 2. Again, the loss seems to decrease exponentially with increasing Hurst parameter and reaches quadratic losses comparable to classical stochastic volatility models at H ≈ 0.5. Comparing these results with both the model hedge and the deep hedge from Section 3.3 (see Table 3), we notice that the fRNN does indeed perform notably better. When increasing the number of epochs in the training phase from 75 to 200, the loss of the deep hedge with the original architecture does not improve, while the improvement with the proposed architecture is clearly visible. This indicates that while the semi-recurrent NN saturates at a given error, the new architecture keeps converging and improving. Since training at 200 epochs was computationally costly (in terms of both memory and time), and since we had reached the model hedge's numbers at the higher end of the H range, we did not keep increasing the number of epochs; we expect, however, the results to keep improving as the number of epochs increases, which further indicates the suitability of the second approach. Looking at Figure 4.1, it is particularly interesting that the P&L distribution becomes increasingly left-tailed for lower Hurst parameters. Even under the new architecture, the distribution for H = 0.10 is left-skewed with an extremely heavy left tail, where relative losses reached circa −1000% on one of 10^5 sample paths.
What is even more compelling is that the sizeable losses occurred when the discretized stock process jumped by several thousand basis points during the hedging period. An example of such a path is shown in Figure 4.2. Although jumps are not featured in the rough Bergomi model (the price process is a continuous martingale [38]), the model clearly exhibits jump-like behaviour when discretized.
Naturally, for H = 0.10, where this effect was most noticeable, we tried increasing the training, test and validation set sizes, as well as increasing the number of epochs to 200. Doing this we managed to decrease the realized loss to 0.628. The performance was notably better compared to 0.834 on the smaller set sizes, but still far from the loss of 0.162 we obtained under the Heston model. We investigated more epochs, bigger training sets and different architectures; however, the realized test loss did not improve further.
As can be seen in Figure 4.3, the model hedge loss distribution exhibits very similar behaviour to the deep hedge distribution. The higher losses of the model hedge can be explained by its slightly fatter tail in comparison to the fully recurrent hedge. We remark that this behaviour is somewhat understandable, since re-hedging is done daily and this hedging frequency is far from a valid approximation of a continuous hedge. In the next section we thus implement hedges at different frequencies to see whether the Hölder regularity of the underlying process is problematic only for the deep hedging procedure, or whether the heavy left-tailed P&L distribution is a general phenomenon when hedging under a discretized rough model.

Rehedges
We implement deep hedges on rBergomi paths with Hurst parameter H = 0.10, where we re-hedge at frequencies ranging from every two days up to four times a day. Again, one can see that the distribution becomes slightly less leptokurtic with more frequent rebalancing. The quadratic losses also decreased with higher frequency, yet this seems to happen at a slower rate than expected. This would essentially mean that, as soon as transaction costs are present, the small gains from more frequent rebalancing would be completely outweighed by higher transaction fees. As a matter of fact, for the four-times-daily rehedge the loss slightly increased, which indicates that the model once again saturates, this time with respect to the hedging frequency. This is quite surprising, considering that a higher hedging frequency usually translates to better performance in continuous models, because the approximation gets closer and closer to the continuous setting. The behaviour of the distributions as well as of the hedging losses is in fact quite reminiscent of the behaviour of jump diffusion models analysed by A. Sepp [39], which we recall in the following section.
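For comparison, in a continuous diffusion the standard deviation of the discrete-hedging P&L decays roughly like n^{−1/2} in the number of rebalancing dates n. A quick Black-Scholes check, a deliberately simple stand-in for "continuous models" rather than the rBergomi setup, makes the contrast with the saturation observed above visible (all parameter values are illustrative):

```python
import numpy as np
from math import erf, log, sqrt

def bs_delta(s, K, sigma, tau):
    """Black-Scholes call delta with zero rates."""
    if tau <= 0.0:
        return float(s > K)
    d1 = (log(s / K) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
    return 0.5 * (1.0 + erf(d1 / sqrt(2.0)))    # standard normal CDF of d1

def pnl_std(n_steps, s0=100.0, K=100.0, sigma=0.2, T=30 / 365,
            n_paths=5000, seed=3):
    """Std of the delta-hedged call P&L with n_steps rebalancing dates."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0)
    gains = np.zeros(n_paths)
    for k in range(n_steps):
        delta = np.array([bs_delta(x, K, sigma, T - k * dt) for x in s])
        s_next = s * np.exp(sigma * sqrt(dt) * rng.standard_normal(n_paths)
                            - 0.5 * sigma**2 * dt)
        gains += delta * (s_next - s)           # gains of the delta hedge
        s = s_next
    return float(np.std(np.maximum(s - K, 0.0) - gains))
```

Quadrupling the frequency roughly halves the P&L standard deviation here; per the experiments above and [39], neither discretized rough models nor jump diffusions exhibit this decay.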

Relation to the literature
It is rather interesting that A. Sepp [39] observes similar behaviour when delta hedging under jump diffusion models. Similarly to our observations above, he finds (in the presence of jumps) that after a certain point the volatility of the P&L cannot be reduced by increasing the hedging frequency. More precisely, he shows that for jump diffusion models there is a lower bound on the volatility of the P&L in relation to the hedging frequency. Moreover, the P&L distributions in Figure 4.5 for delta hedges under jump diffusion models are generally fairly similar to ours. This gives us the idea to treat discretised rough models as jump models. In this case the market is incomplete and it is not possible to perfectly hedge a contingent claim with a portfolio containing a finite number of instruments [40]. In practice, traders try to come as close as possible to the perfect hedge by trading a number of different options.
Unfortunately, when trying to implement the hedge approximation, we are quickly faced with the absence of analytical pricing formulas and the limitations of the slow Monte-Carlo scheme. In order to train the deep hedge, we would have to calculate option prices at every time step of each sample path. In a typical application we would need around 10 options with different strikes and at least 10^5 sample paths.

Conclusion
In this work, we presented and compared different methods for hedging under rough volatility models. More specifically, we analysed and implemented the perfect hedge for the rBergomi model from [30] and used the deep hedging scheme from [9], which had to be adapted to a non-Markovian framework.
We were particularly interested in the dependence of the P&L on the Hurst parameter. We conclude the deep hedge with the proposed architecture performs better than the discretized perfect hedge for all H. We also find that the hedging P&L distributions for low H are highly left-skewed and have a lot of mass in the left tail under the model hedge as well as the deep hedge.
To mitigate the heavy losses in cases where H is close to zero, we explored increasing the hedging frequency up to four times a day. The loss did improve and the P&L distribution became less leptokurtic, however only slightly.
Intriguingly, the slow response to increased hedging frequency and the left-skewed P&L distribution are characteristic of delta hedges under jump diffusion models [39]. We therefore observe that, in terms of hedging, there is a relation between jump diffusion models and rough models. In accordance with the literature, we find that the price process, despite being a continuous martingale, exhibits jump-like behaviour [32]. We believe this is an excellent illustration of the dynamics of rough volatility models: the explosive, almost jump-like pattern in the stock price might be the reason why they can fit the short end of the implied volatility surface so well.
In our view, it is crucial to take the jump aspect into account when looking for an optimal hedge in discretized rough volatility models. Our suggestion for future research is to adapt the objective function in the deep hedging scheme for jump-risk optimization. A first step would be optimization of the expected shortfall risk measure; subsequently, more appropriate jump-risk measures for discretized rough models could be developed. These risk measures cannot be completely analogous to the risk measures in [39], since rough models themselves do not feature jumps.

A.2 Functional Itô formula
We have to distinguish two cases: the regular case, where H ∈ (1/2, 1), and the singular case, where the coefficients b, σ explode because of the power kernel in the Riemann-Liouville fractional Brownian motion whenever the Hurst exponent H lies in (0, 1/2). In the singular case the coefficients b, σ ∉ C_t, and thus they cannot serve as test functions on the right-hand side of (A.2), since the Gateaux derivative would no longer make sense. In order to develop an Itô formula for the singular case, the definitions need to be slightly amended. Nonetheless, Viens and Zhang show that both cases yield a similar functional Itô formula.

A.3 Discretization of the Gateaux Derivative
It can easily be shown that Θ̂^t_s = f(Θ^t_s) for some f : R → R. Therefore we have a direct relation between the auxiliary process Θ and the forward variance Θ̂, which allows us to write the option price as a function of the entire forward variance curve Θ̂^t_{[t,T]} at time t ∈ [0, T], namely u(t, S_t, Θ^t_{[t,T]}) = ũ(t, S_t, Θ̂^t_{[t,T]}). This is important when performing Monte-Carlo, since in the rough Bergomi model the forward variance curve is directly modelled in the variance process via ξ_t(·) = Θ̂^t_·. Let us suppose that we are able to trade at times 0 = t_0 < t_1 < · · · < t_n = T. In order to get the hedging weights at the trading times t_i, we have to discretize the derivatives. The Gateaux derivative with respect to the stock simplifies to the usual derivative, and the discretization is straightforward:

∂_S ũ(t, S_t, Θ̂^t_{[t,T]}) ≈ ( ũ(t, S_t + ε, Θ̂^t_{[t,T]}) − ũ(t, S_t, Θ̂^t_{[t,T]}) ) / ε

for small ε > 0. The Gateaux derivative with respect to the forward variance curve is given by

⟨∂_ω u(t, ω), η⟩ = lim_{ε→0} ( u(t, ω + ε η 1_{[t,T]}) − u(t, ω) ) / ε

for any η ∈ Ω_t.
The option prices ũ can now be evaluated using Monte-Carlo at each time step to obtain the hedging weights. Note that the discretization of the Gateaux derivative is purely heuristic and that a rigorous proof of convergence to the true derivative is out of the scope of this work. For more details we refer to [41].
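The bump-and-reprice recipe above can be illustrated on a deliberately simple Monte-Carlo pricer. Here a driftless lognormal stock stands in for ũ, since a full rBergomi pricer would require the hybrid scheme; using the same random numbers for the bumped and unbumped prices keeps the finite difference stable. All names and parameter values are our own illustrative choices.

```python
import numpy as np

def mc_call_price(s0, sigma, T, K, n_paths=200_000, seed=7):
    """Monte-Carlo price of a call under a driftless lognormal stock,
    a simple stand-in for the pricer u~ evaluated at one time step."""
    z = np.random.default_rng(seed).standard_normal(n_paths)
    sT = s0 * np.exp(-0.5 * sigma**2 * T + sigma * np.sqrt(T) * z)
    return float(np.mean(np.maximum(sT - K, 0.0)))

def bump_delta(s0, sigma, T, K, eps=1e-2):
    """Forward finite difference (u(s0 + eps) - u(s0)) / eps with common
    random numbers, mirroring the heuristic discretization above."""
    return (mc_call_price(s0 + eps, sigma, T, K)
            - mc_call_price(s0, sigma, T, K)) / eps
```

With the monthly ATM inputs used in the experiments (S_0 = K = 100, T = 30/365) and a volatility of 0.235, the resulting delta lands close to the Black-Scholes value of about 0.51; the same bumping idea applies to each forward-variance bucket via η 1_{[t,T]}.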