A real-time GP based MPC for quadcopters with unknown disturbances

Gaussian Process (GP) regressions have proven to be a valuable tool to predict disturbances and model mismatches and incorporate this information into a Model Predictive Control (MPC) prediction. Unfortunately, the computational complexity of inference and learning on classical GPs scales cubically, which is intractable for real-time applications. Thus GPs are commonly trained offline, which is not suited for learning disturbances as their dynamics may vary with time. Recently, state-space formulation of GPs has been introduced, allowing inference and learning with linear computational complexity. This paper presents a framework that enables online learning of disturbance dynamics on quadcopters, which can be executed within milliseconds using a state-space formulation of GPs. The obtained disturbance predictions are combined with MPC leading to a significant performance increase in simulations with jMAVSim. The computational burden is evaluated on a Raspberry Pi 4 B to prove the real-time applicability.


I. INTRODUCTION
In recent years, unmanned aerial vehicles (UAVs) have increased in popularity due to their various possible applications, low cost, and variability. A common challenge in many UAV applications includes limitations in flight time due to a constrained battery capacity. A tethering system is a possible means to overcome this issue. However, such a tether creates an additional load and makes the UAV more susceptible to disturbances due to wind.
The commonly used quadcopter setup for a UAV has 6 degrees of freedom (DoF). With four rotors, it is, therefore, an underactuated system, making it a good test platform for modern control approaches. A quadcopter is often controlled by a cascaded PID; the use of MPC for quadcopters is also well established (cf. [1]). However, most of these approaches cannot cope efficiently with exogenous disturbances due to the wind or a tether. There are different approaches to include disturbances in the control loop, such as disturbance rejection and backstepping (cf. [2]). This work proposes a more general approach by modeling disturbances as latent forces and moments acting on the quadcopter, which can be predicted using advanced machine learning techniques.
In recent years, Gaussian process (GP) regression models, which are nonparametric kernel-based probabilistic models [3], have demonstrated significant potential for learning control-oriented models, [4], with the ability to incorporate prior knowledge about the data through the choice of a kernel. As a Bayesian framework, the strength of a GP regression is its ability to estimate the prediction uncertainty, which can be used for robustifying the control systems.
In the context of MPC, it has been demonstrated in, e.g., [5], [6], [7], that GPs can approximate disturbances, nonlinearities, and model mismatches, where their information provided by the GPs can be exploited to enhance the MPC performance. Moreover, the GPs probabilistic descriptions have been leveraged to propagate uncertainties of the estimates over the MPC prediction horizon, yielding a low conservative mechanism for robustification in terms of chance constraints, which casts the problem in a stochastic MPC framework [5].
This paper's contributions can be summarized as follows: We prove the feasibility and instruct the implementation of a real-time capable GP-based MPC for a quadcopter under disturbances using simulation. It is therefore assumed that the disturbances underlie some unknown dynamics. The idea is that if one uses GPs to mimic the underlying disturbance dynamics, then present and anticipated future disturbances can be inferred based on recorded past data. The predictions can then be used by MPC to counteract these unknown disturbances. Furthermore, a cautious control is achieved by formulating a chance-constrained MPC problem using the uncertainties provided by the GP model, as proposed in [5].
II. BACKGROUND This section covers the required basics needed for this work. These include Model Predictive Control (cf. [1]), Gaussian processes (cf. [3]) as well as the LTI representation of GPs with stationary kernels (cf. [8]).

A. Model Predictive Control
Assume a discrete-time nonlinear system given by where x k ∈ R nx is the state and u k ∈ R nu the input of the system at time step k. Model Predictive Control builds upon a modelf (x k , u k ) of the system dynamics (1). Based on this model, the trajectory of the systems' state can be predicted in terms of the input and initial state. Thus, the goal is to find x t+k|t ∈ X , ∀k = 1, · · · , N (2) u t+k|t ∈ U, ∀k = 0, 1, · · · , N − 1 where x t+k+1|t is the prediction of x t+k+1 at time t, l(x t+k|t , u t+k|t ) a stage cost function penalizing the state and input over the prediction horizon N , l f (x t+N |t ) approximates the cost of regulating the state x t+N |t from time step N to infinity and (3) are lower and upper bounds for the state and input, respectively, where H x ∈ R nxc×nx , h x ∈ R nxc , H u ∈ R nuc×nu , h u ∈ R nuc for n xc state and n uc input constraints.

B. Gaussian Processes
Assume an unknown continuous-time function g(t i ) with inputs t i and noisy outputs y i where v i denotes white Gaussian measurement noise such that where t in this work will denote time. Gaussian Process Regression is concerned with the task of learning the function g(t i ) such that predictions of the output y n+1 can be made based on an input t n+1 and a training set (t 1 , y 1 ), . . . , (t n , y n ). Therefore it is assumed that the output data y 1:n+1 forms a multivariate normal distribution such that y 1:n+1 ∼ N (µ, K), where µ ∈ R n+1 and the entries of K ∈ R n+1,n+1 at any element K i,j are given by the kernel function k(t i , t j ); the sequence y 1:n+1 here denotes a column vector of length n + 1. Thus the correlation of any two y i and y j is given by the kernel function and the input data t i , t j . A prediction of the unknown output y n+1 is obtained from the multivariate normal distribution by marginalizing over y 1:n . Since normally distributed predictions y n+1 are obtained for arbitrary input values t n+1 this procedure yields a distributed function y n+1 =ĝ(t n+1 ) called a Gaussian Process. The mean µ defines a bias and is often assumed zero.
The kernel which determines the data points' correlation is usually parameterized by a set of hyperparameters. These hyperparameters are fitted to the data such that the kernel function yields a most likely description of the data points true correlation. More formally, the optimal hyperparameters are determined by max θ p(θ|t 1:n , y 1:n ), where θ is a vector of the sought hyperparameters. In practice, one may use gradient descent methods to maximise the (log) likelihood of the optimization problem (4) and update the GP hyperparameters iteratively as θ n+1 = θ n + η dp(θ|t 1:n , y 1:n ) dθ where η denotes a learning rate. The most widely used kernel, the squared exponential kernel k(t, t ′ ) = σ 2 m exp(−(t − t ′ ) 2 /l 2 ) with hyperparameters l and σ 2 m , assumes a high correlation when the data points t and t ′ are close, thus, the corresponding output data points are likely to be similar. Therefore, it is often used for learning smooth functions. By varying the width of the squared exponential kernel, i.e., l, one can adjust the input range over which the output data is assumed to be similar.
The inference, or more precisely, marginalization in GPs, can be done in closed form as well as the calculation of the gradient in (5). However, both require the inversion of the covariance matrix K, which yields a computational complexity of O(n 3 ). Thus, with increasing data, GPs become intractable for real-time applications.
A solution to this problem was proposed in [8] by reformulating various GPs with stationary kernels k(t j , The transformation is done as follows: According to the Wiener-Khinchin theorem the spectral density S(iω) ofĝ(t) can be obtained by the Fourier-Transform of the stationary kernel function S(iω) = F {k(τ )}. Assuming a finite spectrum of S(iω), a consecutive spectral factorization F {k(τ )} = H(iω)qH(iω) yields a system with the transfer function H(iω) driven by white Gaussian noise with spectral density q which yields an output with the same spectral density asĝ(t). Thus, the system driven by the noise can be written as a continuous LTI-system with a discrete outpuṫ where z ∈ R nz denotes the state vector of the GP state-space model, w(t) ∼ N (0, q),v(t i ) ∼ N (0,σ 2 n ) represents measurement noise and the matrices F, L, H are of appropriate dimension. The system yields a predictionŷ(t i ) for arbitrary inputs t i based on the state z(t i ).
For the inference, the system is discretized at a sampling distance ∆T . Given a record of past outputs y 1:n , the state z n+1 ∼ (m z n+1 , Σ z n+1 ) can be inferred by Kalman-Filtering, which yields a prediction of the output y n+1 ∼ N (Cm z n+1 , CΣ z n+1 C ⊤ +σ 2 n ). For the learning, the gradient in (5) can be obtained efficiently by refactoring results of the inference [8]. Finally, it is worth mentioning that F {k(τ )} does not always yield the proposed rational form for arbitrary kernels, e.g., the squared exponential kernel. In such cases, it can be approximated to a desired order using a Taylor series or a Padé approximation, see [4] for more details.

III. MODELING
The quadcopter position is denoted as p ∈ R 3 and velocitẏ p ∈ R 3 in an inertial frame which is spanned by unit vectors ex, eŷ, eẑ ∈ R 3 . Furthermore, the attitude between the inertial frame and the body frame, which is aligned with the quadcopter axes along the unit vectors e x , e y , e z ∈ R 3 , is given by the Euler angles Φ ∈ R 3 and attitude ratė Φ ∈ R 3 in the roll, pitch and yaw directions. The matrix R rot ∈ R 3×3 denotes the rotation matrix from the body frame to the inertial frame.
The quadcopter system is modeled as a rigid body with three rotational and three translational DoF and dynamics p = F/m, M = Jω+ω×Jω where m ∈ R is the quadcopter mass, g = 9.81 m s −2 , ω ∈ R 3 denotes the angular rate of the quadcopter in the body frame, F ∈ R 3 the sum of all forces and M ∈ R 3 the sum of all moments which act on the quadcopter body.
The force F is a summed effect of gravity, the propeller actuation, and aerodynamics, which, for simplicity, are assumed to scale linearly with the quadcopter velocity such that the thrust generated by a single propeller i and k D ∈ R the drag force coefficient.
The moment M = τ x τ y τ z ∈ R 3 also stems from the actuation of the quadcopter's propellers and allows the quadcopter to roll, pitch and yaw. Moments caused by blade flapping (cf. [9]) are neglected in this model for simplicity.
In order to derive a linear model for the quadcopter a state x = p ⊤ṗ⊤ Φ ⊤Φ⊤ ⊤ ∈ R 12 and normalized inputs the longitudinal, lateral and vertical torques and 1]. The state and input dimensionality are denoted with n x = 12 and n u = 4, respectively.
Linearizing the nonlinear dynamics around the hovering conditionẋ s = 0 and u s = mg Tmax where The wind and the tether will act as latent forces and moments on the quadcopter body. Assuming the tether is attached to the center of mass, we will only focus on latent forces. However, the proposed procedure can easily be extended to include latent moments.

IV. CONTROLLER DESIGN
GPs are used here to predict the latent forces caused by the disturbances over the prediction horizon of the MPC. This information is then used to improve the MPC performance and guarantee constraint satisfaction up to probability levels p x and p u for the state and input, respectively. The control policy is thus to find an input sequence u 1:N for the stochastic optimization problem min u0,··· ,uN−1 where E denotes the expected value, Q = Q ⊤ 0, P = P ⊤ 0 and R = R ⊤ ≻ 0, the norm x A = x ⊤ Ax denotes the matrix norm and d t+k|t represents a prediction at time t of the disturbance at time step t + k. While Q and R are used for tuning the MPC behavior, the matrix P weights the terminal cost. Note that computing the above problem is intractable, therefore, simplifications will be introduced.

A. Disturbance extraction
The latent forces in all DoFs produce accelerations in the different directions of the quadcopter body. In order to train GPs to learn the dynamics of these accelerations, it is first necessary to extract and record data of past accelerations due to the disturbances. This can be performed by comparing measured velocitiesṗ t|t of the quadcopter with simulated velocitiesṗ t|t based on the proposed nonlinear quadcopter dynamics. Therefore, the accelerations in the different directions can be estimated bÿ where i ∈ {x,ŷ,ẑ} denotes the direction of the acceleration and ∆T the discretization length of the state prediction. The recorded accelerationsp d,i,0|t , . . . ,p d,i,t−1|t in all DoFs are assumed to be uncorrelated since they stem from unknown disturbance dynamics and since the translational dynamics of the quadcopter can be separated into the single DoFs. Thus, the inference and learning in each DoF can be handled independently by a single one-dimensional GP such that three GPs are assigned to predict the accelerations associated with the disturbances. The three GPs to learn and predictp d,i (t) = g i (t) with i ∈ {x,ŷ,ẑ} will be used in their continuous-time state-space representation with the deterministic formulatioṅ

B. Predictor
To define the predictor of the MPC, we combine the model of the quadcopter system and the deterministic state-space models of the GPs (9), which results in the augment model where A, B are obtained from the quadcopter model in (7) and the matrices Cx, Cŷ, Cẑ ∈ R nx are column vectors mapping the predicted accelerations by the three GPs into the respective dimension of the quadcopter's state. Then, discretizing the augmented model above yields the predictor model of the MPC, which directly incorporates the GP's disturbance predictions during state propagation.
In this work, we adopt the exact discretization approach [6]. Note that the discrete-time form of (10) representŝ f (x t+k|t , u t+k|t , d t+k|t ) in (8).

C. Uncertainty propagation and constraints tightening
To incorporate the uncertainty of the GP predictions in the MPC formulation, we make use of some formulations presented in [5]. We consider the discrete-time representation of (6) as n,i ) are the discrete time representatives of the continuous time variables in (6) and i ∈ {x,ŷ,ẑ}. Therefore, the mean value and the covariance of the state of the GPs can be propagated as follows: and for the output of the GPs, they arē with i ∈ {x,ŷ,ẑ}.
The normal distribution of the GP predictions yields the quadcopter state stochastic such that x t+k|t ∼ N (x t+k|t , Σ x t+k|t ), where the quadcopter state evolves according to the nominal system in (7) plus the additive disturbances Σ d i,t+k|t . Therefore, using (11) and (12), the covariance of the quadcopter state propagates as Since the quadcopter system is unstable and the MPC only stabilizes the nominal system based on its deterministic predictor model, the evolution of the covariance of the quadcopter state may diverge over the prediction horizon according to (13). To account for this effect, an LQR controller K ∞ is used. The input then reads as whereū t+k|t is computed by the MPC. Then, the growth of the uncertainty will be restricted as The feedback of the stochastic state in (14) yields the control input stochastic with the distribution u t+k|t ∼ N (ū t+k|t , KΣ x t+k|t K ⊤ ). The stochastic nature of the state and input at any time step can be described by defining uncertainty regions S x t+k|t and S u t+k|t , respectively, around their expected values within which their uncertain values lie up to some probability level. These regions can then be subtracted from the original constraint sets X and U to obtain tightened setsX t+k|t = X ⊖ S x t+k|t andŨ t+k|t = U ⊖ S u t+k|t , where ⊖ denotes the Pontryagin set difference, which are used to define guarantees on the satisfaction of the chance constraints in (8) [5]. If the state/input lies within these tightened sets, then, up to the probability level, the quadcopter state/input will not exceed the original constraints despite the unknown disturbances. For the applied half-space constraints (3), the tightened constraint sets can be computed as [5] where diag indicates a vector of the matrix diagonal elements, Φ −1 is the quantile function of a standard normal distribution, where 0 ≤ p x ≤ 1 is some probability level and Σ u t+k|t = KΣ x t+k|t K ⊤ , see [5] for more details. Finally, the MPC optimization problem is given as follows: where the predictor model withÃ,B is obtained from discretizing (10).

D. Algorithm
The proposed GP-based MPC for quadcopters is finally implemented as follows and sketched in Algorithm 1. The quadcopter first flies based on a nominal MPC while the disturbance data are collected for the GPs as described in section IV-A. The MPC is updated at intervals of T = 100 ms; between the MPC updates, the GP hyperparameters are trained based on the available data as described in section II-B. At the end of each time interval, it is not guaranteed that the GP hyperparameters will converge to local optima. However, assuming that the disturbance dynamics do not change arbitrarily fast, they likely converge after certain number of iterations as was demonstrated in [10].
Once the GPs experienced "sufficient" training, the algorithm switches from the nominal MPC to the GP-based MPC, thus including the respective disturbance predictions. The term "sufficient" is still an open research question. In the current implementation, the switching simply happens after the GP hyperparameters have been updated at least 50 times using (5). In our experience, the hyperparameters typically converged in simulation flights at this point. Quadcopter ← u * t|t + u hover 0 0 0 Update GPs with collected d 0:t 17 t ← t + 1

V. IMPLEMENTATION AND RESULTS
In this section, we discuss the implementation of Algorithm 1 and we demonstrate the results. The algorithm has been implemented in C++ on a Raspberry Pi 4 B in order to evaluate the computation times. The same implementation has then been tested in simulation as an offboard controller for the PX4 autopilot using the jMAVSim simulation environment. The used quadcopter parameters are depicted in table II, the thrust-input necessary during hovering u hover = 0.3 and the MPC parameters are given in Table I. The LQR in (14) uses the same matrices Q, R for its cost function as the MPC. The MPC's predictor model has been discretized at 10 Hz while it has been recomputed with a rate of 30 Hz, which rendered a better flight performance during the simulation. The optimization problem has been solved using the C++-OSQP library [11] and the osqpeigen interface 1 . In order to connect to the PX4 Autopilot firmware and send offboard commands from the C++ script, the MAVSDK-library 2 has been used. The squared exponential kernel has been considered for all the GPs, which are computed as LTI systems. Therefore, Fourier-Transform of the squared exponential function has been brought into rational form via a sixth-order Taylor approximation as proposed in [8]. The GP hyperparameters are continuously updated as described in section IV-D based on the last 50 recorded data samples and a learning rate of η = 0.03 0.01 0.005 ⊤ , see (5).
Next, the implementation has been tested on a potent host machine in order to control a quadcopter within the jMAVSim simulation environment [12] while the respective computation times that a Raspberry Pi 4 B would have required to compute the algorithm have been imitated via sleep commands. The goal was to track the reference trajectory depicted in Fig. 1. The resulting trajectories of the simulated flights under heavy wind conditions using a nominal and the GP-based MPC are depicted in Fig. 1. The jMavSim environment models wind by filtering white gaussian noise (variance set to set to 24 m s −1 ) and adding a mean wind speed vector (length set to to 22 m s −1 ) on top. However, various wind speed settings have been tested with similar results. The takeoff is performed by an automated sequence of the PX4 autopilot. The nominal and GP-based MPC take over midflight and try to track the depicted trajectory. Finally, the quadcopter lands, again using an automated sequence of the PX4 autopilot. Notably, the GP-based MPC performs significantly better in this task with a root-mean-square (RMS) euclidean distance of 0.82 m to the reference trajectory compared to the nominal MPC with an RMS euclidean distance of 2.83 m to the reference trajectory.
Furthermore, it can be seen that the GPs tend to introduce oscillations, especially in the z-coordinate. Interestingly, oscillations occur in the z-coordinate when the quadcopter changes its y-position. This is likely the cause of a mismatch between the nonlinear model and the true system dynamics, which overlies the disturbance estimates and may lead to undesired feedback dynamics between the GPs and the MPC. This indicates, that the proposed method relies on precise model predictions.  These oscillations, as well as a more aggressive behaviour of the GP-based MPC compared to a conventional MPC can also be seen when analyzing the computed input signals. During the flight, the GP-based MPC computed oscillating actuation signals with a higher amplitude (see Fig. 2).
The C++ implementation in this work proved the feasibility of the proposed control approach for real-time applications: On a Raspberry Pi 4 Model B, computing the nominal MPC took about 10 ms while the computation of the MPC with GPs took about 20 ms. Updating the hyperparameters of a single GP with a batch size of 50 input-output data pairs required up to 2 ms.

VI. CONCLUSIONS
In this paper, the applicability of a GP-based MPC for predicting disturbances acting on a quadcopter and reducing their effects has been demonstrated in simulation and a practical implementation framework has been presented. It has been shown that this approach yields superior performance over a conventional MPC when disturbances affect the system and that the algorithm scheme is computable in real-time.
In order to verify the potentials that this approach shows in simulation and to prove its robustness, its algorithm will be evaluated on indoor flight tests with a focus on reproducibility. As future research directions: the computational complexity of the GP-based MPC could be further reduced by applying Infinite Horizon Gaussian Processes [10], which may also remove artefacts introduced by the truncation of the batch data. An interesting research issue is to define an appropriate classifier that allows to determine when one should switch from the nominal to the GP-based MPC. Moreover, to deal with model mismatch effects, the proposed GP-based MPC could be combined with a preceding learning of such mismatches as shown in [13]. A more complex quadcopter model for the MPC and disturbance estimation will be used in future investigations.
ACKNOWLEDGMENT Finally, we would like to thank David Pysik, Tavia Plattenteich, and Ievgen Zhavzharov for their valuable input and support with the PX4 autopilot.