Modified Munich Chain-Ladder Method

The Munich chain-ladder reserving method was introduced on an axiomatic basis. We analyze these axioms and we define a modified Munich chain-ladder reserving method which is based on an explicit stochastic model. This stochastic model then allows us to consider claims prediction and prediction uncertainty for the Munich chain-ladder reserving method in a consistent way.


Introduction
The Munich chain-ladder method was introduced by Quarg and Mack [6] on a purely axiomatic basis, and in 2003 it was awarded the Gauss prize by DAV and DGVFM, see [6]. However, it is still not known whether there is a non-trivial, interesting stochastic model that fulfills these axioms, and nothing is known about the prediction uncertainty in the Munich chain-ladder method. The aim of this paper is to study the axioms of the Munich chain-ladder method and to define a modified Munich chain-ladder method which is based on an explicit stochastic model. For this modified version we analyze claims prediction and its uncertainty. There are two different ways to view the Munich chain-ladder method. The first way is to define a stochastic model which has the required structure of the Munich chain-ladder factors; this is the approach taken in [6]. The second way is to define a general chain-ladder model and derive estimators that have the Munich chain-ladder factor structure; this is the approach taken in [4]. Here we analyze both views and show how the second way leads to a modified Munich chain-ladder method. The first main result is that within the family of multivariate normal models there is, in general, no interesting model which fulfills the Munich chain-ladder model assumptions, see Theorem 4.3 below. Therefore, the Munich chain-ladder predictor always has an approximation error, which is quantified in Theorem 4.4 below. Based on these findings, we define a modified Munich chain-ladder model for which we can derive optimal predictors and the corresponding prediction uncertainty.
Organization of the paper. In the next section we consider stochastic models which simultaneously fulfill the chain-ladder assumptions for cumulative payments and claims incurred. In Theorem 2.2 we will see that such models only allow for rather restricted correlation structures. For these restricted chain-ladder models we then study the optimal one-step ahead prediction in Section 3. This optimal one-step ahead prediction can then directly be compared to the Munich chain-ladder axioms which are introduced in Section 4. In Theorem 4.3 we find that, in general, the Munich chain-ladder axioms are not fulfilled. This leads to a modified Munich chain-ladder method which is presented in Section 5. For this modified version we derive optimal predictors and study prediction uncertainty in Section 6. These results are then compared numerically to other methods in Section 7. This numerical study is based on the original data set of Quarg and Mack [6].

Chain-ladder models
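We denote cumulative payments of accident year $i$ and development year $j$ by $P_{i,j}$ and the corresponding claims incurred by $I_{i,j}$ for $i = 0, \ldots, J$ and $j = 0, \ldots, J$. We define the following sets of information
$$\mathcal{B}^P_j = \{P_{i,l};\ 0 \le i \le J,\ 0 \le l \le j\} \quad\text{and}\quad \mathcal{B}^I_j = \{I_{i,l};\ 0 \le i \le J,\ 0 \le l \le j\}.$$

Assumption 1 (distribution-free chain-ladder model).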
(A2) There exist parameters $f^P_j$, $f^I_j$, $(\sigma^P_j)^2 > 0$ and $(\sigma^I_j)^2 > 0$ such that for $0 \le j \le J-1$ and $0 \le i \le J$ we have
$$E[P_{i,j+1} \mid \mathcal{B}^P_j] = f^P_j P_{i,j}, \qquad \mathrm{Var}(P_{i,j+1} \mid \mathcal{B}^P_j) = (\sigma^P_j)^2 P_{i,j}^2,$$
$$E[I_{i,j+1} \mid \mathcal{B}^I_j] = f^I_j I_{i,j}, \qquad \mathrm{Var}(I_{i,j+1} \mid \mathcal{B}^I_j) = (\sigma^I_j)^2 I_{i,j}^2.$$
These assumptions correspond to PE, PV, IE, IV and PIU in [6], except that we modify the variance assumptions PV and IV. We make this change because it substantially simplifies our considerations (and it does not affect our argument). Assumption 1 states that cumulative payments $(P_{i,j})_{i,j}$ and claims incurred $(I_{i,j})_{i,j}$ fulfill the distribution-free chain-ladder model assumptions simultaneously. Our first aim is to show that there is a non-trivial stochastic model that fulfills the chain-ladder model assumptions simultaneously for cumulative payments and claims incurred. To this end we define an explicit distributional model. The distributions are chosen such that the analysis becomes as simple as possible. We will see that assumption (A2) requires careful consideration. Choose a continuous and strictly increasing link function $g$ with range $\mathbb{R}$. The standard example that we use in the sequel is the log-link given by $g(x) = \log x$. For a given link function $g$ we define the transformed age-to-age ratios for $0 \le j \le J$ and $0 \le i \le J$ by
$$\xi^P_{i,j} = g\left(\frac{P_{i,j}}{P_{i,j-1}}\right) \quad\text{and}\quad \xi^I_{i,j} = g\left(\frac{I_{i,j}}{I_{i,j-1}}\right),$$
where we set fixed initial values $P_{i,-1} = I_{i,-1} = \nu_i$ according to given volume measures $\nu_i > 0$.
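To make the transformation concrete for the log-link, the following Python sketch maps the cumulative amounts of one accident year to the ratios $\xi_{i,j}$ and back, using $P_{i,j} = \nu_i \exp(\xi_{i,0} + \cdots + \xi_{i,j})$, which follows from the definition with $P_{i,-1} = \nu_i$. The function names and the numbers in the usage lines are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def transformed_ratios(cum, nu):
    """Log-link ratios xi_{i,j} = log(C_{i,j} / C_{i,j-1}) for one accident year.

    cum: cumulative amounts C_{i,0}, ..., C_{i,k} (payments or claims incurred)
    nu:  volume measure nu_i > 0, playing the role of C_{i,-1}
    """
    cum = np.asarray(cum, dtype=float)
    prev = np.concatenate(([nu], cum[:-1]))  # C_{i,-1} = nu_i, C_{i,0}, ..., C_{i,k-1}
    return np.log(cum / prev)


def cumulative_from_ratios(xi, nu):
    """Inverse map: C_{i,j} = nu_i * exp(xi_{i,0} + ... + xi_{i,j})."""
    return nu * np.exp(np.cumsum(xi))


# Hypothetical accident year with volume 80 and three observed cumulative payments.
payments = [100.0, 150.0, 165.0]
xi = transformed_ratios(payments, nu=80.0)
print(np.round(xi, 4))
print(cumulative_from_ratios(xi, nu=80.0))
```

The round trip recovers the cumulative amounts, which is what allows the model to be specified entirely in terms of the $\xi$'s.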
To simplify the outline we introduce vector notation: for $0 \le i \le J$ we set
$$\Xi_i = (\xi^P_{i,0}, \ldots, \xi^P_{i,J}, \xi^I_{i,0}, \ldots, \xi^I_{i,J})'.$$

Assumption 2 (multivariate (log-)normal chain-ladder model I).
(B1) We assume that the random vectors $\Xi_i$ are independent for different accident years $i = 0, \ldots, J$.
With Lemma 2.1 we can calculate the conditionally expected claims for given link function g.
We have for $0 \le j \le J-1$ and $0 \le i \le J$

We have assumed that $\Sigma$ is positive definite. This implies that $(\Sigma^*_{[j]})^{-1}$ is also positive definite for $* \in \{P, I\}$. We then see from Lemma

In that case we have for the log-link $g(x) = \log x$ and for $0 \le j \le J-1$ and $0 \le i \le J$

Analogous statements hold true for claims incurred $I_{i,j+1}$, conditioned on $\mathcal{B}^I_j$.
We see that under the assumptions of Theorem 2.2 the process $(P_{i,j})_{0 \le j \le J}$ has the Markov property, and we obtain chain-ladder parameters for the log-link $g(x) = \log x$ with $* \in \{P, I\}$. Moreover, the covariance matrix $\Sigma$ under Theorem 2.2 is given by
$$\Sigma = \begin{pmatrix} \mathrm{diag}\big((s^P_0)^2, \ldots, (s^P_J)^2\big) & A \\ A' & \mathrm{diag}\big((s^I_0)^2, \ldots, (s^I_J)^2\big) \end{pmatrix} \qquad (2.5)$$
for an appropriate matrix $A \in \mathbb{R}^{(J+1)\times(J+1)}$ such that $\Sigma$ is positive definite. The matrices $S^*_{[J]}$ are called Schur complements of $\Sigma^*_{[J]}$ in $\Sigma$, for $* \in \{P, I\}$. One may impose further structure on the matrix $A = (a_{k,l})_{0 \le k,l \le J}$; for instance, a lower-left-triangular matrix is often a reasonable choice, i.e. $a_{k,l} = 0$ for all $k < l$. For the time being we allow for any matrix $A$ such that $\Sigma$ is positive definite. This leads to the following model assumptions.

The previous corollary states that we have found a class of non-trivial stochastic models that fulfill the distribution-free chain-ladder assumptions simultaneously for cumulative payments and claims incurred. Note that a reasonable choice of matrix $A$ in (2.5) allows for dependence between cumulative payments and claims incurred; this will be crucial in the sequel.
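The positive definiteness requirement on $\Sigma$ can be checked numerically. The NumPy sketch below assumes that (2.5) has diagonal paid and incurred blocks and that $a_{k,l} = \mathrm{Cov}(\xi^P_{i,k}, \xi^I_{i,l})$; it builds $\Sigma$ from hypothetical standard deviations and a hypothetical lower-left-triangular $A$, tests positive definiteness via a Cholesky factorization, and cross-checks it with the Schur complement of the paid block, $\Sigma^I - A'(\Sigma^P)^{-1}A$. Which of the two complements carries the label $S^P_{[J]}$ in the text is not assumed here; all function names are hypothetical.

```python
import numpy as np


def sigma_25(s_paid, s_inc, A):
    """Assemble [[diag(s_paid^2), A], [A', diag(s_inc^2)]] for Xi_i ordered paid-then-incurred."""
    s_paid = np.asarray(s_paid, dtype=float)
    s_inc = np.asarray(s_inc, dtype=float)
    A = np.asarray(A, dtype=float)
    return np.block([[np.diag(s_paid**2), A], [A.T, np.diag(s_inc**2)]])


def is_positive_definite(M):
    """A symmetric matrix is positive definite iff its Cholesky factorization exists."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False


def schur_complement_paid(s_paid, s_inc, A):
    """Sigma^I - A' (Sigma^P)^{-1} A; since the paid block is positive definite,
    Sigma is positive definite iff this complement is positive definite."""
    A = np.asarray(A, dtype=float)
    inv_paid = np.diag(1.0 / np.asarray(s_paid, dtype=float) ** 2)
    return np.diag(np.asarray(s_inc, dtype=float) ** 2) - A.T @ inv_paid @ A


# Hypothetical J = 2 example with a_{k,l} = 0 for k < l.
s_p, s_i = [0.2, 0.1, 0.05], [0.15, 0.08, 0.04]
A = np.array([[0.01, 0.0, 0.0],
              [0.004, 0.003, 0.0],
              [0.001, 0.001, 0.0005]])
Sigma = sigma_25(s_p, s_i, A)
print(is_positive_definite(Sigma), is_positive_definite(schur_complement_paid(s_p, s_i, A)))
```

Both checks agree by the Schur complement criterion; the Cholesky test is simply the cheaper one in practice.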

One-step ahead prediction
Theorem 2.2 provides the best prediction of $P_{i,j+1}$ based on $\mathcal{B}^P_j$ and the best prediction of $I_{i,j+1}$ based on $\mathcal{B}^I_j$, respectively, under Assumption 3. The idea in the Munich chain-ladder method is to consider best predictions based on both sets of information $\mathcal{B}_j = \mathcal{B}^P_j \cup \mathcal{B}^I_j$. This is similar to the considerations in [4]. In this section we start with the special case of one-step ahead prediction; the general case is presented in Section 6 below. We denote by $\theta_{[j]} = (\theta^P_0, \ldots, \theta^P_j, \theta^I_0, \ldots, \theta^I_j)' \in \mathbb{R}^{2(j+1)}$ and let $\Sigma_{[j]} \in \mathbb{R}^{2(j+1)\times 2(j+1)}$ be the (positive definite) covariance matrix of the random vector $\xi_{i,[j]} = (\xi^P_{i,0}, \ldots, \xi^P_{i,j}, \xi^I_{i,0}, \ldots, \xi^I_{i,j})'$. Moreover, let $\Sigma^{(*)}_{j,j+1} \in \mathbb{R}^{2(j+1)}$ denote the covariance vector between $\xi_{i,[j]}$ and $\xi^*_{i,j+1}$ for $* \in \{P, I\}$. Note that in contrast to Lemma 2.1 we replace $\Sigma^*_{j,j+1}$ by $\Sigma^{(*)}_{j,j+1}$, i.e. we put the upper index in brackets.
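with $(s^{(*),\mathrm{post}}_{j+1})^2 = (s^*_{j+1})^2 - (\Sigma^{(*)}_{j,j+1})' \, \Sigma_{[j]}^{-1} \, \Sigma^{(*)}_{j,j+1}$.

Proof. This is a standard result for multivariate Gaussian distributions, see Result 4.6 in [3].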
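Lemma 3.1 is standard Gaussian conditioning, and under the log-link the one-step predictor then follows from the log-normal moment $E[e^X] = e^{\mu + \sigma^2/2}$ for $X \sim N(\mu, \sigma^2)$, because $P_{i,j+1} = P_{i,j}\exp(\xi^P_{i,j+1})$ and $P_{i,j}$ is known given $\mathcal{B}_j$. The Python sketch below (NumPy assumed, function names and inputs hypothetical) computes both steps; it illustrates the mechanics and is not a transcription of the displays of Lemma 3.1 or Corollary 3.2.

```python
import math

import numpy as np


def condition_next(theta_past, Sigma_past, cov_vec, theta_next, var_next, x_past):
    """Posterior mean and variance of a Gaussian component given the Gaussian vector x_past:
    mean = theta_next + cov_vec' Sigma_past^{-1} (x_past - theta_past),
    var  = var_next   - cov_vec' Sigma_past^{-1} cov_vec."""
    cov_vec = np.asarray(cov_vec, dtype=float)
    weights = np.linalg.solve(np.asarray(Sigma_past, dtype=float), cov_vec)  # credibility weights
    resid = np.asarray(x_past, dtype=float) - np.asarray(theta_past, dtype=float)
    return theta_next + weights @ resid, var_next - weights @ cov_vec


def log_link_one_step(p_current, post_mean, post_var):
    """With xi^P_{i,j+1} | B_j ~ N(m, v): predictor P_{i,j} exp(m + v/2) and
    conditional variance predictor^2 * (exp(v) - 1) of the log-normal P_{i,j+1}."""
    pred = p_current * math.exp(post_mean + 0.5 * post_var)
    return pred, pred**2 * math.expm1(post_var)


# One conditioning variable with unit variance and covariance 0.5 (hypothetical values).
m, v = condition_next([0.0], [[1.0]], [0.5], 0.0, 1.0, [2.0])  # posterior mean 1.0, variance 0.75
print(log_link_one_step(1000.0, m, v))
```

The posterior variance never exceeds `var_next`, which is the reduction of prediction uncertainty obtained from the additional information.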

The previous lemma shows that the conditional expectation of $\xi^P_{i,j+1}$, given $\mathcal{B}_j$, is linear in the observations $\xi_{i,[j]}$. This will be crucial. An easy consequence of the previous lemma is the following corollary for the log-link.
Corollary 3.2 (one-step ahead prediction for log-link). Under Assumption 3 we have for the log-link $g(x) = \log x$ and for $0 \le j \le J-1$ and $0 \le i \le J$

Analogous statements hold true for claims incurred $I_{i,j+1}$.
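gives the correction if we experience not only $\mathcal{B}^P_j$ but also $\mathcal{B}^I_j$. This increased information also leads to a reduction of prediction uncertainty of size

Example 3.3 (log-link). The analysis of the correction term $\gamma^P_j(\xi_{i,[j]})$ is not straightforward. Therefore, we consider an explicit example for the case $J = 2$ and $j = 0, 1$. In this case the covariance matrix $\Sigma$ under Assumption 3 is given by
$$\Sigma = \begin{pmatrix} (s^P_0)^2 & 0 & 0 & a_{0,0} & a_{0,1} & a_{0,2} \\ 0 & (s^P_1)^2 & 0 & a_{1,0} & a_{1,1} & a_{1,2} \\ 0 & 0 & (s^P_2)^2 & a_{2,0} & a_{2,1} & a_{2,2} \\ a_{0,0} & a_{1,0} & a_{2,0} & (s^I_0)^2 & 0 & 0 \\ a_{0,1} & a_{1,1} & a_{2,1} & 0 & (s^I_1)^2 & 0 \\ a_{0,2} & a_{1,2} & a_{2,2} & 0 & 0 & (s^I_2)^2 \end{pmatrix}.$$
• Case $j = 0$. We start the analysis for $j = 0$, i.e. given information $\mathcal{B}_0$.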
Moreover, $\Sigma^{(P)}_{0,1} = (0, a_{1,0})'$. This provides the credibility weight $(\alpha^P_{[0]})' \in \mathbb{R}^2$ given by

Observe that $a_{1,0} = \mathrm{Cov}(\xi^P_{i,1}, \xi^I_{i,0})$ is the crucial term in the credibility weight $\alpha^P_{[0]}$. If these two random variables $\xi^P_{i,1}$ and $\xi^I_{i,0}$ are uncorrelated, then $a_{1,0} = 0$ and we cannot learn from the observation $\xi^I_{i,0}$ to improve the prediction of $\xi^P_{i,1}$. The predictor for the log-link $g(x) = \log x$ is given by

Remarkably, the observation $\xi^P_{i,0}$ is also used to improve the prediction of $\xi^P_{i,1}$, although these two random variables are uncorrelated under Assumption 3. This comes from the fact that if $a_{0,0} \neq 0$, then $\xi^P_{i,0}$ is used to adjust $\xi^I_{i,0}$.
• Case $j = 1$. We have the following inverse matrix for $\Sigma_{[1]}$, see Appendix B for the full inverse matrix,

Moreover, $\Sigma^{(P)}_{1,2} = (0, 0, a_{2,0}, a_{2,1})'$ is the covariance vector between $\xi_{i,[1]}$ and $\xi^P_{i,2}$. This provides the credibility weight $(\alpha^P_{[1]})'$. We again see that the crucial terms are $a_{2,0} = \mathrm{Cov}(\xi^P_{i,2}, \xi^I_{i,0})$ and $a_{2,1} = \mathrm{Cov}(\xi^P_{i,2}, \xi^I_{i,1})$. If these two covariances are zero, then the claims incurred observations are not helpful to improve the prediction of $\xi^P_{i,2}$. Therefore, we assume that at least one of these two covariances is different from zero. The predictor for the log-link $g(x) = \log x$ is given by

Again $\xi^P_{i,0}$ and $\xi^P_{i,1}$ are used to adjust $\xi^I_{i,0}$ and $\xi^I_{i,1}$ through $a_{0,0}, a_{0,1}$ and $a_{1,0}, a_{1,1}$, respectively, which are integrated into $c_{0,0}, c_{0,1}$ and $c_{1,0}, c_{1,1}$, respectively.
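The two cases of Example 3.3 can be reproduced numerically. The NumPy sketch below (hypothetical function names and parameter values) assumes, in line with the covariance vectors $\Sigma^{(P)}_{0,1} = (0, a_{1,0})'$ and $\Sigma^{(P)}_{1,2} = (0, 0, a_{2,0}, a_{2,1})'$, that the paid components are mutually uncorrelated, that the incurred components are mutually uncorrelated, and that $a_{k,l} = \mathrm{Cov}(\xi^P_{i,k}, \xi^I_{i,l})$; it computes the weights $\Sigma_{[j]}^{-1}\Sigma^{(P)}_{j,j+1}$. For $j = 0$, solving the $2 \times 2$ system by hand gives the weights $a_{1,0}\big(-a_{0,0}, (s^P_0)^2\big)/\big((s^P_0)^2(s^I_0)^2 - a_{0,0}^2\big)$, so $\xi^P_{i,0}$ receives non-zero weight exactly when $a_{0,0} \neq 0$ and $a_{1,0} \neq 0$.

```python
import numpy as np


def sigma_past(s_paid, s_inc, A_past):
    """Covariance of xi_{i,[j]} = (xi^P_{i,0..j}, xi^I_{i,0..j}), assuming uncorrelated paid
    components, uncorrelated incurred components and a_{k,l} = Cov(xi^P_{i,k}, xi^I_{i,l})."""
    A_past = np.asarray(A_past, dtype=float)
    return np.block([[np.diag(np.asarray(s_paid, dtype=float) ** 2), A_past],
                     [A_past.T, np.diag(np.asarray(s_inc, dtype=float) ** 2)]])


def credibility_weights(s_paid, s_inc, A_past, a_next):
    """Weights Sigma_[j]^{-1} Sigma^{(P)}_{j,j+1} on xi_{i,[j]} for predicting xi^P_{i,j+1}.

    a_next = (a_{j+1,0}, ..., a_{j+1,j}); the paid entries of Sigma^{(P)}_{j,j+1} are zero.
    """
    a_next = np.asarray(a_next, dtype=float)
    cov = np.concatenate([np.zeros(len(a_next)), a_next])
    return np.linalg.solve(sigma_past(s_paid, s_inc, A_past), cov)


# Case j = 0 with a_{0,0} = 0.5 and a_{1,0} = 0.3: weights -0.2 and 0.4.
print(credibility_weights([1.0], [1.0], [[0.5]], [0.3]))
# Case j = 1 with a_{2,0} = 0.2, a_{2,1} = 0.1: the paid components also get non-zero weight.
print(credibility_weights([1.0, 1.0], [1.0, 1.0], [[0.3, 0.0], [0.2, 0.3]], [0.2, 0.1]))
# Case j = 1 with a_{2,0} = a_{2,1} = 0: all weights vanish.
print(credibility_weights([1.0, 1.0], [1.0, 1.0], [[0.3, 0.0], [0.2, 0.3]], [0.0, 0.0]))
```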

Munich chain-ladder model
In Corollary 3.2 we have derived the best prediction under Assumption 3 for the log-link. Note that this best prediction is understood relative to the mean-square error of prediction and that it crucially depends on the choice of the link function $g$. Since this model also fulfills the chain-ladder model Assumption 1 for any link function $g$, see Corollary 2.4, it can also be considered as the best prediction for given information $\mathcal{B}_j$ in the distribution-free chain-ladder model (for the chosen link function $g$). The Munich chain-ladder method takes a different viewpoint in that it extends the distribution-free chain-ladder model Assumption 1 and then derives predictions under these additional assumptions. We will define this extended model in Assumption 4 below, and then study under which circumstances our distributional model from Assumption 3 fulfills these Munich chain-ladder model assumptions. Define the residuals

The Munich chain-ladder assumptions of Quarg and Mack [6] are given by:

Assumption 4 (Munich chain-ladder model). Assume in addition to Assumption 1 that there exist constants $\lambda^P, \lambda^I \in (-1, 1)$ such that for $0 \le j \le J-1$ and

and

By the tower property for conditional expectations, Assumption 4 does not contradict Assumption 1. We now analyze Assumption 4 from the viewpoint of the multivariate (log-)normal chain-ladder model of Assumption 3. We therefore need to analyze the correction term defined in the Munich chain-ladder model and compare it to the optimal correction term obtained from Lemma 3.1. We start with the log-link $g(x) = \log x$ and then provide the general result in Theorem 4.3 below. For the log-link we have $P_{i,j} = \nu_i \exp\{\sum_{l=0}^j \xi^P_{i,l}\}$ and $I_{i,j} = \nu_i \exp\{\sum_{l=0}^j \xi^I_{i,l}\}$, hence
$$\frac{I_{i,j}}{P_{i,j}} = \exp\Big\{\sum_{l=0}^j \xi^I_{i,l} - \sum_{l=0}^j \xi^P_{i,l}\Big\}.$$
Therefore, for $\varepsilon^{I|P}_{i,j}$ we need to determine the conditional distribution of $\sum_{l=0}^j \xi^I_{i,l}$, given $\xi^P_{i,[j]}$.
with covariance vector $a^I_{0:j} = \big(\sum_{l=0}^j a_{0,l}, \ldots, \sum_{l=0}^j a_{j,l}\big)' \in \mathbb{R}^{j+1}$ for $A = (a_{k,l})_{0 \le k,l \le J}$, and the corresponding posterior variance.

Proof. This is a standard result for multivariate Gaussian distributions, see Result 4.6 in [3].

Example 4.2 (log-link). We consider the log-link $g(x) = \log x$. In this case we have from (4.2), using Lemma 4.1, for the residual of the correction term

This implies for the Munich chain-ladder model Assumption 4 (we also use (2.4))

with the Munich chain-ladder correction factor defined by

We analyze this Munich chain-ladder correction factor for $j = 1$. It is given by

We compare this to the best prediction in the case $j = 1$ characterized by (3.1) and under the additional assumptions that $a_{2,0} = 0$ and $a_{2,1} = 0$. In this case we obtain from (3.1) and (2.4) the correction term
Note that this differs from (4.4). This can, for instance, be seen from the fact that all components of $\xi^I_{i,[1]}$ enter the Munich chain-ladder sum with equal weights, whereas the best predictor uses a weighted sum of these components. We conclude that the Munich chain-ladder estimator is, in general, non-optimal under Assumption 3.
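The covariance vector $a^I_{0:j}$ of Lemma 4.1 and the conditional moments of $\sum_{l=0}^j \xi^I_{i,l}$, given $\xi^P_{i,[j]}$, which drive the Munich chain-ladder correction in Example 4.2, can be computed from any covariance matrix of $\Xi_i$. The NumPy sketch below (hypothetical names, made-up parameters) assumes the ordering $(\xi^P_{i,0}, \ldots, \xi^P_{i,J}, \xi^I_{i,0}, \ldots, \xi^I_{i,J})$, a mean vector `theta` in the same order, and $a_{k,l} = \mathrm{Cov}(\xi^P_{i,k}, \xi^I_{i,l})$ in the upper-right block. Note that the incurred components enter only through their unweighted sum, which is the equal-weighting discussed above.

```python
import numpy as np


def sum_incurred_given_paid(Sigma, theta, J, j, xi_paid):
    """Conditional moments of S = xi^I_{i,0} + ... + xi^I_{i,j}, given xi^P_{i,0..j}.

    Returns (Cov(xi^P_{i,k}, S) for k = 0..j, conditional mean of S, conditional variance of S).
    """
    Sigma = np.asarray(Sigma, dtype=float)
    theta = np.asarray(theta, dtype=float)
    paid = np.arange(j + 1)               # positions of xi^P_{i,0..j}
    inc = (J + 1) + np.arange(j + 1)      # positions of xi^I_{i,0..j}
    ones = np.ones(j + 1)
    cov_vec = Sigma[np.ix_(paid, inc)] @ ones          # k-th entry: sum over l <= j of a_{k,l}
    var_sum = ones @ Sigma[np.ix_(inc, inc)] @ ones
    w = np.linalg.solve(Sigma[np.ix_(paid, paid)], cov_vec)
    mean = theta[inc].sum() + w @ (np.asarray(xi_paid, dtype=float) - theta[paid])
    return cov_vec, mean, var_sum - w @ cov_vec


# Made-up J = 2 example with unit variances and a lower-left-triangular A.
A = np.array([[0.3, 0.0, 0.0], [0.2, 0.3, 0.0], [0.1, 0.2, 0.3]])
Sigma = np.block([[np.eye(3), A], [A.T, np.eye(3)]])
cov_vec, m, v = sum_incurred_given_paid(Sigma, np.zeros(6), J=2, j=1, xi_paid=[0.1, -0.05])
print(cov_vec, m, v)  # cov_vec holds the row sums of A[:2, :2], i.e. 0.3 and 0.5
```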
The (disappointing) conclusion from Example 4.2 is that within the family of models fulfilling Assumption 3 with log-link $g(x) = \log x$ there is no interesting example satisfying the Munich chain-ladder model Assumption 4. Exceptions can only be found for rather artificial covariance matrices $\Sigma$; for instance, the choice $A = 0$ would fulfill the Munich chain-ladder Assumption 4. But this latter choice is not of interest because it requires $\lambda^P = \lambda^I = 0$. This result can be generalized to any link function, as the next theorem shows.

Proof. The optimal one-step ahead prediction for a given link function $g$ is given by, see also Lemma 3.1,

From the latter we observe that the observation $\xi^I_{i,[j]}$ enters in a linear fashion $c'\xi^I_{i,[j]}$ for an appropriate vector $c \in \mathbb{R}^{j+1}$, which typically is different from zero (for $A \neq 0$) and which does not point in the direction of $(1, \ldots, 1)' \in \mathbb{R}^{j+1}$, i.e. we consider a weighted sum of the components of $\xi^I_{i,[j]}$ (with non-identical weights).
On the other hand, the correction terms from the Munich chain-ladder assumption for a given link function $g$ are given by, see also (4.1),

Thus, the only link function $g$ which considers the components of $\xi^I_{i,[j]}$ in a linear fashion is the log-link $g(x) = \log x$. For the log-link we get

From this we see that all components of $\xi^I_{i,[j]}$ are considered with identical weights, and therefore the Munich chain-ladder correction differs from the optimal one-step ahead prediction (if the latter uses non-identical weights). This is exactly what we have seen in Example 4.2.

In Theorem 4.1 of [4] the Munich chain-ladder structure has been found as a best linear approximation to $E[P_{i,j+1} \mid \mathcal{B}_j]$ in the following way

where $L(\mathcal{B}^P_j)$ is the space of $\mathcal{B}^P_j$-measurable random variables. Note that this approximates the exact conditional expectation $E[P_{i,j+1} \mid \mathcal{B}_j]$ and it gives an explicit meaning to the parameter $\lambda^P \in (-1, 1)$ (which typically is non-constant in $j$), see also Section 2.2.2 in [6].
Proof. This follows from Example 4.2.

The modified Munich chain-ladder method
In the sequel we concentrate on the model of Assumption 3 with log-link function g(x) = log x. This provides the chain-ladder model specified in Theorem 2.2 and the one-step ahead prediction given in Corollary 3.2. The issues that we still need to consider are the following: (i) We would like to extend the one-step ahead prediction to get the ultimate claim prediction, i.e. prediction of all future periods. (ii) Typically, model parameters are not known and need to be estimated. (iii) We should specify the prediction uncertainty. In order to achieve these goals we choose a Bayesian modeling framework.

Assumption 5 (modified Munich chain-ladder model).
Choose the log-link $g(x) = \log x$ and assume the following: there is a given, fixed covariance matrix $\Sigma$ of the form (2.5) with positive definite Schur complements $S^P_{[J]}$ and $S^I_{[J]}$.
An easy consequence of Lemma 5.1 is the following marginal distribution

This shows that in the Bayesian multivariate normal model with Gaussian priors we can completely "integrate out" the hierarchy of parameters $\Theta$. However, we keep the hierarchy of parameters in order to obtain Bayesian parameter estimates for $\Theta$. Denote the dimension of $\zeta$ by $n = 2(J+1)^2 + 2(J+1)$. Choose $t, v \in \mathbb{N}$ with $t + v = n$. Denote by $P_t \in \mathbb{R}^{t \times n}$ and $P_v \in \mathbb{R}^{v \times n}$ the projections such that we obtain a disjoint decomposition of the components of $\zeta$,
$$\zeta \mapsto (\zeta_t, \zeta_v) = (P_t\,\zeta,\ P_v\,\zeta). \qquad (5.1)$$
The random vector $(\zeta_t, \zeta_v)$ has a multivariate Gaussian distribution with expected values

and with covariance matrices

The projections in (5.1) only describe a permutation of the components of $\zeta$. In complete analogy to Lemma 2.1 we have the following lemma.
Lemma 5.2. Under Assumption 5 the random vector $\zeta_v \mid \{\zeta_t\}$ has a multivariate Gaussian distribution with the first two conditional moments given by

This lemma now allows for parameter estimation and prediction at time $J$, conditionally given observations
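Lemma 5.2 is again Gaussian conditioning, now written with the projections of (5.1). The NumPy sketch below assumes $\zeta \sim N(\mu, C)$ for a known mean $\mu$ and covariance $C$ (placeholder names, since the marginal distribution display is not reproduced here) and uses made-up toy numbers. The projections are rows of the identity matrix, so stacking $P_t$ and $P_v$ gives a permutation matrix, in line with the remark after (5.1).

```python
import numpy as np


def projections(n, t_idx):
    """Projection matrices P_t (t x n) and P_v (v x n) splitting the components 0..n-1."""
    t_idx = list(t_idx)
    t_set = set(t_idx)
    v_idx = [k for k in range(n) if k not in t_set]
    E = np.eye(n)
    return E[t_idx], E[v_idx]


def condition_gaussian(mu, C, P_t, P_v, zeta_t):
    """Mean and covariance of zeta_v | zeta_t for zeta ~ N(mu, C), zeta_t = P_t zeta, zeta_v = P_v zeta."""
    mu = np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    mu_t, mu_v = P_t @ mu, P_v @ mu
    C_t, C_v, C_vt = P_t @ C @ P_t.T, P_v @ C @ P_v.T, P_v @ C @ P_t.T
    mean = mu_v + C_vt @ np.linalg.solve(C_t, np.asarray(zeta_t, dtype=float) - mu_t)
    cov = C_v - C_vt @ np.linalg.solve(C_t, C_vt.T)
    return mean, cov


# Toy example (n = 3): observe component 0, predict components 1 and 2.
C = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 2.0]])
P_t, P_v = projections(3, [0])
mean, cov = condition_gaussian(np.zeros(3), C, P_t, P_v, [1.0])
print(mean)  # expected: 0.5 and 0.0
print(cov)   # expected: diagonal entries 0.75 and 2.0, zero off-diagonal
```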

Claims prediction and prediction uncertainty
For the prediction of the ultimate claim we have two different possibilities: either we predict the ultimate claim of cumulative payments $P_{i,J}$ or the ultimate claim of claims incurred $I_{i,J}$. Naturally, these two predictors will differ, unless we make a similar (additional) assumption as in [5]. We refrain from making an additional assumption in order to keep the predictor comparable to the Munich chain-ladder method of [6]. Having chosen the log-link $g(x) = \log x$, we need to calculate for $i = 1, \ldots, J$
$$E[P_{i,J} \mid \mathcal{D}_J] \quad\text{and}\quad E[I_{i,J} \mid \mathcal{D}_J].$$
Assume again that $\zeta_t$ exactly corresponds to the observations in $\mathcal{D}_J$. Then we define for $i = 1, \ldots, J$ and $* \in \{P, I\}$ the linear maps
$$\zeta \mapsto \sum_{j=J-i+1}^{J} \xi^*_{i,j}.$$
This is the sum of the unobserved components of accident year $i$ at time $J$ for cumulative payments and claims incurred, respectively.
and analogously for claims incurred.
This can now again be compared to the individual predictors and the corresponding conditional mean-square errors of prediction. Note that these individual predictors correspond to the predictors in the model of [2] under a Gaussian prior assumption for the (unknown) mean parameters. Predictors and prediction uncertainty of (6.1) can easily be obtained from Theorem 6.1 using the particular choice $A = 0$ in $\Sigma$.
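Under the log-link the ultimate claim satisfies $P_{i,J} = P_{i,J-i}\exp\big(\sum_{j=J-i+1}^{J}\xi^P_{i,j}\big)$, so once the conditional moments of the unobserved components are available, prediction reduces to log-normal moments of a linear map of $\zeta_v$. The NumPy sketch below assumes that $\zeta_v \mid \mathcal{D}_J$ is Gaussian with a given mean and covariance (for instance from Lemma 5.2); the function name and inputs are hypothetical, and the sketch is not a transcription of Theorem 6.1. The returned variance $\mathrm{Var}(P_{i,J} \mid \mathcal{D}_J)$ equals the conditional mean square error of prediction of the conditional-mean predictor.

```python
import numpy as np


def ultimate_prediction(p_last, weights, cond_mean, cond_cov):
    """Predictor and conditional variance of P_{i,J} = P_{i,J-i} * exp(w' zeta_v).

    weights: 0/1 vector selecting xi^P_{i,j}, j = J-i+1, ..., J, within zeta_v
    cond_mean, cond_cov: conditional mean and covariance of zeta_v given D_J
    """
    w = np.asarray(weights, dtype=float)
    m = w @ np.asarray(cond_mean, dtype=float)     # conditional mean of the log-increment
    v = w @ np.asarray(cond_cov, dtype=float) @ w  # conditional variance of the log-increment
    pred = p_last * np.exp(m + 0.5 * v)            # E[P_{i,J} | D_J]
    msep = pred**2 * np.expm1(v)                   # Var(P_{i,J} | D_J)
    return pred, msep


# Hypothetical accident year with two unobserved development periods.
pred, msep = ultimate_prediction(100.0, [1, 1], [0.10, 0.05], np.diag([0.01, 0.005]))
print(round(float(pred), 2), round(float(np.sqrt(msep)), 2))
```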

Example
We provide an explicit example for which we calculate the chain-ladder (CL) reserves according to (6.1), the reserves of the modified Munich chain-ladder (mMCL) method of Theorem 6.1, the (non-optimal) Munich chain-ladder (MCL) reserves (according to Assumption 4), as well as the paid-incurred chain (PIC) reserves derived in [5]. In order to have comparability between these different methods we choose for each method a Bayesian framework with non-informative priors for the mean parameters. We use the original data of Quarg and Mack [6], which are provided in the appendix. We also provide the choices of $s^P_j$ and $s^I_j$ for the log-link $g(x) = \log x$ in the appendix (for these parameters we simply choose the sample standard deviations, with the usual exponential extrapolation for the last period $j = 6$). We then calculate the chain-ladder parameters $f^P_j$ and $f^I_j$, where we replace $(s^*_{j+1})^2$ by $(s^*_{j+1})^2(1 + 1/(J-j))$, the latter being the posterior variance parameters in the non-informative prior case for $A = 0$ in $\Sigma$. We also provide these numerical values in the appendix. From these parameters we can then calculate the chain-ladder reserves $R^{CL,paid}_i$ and $R^{CL,inc}_i$ from the chain-ladder predictors (6.1). The results are provided in Table 1. The main observation is that there are quite substantial differences between the chain-ladder reserves from cumulative payments $R^{CL,paid}_i$ and those from claims incurred $R^{CL,inc}_i$, see Table 1.

We start with the Munich chain-ladder method of [6] with variance functions changed according to our Assumption 1 (A2). These changes of the variances also lead to slightly different parameter estimates compared to [6]. For the correlation parameters defined in Assumption 4 we obtain estimates $\lambda^P = 49\%$ and $\lambda^I = 45\%$ (if we use the estimators of Section 3.1.2 in [6] with changed variance functions). Using these estimates we can then calculate the reserves in the Munich chain-ladder method.
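The chain-ladder parameter step described above can be sketched in Python. We assume, as a hypothetical reading consistent with the log-normal model and the variance adjustment $(s^*_{j+1})^2(1 + 1/(J-j))$, that $f^*_j = \exp\big(\hat\theta^*_{j+1} + \tfrac12 (s^*_{j+1})^2(1 + 1/(J-j))\big)$, where $\hat\theta^*_{j+1}$ is the average of the $J - j$ observed ratios $\xi^*_{i,j+1}$, $i = 0, \ldots, J-j-1$. The function name and the triangle in the usage lines are made up and are not the Quarg and Mack data.

```python
import math


def cl_factor(xi_triangle, j, s_next):
    """Log-link chain-ladder factor f_j from the observed xi_{i,j+1}, i = 0, ..., J-j-1.

    xi_triangle[i] lists xi_{i,0}, ..., xi_{i,J-i} (upper-left triangle with J + 1 rows).
    s_next is the standard deviation parameter s_{j+1}.
    """
    J = len(xi_triangle) - 1
    obs = [xi_triangle[i][j + 1] for i in range(J - j)]   # the J - j observations
    theta_hat = sum(obs) / len(obs)                        # flat-prior posterior mean
    var = s_next**2 * (1.0 + 1.0 / (J - j))                # inflated variance parameter
    return math.exp(theta_hat + 0.5 * var)


# Made-up triangle with J = 2.
xi_tri = [[0.0, 0.2, 0.1],
          [0.0, 0.4],
          [0.0]]
print([round(cl_factor(xi_tri, j, 0.1), 4) for j in range(2)])
```

The factor $1/(J-j)$ is the variance of an average of $J-j$ observations, so the inflation is largest for late development periods with few observations.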
As in [6] we obtain the two values $R^{MCL,paid}_i$ and $R^{MCL,inc}_i$ for the reserves based on cumulative payments with claims incurred corrections $\varepsilon^{I|P}_{i,j}$ and for the ones based on claims incurred with cumulative payments corrections $\varepsilon^{P|I}_{i,j}$, respectively (see also Assumption 4). The results are provided in Table 1. We observe that the gap between cumulative payment reserves and claims incurred reserves narrows due to the correction factors. Both reserves are derived under the Munich chain-ladder Assumption 4 and, hence, are non-optimal within model Assumption 5 (as has been seen in Theorems 4.3 and 4.4). Moreover, there is no sensible estimate for their prediction uncertainty. Therefore, we study the optimal estimators within Assumption 5 next.

We calculate the reserves in the modified Munich chain-ladder method of Assumption 5, see Theorem 6.1. We therefore need to specify the off-diagonal matrix $A = (a_{k,l})_{0 \le k,l \le J}$, see (2.5). A first idea to calibrate this matrix $A$ is to use the correlation estimates $\lambda^P = 49\%$ and $\lambda^I = 45\%$ from the Munich chain-ladder method. A crude approximation using Theorem 4.4 provides

From this we see that in our numerical example we need comparatively high correlations; for instance, $\mathrm{Corr}(\xi^P_{i,j+1}, \xi^I_{i,k}) \ge 40\%$ would be in line with $\lambda^P = 49\%$. The difficulty with this choice is that the resulting matrix $\Sigma$ of type (2.5) is not positive definite. Therefore, we need to choose smaller correlations. We make the following choice for all $i, j \ge 0$:
$$\mathrm{Corr}(\xi^P_{i,j+m}, \xi^I_{i,j}) = \begin{cases} 40\% & \text{for } m = 0,\\ 30\% & \text{for } m = 1,\\ 20\% & \text{for } m = 2,\\ 10\% & \text{for } m = 3,\\ 0\% & \text{otherwise.}\end{cases} \qquad (7.1)$$
This provides a positive definite choice for $\Sigma$ of type (2.5) in our example. This choice means that we can learn from claims incurred observations $\xi^I_{i,j}$ for cumulative payments observations $\xi^P_{i,j+m}$ with development lags $m = 0, 1, 2, 3$, but no other conclusions can be drawn from the observations. The resulting modified Munich chain-ladder reserves according to Theorem 6.1 are provided in Table 1.
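Whether a correlation choice such as (7.1) yields a positive definite $\Sigma$ can be checked without the standard deviations: writing $\Sigma = DRD$ with $D$ the diagonal matrix of standard deviations, $\Sigma$ is positive definite if and only if the correlation matrix $R$ is. The NumPy sketch below (hypothetical names) assumes mutually uncorrelated paid components and mutually uncorrelated incurred components, as in (2.5), and $J = 6$ as for the data of Quarg and Mack; the uniform 40% choice over all lags $m \ge 0$ is only a hypothetical stand-in for "comparatively high correlations".

```python
import numpy as np


def correlation_matrix(J, corr_by_lag):
    """Correlation matrix of (xi^P_0..xi^P_J, xi^I_0..xi^I_J) with identity diagonal blocks and
    Corr(xi^P_{j+m}, xi^I_j) = corr_by_lag.get(m, 0) for lags m >= 0 (zero for m < 0)."""
    C = np.zeros((J + 1, J + 1))
    for k in range(J + 1):          # paid development year k = j + m
        for l in range(k + 1):      # incurred development year l = j <= k
            C[k, l] = corr_by_lag.get(k - l, 0.0)
    I = np.eye(J + 1)
    return np.block([[I, C], [C.T, I]])


def smallest_eigenvalue(R):
    """The symmetric matrix R is positive definite iff this value is positive."""
    return np.linalg.eigvalsh(R).min()


J = 6
R_71 = correlation_matrix(J, {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1})   # choice (7.1)
R_high = correlation_matrix(J, {m: 0.4 for m in range(J + 1)})    # hypothetical high choice
print(smallest_eigenvalue(R_71), smallest_eigenvalue(R_high))
```

The eigenvalues of $R$ are $1 \pm$ the singular values of the cross-correlation block, so positive definiteness amounts to all singular values of that block being below one; long runs of large correlations violate this quickly.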
We observe that the reserves $R^{mMCL,paid}_i$ based on cumulative payments are closer to the claims incurred reserves $R^{CL,inc}_i$. This is due to the correlation choices (7.1). On the other hand, claims incurred reserves remain almost unchanged, i.e. $R^{CL,inc}_i \approx R^{mMCL,inc}_i$. This is due to the fact that $a_{k,l} = 0$ for $k < l$ and, therefore, cumulative payments observations have only a minor influence (via parameter estimation) on claims incurred reserves. If we want to completely close the gap between cumulative payment reserves and claims incurred reserves, we need to add an assumption to Assumption 5. This additional assumption can be of a similar nature as the one in the paid-incurred chain reserving method presented in [5], namely $P_{i,J} = I_{i,J}$, P-a.s. If this assumption is made, then ultimate claims are identical, P-a.s., and there is only one reserve $R^{PIC}_i$ which is based on the entire information $\mathcal{D}_J = \mathcal{D}^P_J \cup \mathcal{D}^I_J$, see [5]. The numerical result of the paid-incurred chain model of [5] is provided in the last column of Table 1.

Finally, we analyze the prediction uncertainty measured by the square-rooted conditional mean square error of prediction. The results are provided in Table 2. The prediction uncertainties of the chain-ladder reserves and of the modified Munich chain-ladder reserves were calculated according to Theorem 6.1; for the former (chain-ladder reserves) we simply need to set $A = 0$. We see that the uncertainties in the modified version for cumulative payments are reduced because correlations (7.1) imply that we can learn from claims incurred for cumulative payments. For claims incurred they remain invariant because of the choice $a_{k,l} = 0$ for $k < l$. We can now also calculate the prediction uncertainty for the Munich chain-ladder method (which had been an open problem). Within Assumption 5 we know that the modified Munich chain-ladder predictor $\widehat{P}^{mMCL}_{i,J} = E[P_{i,J} \mid \mathcal{D}_J]$ is optimal; therefore, we obtain the prediction uncertainty for the Munich chain-ladder predictor $\widehat{P}^{MCL}_{i,J}$
$$E\Big[\big(P_{i,J} - \widehat{P}^{MCL}_{i,J}\big)^2 \,\Big|\, \mathcal{D}_J\Big] = E\Big[\big(P_{i,J} - \widehat{P}^{mMCL}_{i,J}\big)^2 \,\Big|\, \mathcal{D}_J\Big] + \big(\widehat{P}^{mMCL}_{i,J} - \widehat{P}^{MCL}_{i,J}\big)^2, \qquad (7.2)$$
and similarly for claims incurred.
The second term in (7.2) is the approximation error because the Munich chain-ladder predictor is non-optimal within Assumption 5. Finally, we provide the prediction uncertainty in the paid-incurred chain method. In this case we fully benefit from cumulative payment data and claims incurred data; therefore, the square-rooted conditional mean square error of prediction is roughly $976 \approx \frac{1}{2}(1\,249 + 1\,565)/\sqrt{2} \approx 995$. We have applied these four methods to various data sets. It turned out that the level of reserves depends rather sensitively on the quality of claims incurred data. For instance, changes in the estimation philosophy for claims incurred lead to diagonal effects in claims development triangles. These diagonal effects in claims incurred data often lead to unreasonable reserves in the (modified) Munich chain-ladder method and, therefore, in such cases one should either rely on cumulative payment data only or use other reserving methods such as the paid-incurred chain method of [5].
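The decomposition behind (7.2) uses only that the Munich chain-ladder predictor is known at time $J$: the cross term then vanishes after conditioning on $\mathcal{D}_J$, leaving the conditional variance plus the squared approximation error. A small Python sketch with hypothetical numbers (not taken from Table 2) and a hypothetical function name:

```python
import math


def msep_of_predictor(cond_mean, cond_var, predictor):
    """E[(X - predictor)^2 | D_J] = Var(X | D_J) + (E[X | D_J] - predictor)^2
    for any predictor that is known given D_J."""
    return cond_var + (cond_mean - predictor) ** 2


# Hypothetical numbers: optimal predictor 1000 with conditional variance 50**2,
# and an alternative (non-optimal) predictor 1030.
print(math.sqrt(msep_of_predictor(1000.0, 50.0**2, 1030.0)))  # sqrt(2500 + 900)
```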

Conclusions
We have studied the Munich chain-ladder axioms of Quarg and Mack [6]. In Theorem 4.3 we conclude that, in general, chain-ladder models do not fulfill these axioms. Therefore, we introduce a modified Munich chain-ladder method which is fully consistent with our stochastic model assumptions. Within this new framework we derive best-estimate reserves and the corresponding prediction uncertainties. These results also allow us to analyze the prediction uncertainty in the classical Munich chain-ladder method (which was still an open problem). Our concluding example suggests that the paid-incurred chain method of [5] provides more stable results than the (modified) Munich chain-ladder method.

A Data of Quarg and Mack [6]
Observed cumulative payments $P_{i,j}$, $i + j \le 6$, and parameter choices.

The inverse matrix of $\Sigma_{[1]}$ is given by