Article

A Priori Ratemaking Selection Using Multivariate Regression Models Allowing Different Coverages in Auto Insurance

1
Department of Quantitative Methods, Faculty of Economics, University of Las Palmas de Gran Canaria, 35017 Las Palmas de Gran Canaria, Spain
2
Department of Economics, University of Melbourne, Melbourne 3031, Australia
*
Author to whom correspondence should be addressed.
Risks 2021, 9(7), 137; https://doi.org/10.3390/risks9070137
Submission received: 9 June 2021 / Revised: 12 July 2021 / Accepted: 14 July 2021 / Published: 20 July 2021

Abstract

A comprehensive auto insurance policy usually provides the broadest protection for the most common events for which the policyholder would file a claim. On the other hand, some insurers offer extended third-party car insurance to adapt to the personal needs of every policyholder. The extra coverage includes cover against fire, natural hazards, theft, windscreen repair, and legal expenses, among other coverages that apply to specific events that may cause damage to the insured’s vehicle. In this paper, a multivariate distribution, based on a conditional specification, is proposed to account for different numbers of claims for different coverages. Then, the premium is computed for each type of coverage separately rather than for the total claims number. Closed-form expressions are given for moments and cross-moments, parameter estimates, and a priori premiums when different premium principles are considered. In addition, the severity of claims can be incorporated into this multivariate model to derive multivariate claims’ severity distributions. The model is extended by developing a zero-inflated version. Regression models for both multivariate families are derived. These models are used to fit a real auto insurance portfolio that includes five types of coverage. Our findings show that some specific covariates are statistically significant in some coverages, yet they are not so for others.

1. Introduction

In the automobile insurance sector, it is natural to calculate the a priori premium taking into account the number of claims and individual characteristics of each insured, such as gender, age, years of validity of the policy, etc. This procedure to compute the a priori premium is usually carried out via parametric models rather than the ordinary regression model, which can predict negative values for the number of claims. For this purpose, parametric models based on the use of the Poisson, negative binomial, and Poisson-inverse Gaussian distributions, among others, are the standard models considered in the univariate case. As of today, most insurance companies distinguish, apart from the total number of claims, individualized claims for different coverages, such as windscreen claims, thefts and fire claims, etc. So far, most actuarial models aim to differentiate only between two types of coverage when computing an appropriate premium based on different coverage. Perhaps one of the reasons for this is the lack of models capable of describing more than two coverages. The most often considered approach to tackle this problem is the one based on the bivariate Poisson distribution (see Bermúdez 2009; Bermúdez and Karlis 2017, among others). See also Gómez-Déniz (2016); Gómez-Déniz and Calderín-Ojeda (2018); Gómez-Déniz and Calderín-Ojeda (2020); Denuit et al. (2009), and Frees (2010) for more details related to this topic. Alternative references for a review of count regression are Cameron and Trivedi (1986, 1998); Winkelmann (2003), and Boucher et al. (2007). A copula-based correlated random effects model that accommodates dependence between claim frequency and severity was examined in Oh et al. (2020).
Traditionally, the business associated with insurance consists of selling risk coverage to buyers. In particular, in automobile insurance, the insurer provides financial protection against physical damage or bodily injury resulting from an incident (see Frees et al. 2016). However, it is common today, mainly due to the existing competition, that the insurance companies offer coverage of different claims within the same product not only to gain in competitiveness, but also to benefit from risk diversification and volatility. In this paper, we consider a motor vehicle insurance portfolio with policies observed during some time period that contain, apart from other known factors (gender, age, years of validity of the driver’s license, etc.), information about the claims number concerning different coverages that are considered as response variables. This includes windscreen, parking, theft and fire, etc.
Therefore, it is assumed that the insurance company collects information on the claims for these coverages and the total number of claims given by the sum of the claims in all the coverages. Thus, every policyholder generates a sequence of claims numbers for each coverage; one of them is the total claims number, which includes the sum of the coverages’ claims. Then, based on a conditional specification, a multivariate model that allows a simple way to describe the use of a finite but sufficiently large number of coverages is proposed. The resulting multivariate discrete distribution obtained enables us to study the dependence structure of a limited number of coverages in automobile insurance and include covariates such as gender, age, etc. We start by using a Poisson model for the random variable total claims number, and then by conditioning, we introduce the remaining variables in a branch architecture structure. Finally, closed-form expressions are given for parameter estimates, and a priori premiums are provided when different premium principles are used.
The purpose of this paper is to introduce a novel methodology based on a multivariate distribution via a conditional specification, proposed to account for different numbers of claims in different coverages and also for the total claims frequency. This approach enables us to examine the dependence structure of a finite number of coverages in motor vehicle insurance and also to incorporate heterogeneity in the model through explanatory variables. Then, we use this procedure to calculate premiums based only on the claims frequency. Next, we show that the amount of claims can be incorporated into this multivariate model to derive multivariate claims’ severity distributions. For this, we assume that the claims size in the joint coverages follows a multivariate Erlang distribution. As multivariate probability distributions are complex, it is argued that analytical solutions are highly unlikely as compared to those derived under univariate and bivariate cases (see Cummins and Wiltbank 1983, 1984); nevertheless, in this work, we derive a multivariate model where the total number of claims that affect the portfolio is the result of the interaction of multivariate processes. The main advantage of the modeling approach presented in this work is that it avoids working with copulas (see, for instance, Balakrishnan and Lai 2009, chp. 1, p. 59). Although the copula approach for modeling multivariate models has been proven to be very useful, it has also been criticized due to the difficulty of choosing an appropriate copula structure and the complication of estimating the parameters that control the dependency. In addition, a multivariate zero-inflated model to account for the excess of common zeros in the empirical distribution is developed. Finally, these two multivariate distributions can be reparameterized to incorporate covariates to determine which factors and explanatory variables have an influence on the mean of the corresponding coverage.
As an illustration, in this work, we use the French Motor Personal Line datasets available in the package “CASdatasets” in R, which include five response variables.
Although the modeling proposed here was developed ad hoc for the auto insurance market, it is unquestionable that other insurance lines in general insurance might benefit from it. For example, in home insurance, the whole premium could be split into different coverages such as moisture damage, theft, pipe repairs, locksmiths, and even protection against tenant rent default.
The rest of the paper is structured as follows. Section 2 describes the primary model and some of its properties. Then, premium calculations based on this basic model are discussed. Finally, a multivariate zero-inflated model and multivariate regression procedures are shown. Some methods of estimation are provided in Section 3. Next, a numerical application pertaining to a private motor French insurer is developed in Section 4. Finally, conclusions are drawn in Section 5.

2. The Branch Architecture Model

Let us consider a portfolio with $N$ observed policies during $T$ periods of time and also assume that the insurance company gathers information on the number of claims related to several types of coverages. Examples of these coverages include windscreens, fire and theft, etc. Therefore, the insurer collects information about these coverages, as well as the total number of claims for each policyholder given by the sum of the claims in all different coverages. For the $i$th policyholder, we consider the multivariate random variable expressed as the following sequence, $N_{ji} = (N_{ji}^{1}, N_{ji}^{2}, \ldots, N_{ji}^{T})$, of claims numbers for coverage $j$, with $j = 1, 2, \ldots, J$, assuming that one of them, i.e., the first one, is the total number of claims, which includes the sum of the claims for all types of coverages purchased by this policyholder.
Furthermore, we assume that $N_{1i}$, the total number of claims recorded in the auto insurance portfolio, follows a Poisson distribution with mean $\Theta_{1i} > 0$ for $i = 1, 2, \ldots, N$, where $N$ is the total number of policyholders. Now, let us suppose that the policyholders have purchased some of the types of coverages, such as windscreen protection, fire and theft, parking, etc. That is, once the policyholder has made a claim, this can be of any of these types. Let us denote by $Z_{1i\kappa}$, $\kappa = 1, \ldots, N_{1i}$, a random variable associated with the number of claims corresponding to the first type of coverage and policyholder $i$, resulting from the $\kappa$th claim of the total claims reported by the $i$th policyholder. These variables are assumed to be independent and identically distributed, also following a Poisson distribution with mean $\Theta_{2i} > 0$. Then, the conditional distribution of $N_{2i} = \sum_{\kappa=1}^{N_{1i}} Z_{1i\kappa}$, the total number of claims of this first coverage among the $N_{1i}$ total claims, given $N_{1i} = n_{1i}$, is a Poisson distribution with parameter $n_{1i}\Theta_{2i}$, and the joint distribution of $(N_{1i}, N_{2i})$ has a probability function given by,

$$
f(n_{1i}, n_{2i} \mid \Theta_{1i}, \Theta_{2i}) = \Pr(n_{2i} \mid n_{1i}, \Theta_{2i}) \Pr(n_{1i} \mid \Theta_{1i}) = \frac{1}{n_{1i}!\, n_{2i}!}\, \Theta_{1i}^{n_{1i}} (\Theta_{2i} n_{1i})^{n_{2i}} \exp\left[-(\Theta_{1i} + \Theta_{2i} n_{1i})\right],
$$

for $n_{1i}, n_{2i} = 0, 1, \ldots$, and $i = 1, 2, \ldots, N$, with the convention that $0^{j} = 1$ for $j = 0$ and $0$ otherwise, so that no coverage claims can occur when $n_{1i} = 0$. This bivariate distribution appears in Leiter and Hamdan (1973) (see also Cacaoullos and Papageorgiou 1980; Johnson et al. 1996, chp. 37, p. 136 in the context of accident analysis).
Let $Z_{2i\kappa}$, $\kappa = 1, \ldots, N_{1i}$, now be a random variable associated with the number of claims corresponding to the second type of coverage and policyholder $i$, resulting from the $\kappa$th claim of the total claims reported by the $i$th policyholder. These variables are assumed to be independent and identically Poisson distributed with mean $\Theta_{3i} > 0$ and conditionally independent of $N_{2i}$. Then, the conditional distribution of $N_{3i} = \sum_{\kappa=1}^{N_{1i}} Z_{2i\kappa}$, the total number of claims of this second coverage among the $N_{1i}$ total claims, given $N_{1i} = n_{1i}$, is a Poisson distribution with parameter $n_{1i}\Theta_{3i}$, and now, the joint distribution of $(N_{1i}, N_{2i}, N_{3i})$ has a probability function given by,

$$
\begin{aligned}
f(n_{1i}, n_{2i}, n_{3i} \mid \Theta_{1i}, \Theta_{2i}, \Theta_{3i}) &= \Pr(n_{1i}, n_{2i}) \Pr(n_{3i} \mid (n_{1i}, n_{2i})) = \Pr(n_{1i}, n_{2i}) \Pr(n_{3i} \mid n_{1i}) \\
&= \frac{\Theta_{1i}^{n_{1i}}}{n_{1i}!} \prod_{j=2}^{3} \frac{(\Theta_{ji}\, n_{1i})^{n_{ji}}}{n_{ji}!} \exp\left[-\left(\Theta_{1i} + n_{1i} \sum_{j=2}^{3} \Theta_{ji}\right)\right],
\end{aligned}
$$

where conditional independence between the two types of coverage, given the total number of claims, is assumed.
Following the same argument, it is easy to see that if we have $J$ types of coverages, then the joint probability function of $(N_{1i}, N_{2i}, \ldots, N_{Ji}) \in \mathbb{N}^{J}$ is given by:

$$
f(n_{1i}, \ldots, n_{Ji} \mid \Theta_i) = \frac{\Theta_{1i}^{n_{1i}}}{n_{1i}!} \prod_{j=2}^{J} \frac{(\Theta_{ji}\, n_{1i})^{n_{ji}}}{n_{ji}!} \exp\left[-\left(\Theta_{1i} + n_{1i} \sum_{j=2}^{J} \Theta_{ji}\right)\right], \tag{1}
$$

where $\Theta_i = (\Theta_{1i}, \ldots, \Theta_{Ji})$. For this multivariate distribution, it is allowed that $n_{1i}$ takes larger or smaller values than $n_{ji}$; however, in the proposed model, it is verified that $n_{1i}$ is larger than or equal to $n_{ji}$ for all $j > 1$. In this case, it is obvious that $\Theta_1$ must be larger than $\Theta_j$, $j = 2, \ldots, J$. The latter statement is confirmed in the numerical application section.
This distribution is a multivariate extension of the bivariate one proposed in Leiter and Hamdan (1973) (see also Cacaoullos and Papageorgiou 1980).
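The joint probability function (1) is simple to evaluate numerically. The following is a minimal sketch, not the authors' implementation: the function names `joint_logpmf` and `joint_pmf` and the parameter values in the usage note are illustrative assumptions. It works on the log scale and applies the convention that no coverage claims can occur when the total count is zero.

```python
import math

def joint_logpmf(counts, thetas):
    """Log of the joint pmf (1) for one policyholder.

    counts = (n1, n2, ..., nJ), where n1 is the total claims number.
    thetas = (theta1, theta2, ..., thetaJ), all strictly positive.
    """
    n1, rest = counts[0], counts[1:]
    th1, th_rest = thetas[0], thetas[1:]
    # Poisson(theta1) part for the total number of claims.
    logp = n1 * math.log(th1) - math.lgamma(n1 + 1) - th1
    for nj, thj in zip(rest, th_rest):
        if n1 == 0:
            # With no claims at all, every coverage count must be zero.
            if nj > 0:
                return -math.inf
            continue
        # Poisson(n1 * theta_j) part for coverage j given n1.
        rate = n1 * thj
        logp += nj * math.log(rate) - math.lgamma(nj + 1) - rate
    return logp

def joint_pmf(counts, thetas):
    """Joint pmf (1); returns 0.0 for impossible count vectors."""
    return math.exp(joint_logpmf(counts, thetas))
```

For instance, `joint_pmf((2, 1, 0), (0.5, 0.6, 0.4))` gives the probability of two claims in total, one claim in the first coverage, and none in the second, under hypothetical parameter values.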
The ordinary probability-generating function of $(N_1, \ldots, N_J)$ with the probability mass function (pmf) given in (1) is given by:

$$
G_{N_1, \ldots, N_J}(z_1, \ldots, z_J) = \exp\left\{\Theta_1 \left[ z_1 \exp\left(\sum_{j=2}^{J} \Theta_j (z_j - 1)\right) - 1 \right]\right\},
$$

for $|z_j| \le 1$, $j = 1, \ldots, J$.
From here, it is easy to see that the marginal distribution of $N_{1i}$ is Poisson with parameter $\Theta_{1i}$, while $N_{ji}$, $j = 2, \ldots, J$, have a Neyman Type A distribution with parameters $\Theta_{1i}$ and $\Theta_{ji}$. Recall that the probability function of the Neyman Type A distribution (see Neyman 1939; Douglas 1955; Kemp 1967; Johnson et al. 2005, chp. 8, among others) is given by:

$$
f(n_{ji}) = \frac{\Theta_{ji}^{n_{ji}} \exp(-\Theta_{1i})}{n_{ji}!} \sum_{n_{1i}=0}^{\infty} \frac{\left[\Theta_{1i} \exp(-\Theta_{ji})\right]^{n_{1i}} n_{1i}^{n_{ji}}}{n_{1i}!}, \quad n_{ji} = 0, 1, \ldots, \tag{2}
$$

for $j = 2, \ldots, J$.
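Some computations provide the marginal and cross-moments, which are given by:

$$
E(N_{1i} \mid \Theta_{1i}) = \Theta_{1i}, \tag{3}
$$

$$
E(N_{ji} \mid \Theta_{1i}, \Theta_{ji}) = \Theta_{1i}\Theta_{ji}, \quad j = 2, \ldots, J, \tag{4}
$$

$$
\begin{aligned}
E(N_{1i} N_{ji} \mid \Theta_{1i}, \Theta_{ji}) &= \Theta_{1i}(1 + \Theta_{1i})\Theta_{ji}, \quad j = 2, \ldots, J, \\
E(N_{ji} N_{li} \mid \Theta_{1i}, \Theta_{ji}) &= \Theta_{1i}(1 + \Theta_{1i})\Theta_{ji}\Theta_{li}, \quad j, l = 2, \ldots, J, \; j \neq l,
\end{aligned}
$$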
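from which it is simple to see that:

$$
\begin{aligned}
\mathrm{cov}(N_{1i}, N_{ji}) &= \Theta_{1i}\Theta_{ji}, \quad j = 2, \ldots, J, \\
\mathrm{cov}(N_{ji}, N_{li}) &= \Theta_{1i}\Theta_{ji}\Theta_{li}, \quad j, l = 2, \ldots, J, \; j \neq l,
\end{aligned}
$$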
and therefore, the model admits only a positive correlation between pairs of random variables. The marginal variances are given by:

$$
\mathrm{var}(N_{1i} \mid \Theta_{1i}) = \Theta_{1i}, \tag{5}
$$

$$
\mathrm{var}(N_{ji} \mid \Theta_{1i}, \Theta_{ji}) = \Theta_{1i}\Theta_{ji}(1 + \Theta_{ji}), \quad j = 2, \ldots, J. \tag{6}
$$
Observe that, using (5) and (6) together with (3) and (4), the model is equidispersed (variance equal to the mean) for N 1 i and overdispersed (variance larger than the mean) for the rest of the coverages.
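Finally, the correlation can easily be computed as:

$$
\rho(N_{1i}, N_{ji}) = \sqrt{\frac{\Theta_{ji}}{1 + \Theta_{ji}}}, \quad j = 2, \ldots, J, \tag{7}
$$

$$
\rho(N_{ji}, N_{li}) = \sqrt{\frac{\Theta_{ji}\Theta_{li}}{(1 + \Theta_{ji})(1 + \Theta_{li})}}, \quad j, l = 2, \ldots, J, \; j \neq l. \tag{8}
$$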
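As a quick numerical companion to the moment formulas above, the sketch below collects the means, variances, covariances, and correlations of the basic model for the total and two coverages $j$ and $l$. The function name `branch_moments` and the dictionary keys are hypothetical choices made for illustration only.

```python
import math

def branch_moments(theta1, theta_j, theta_l):
    """Moments of the basic model for the total and two coverages j and l."""
    out = {
        "mean_1": theta1,
        "mean_j": theta1 * theta_j,
        "mean_l": theta1 * theta_l,
        "var_1": theta1,
        "var_j": theta1 * theta_j * (1.0 + theta_j),
        "var_l": theta1 * theta_l * (1.0 + theta_l),
        "cov_1j": theta1 * theta_j,
        "cov_jl": theta1 * theta_j * theta_l,
    }
    # Correlation between the total and coverage j.
    out["rho_1j"] = math.sqrt(theta_j / (1.0 + theta_j))
    # Correlation between two coverages j and l.
    out["rho_jl"] = math.sqrt(
        theta_j * theta_l / ((1.0 + theta_j) * (1.0 + theta_l))
    )
    return out
```

Note that neither correlation depends on $\Theta_{1i}$, and that each coverage has variance larger than its mean, whereas the total count is equidispersed.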
One may also be interested in the distribution of $N_{1i}$ given $N_{ji} = n_{ji}$. The probability-generating function of this conditional distribution is given by:

$$
G_{N_{1i} \mid N_{ji} = n_{ji}}(z) = \exp\left[\Lambda_j (z - 1)\right] \frac{B_{n_{ji}}(\Lambda_j z)}{B_{n_{ji}}(\Lambda_j)}, \quad |z| \le 1,
$$

where $\Lambda_j = \Theta_{1i} \exp(-\Theta_{ji})$ and $B_n(\tau)$ are the Bell polynomials given by:

$$
B_n(\tau) = \sum_{k=0}^{n} S(n, k)\, \tau^{k},
$$

with:

$$
S(n, k) = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{i} \binom{k}{i} (k - i)^{n}
$$
being the Stirling number of the second kind.1
Now, the conditional mean of $N_{1i}$ given $N_{ji} = n_{ji}$ can be written as:

$$
E\left(N_{1i} \mid N_{ji} = n_{ji}\right) = \frac{B_{n_{ji}+1}(\Lambda_j)}{B_{n_{ji}}(\Lambda_j)}.
$$
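The conditional mean above only requires Stirling numbers of the second kind and the polynomials $B_n(\tau)$. The sketch below is one possible way to compute it; the function names `stirling2`, `touchard`, and `conditional_mean_total` are assumptions introduced for this illustration and do not come from the paper.

```python
import math

def stirling2(n, k):
    """Stirling number of the second kind S(n, k) via the explicit sum."""
    total = sum((-1) ** i * math.comb(k, i) * (k - i) ** n for i in range(k + 1))
    # The alternating sum is always an exact multiple of k!.
    return total // math.factorial(k)

def touchard(n, tau):
    """Polynomial B_n(tau) = sum_k S(n, k) * tau**k."""
    return sum(stirling2(n, k) * tau ** k for k in range(n + 1))

def conditional_mean_total(n_j, theta1, theta_j):
    """E(N_1 | N_j = n_j) = B_{n_j + 1}(Lambda_j) / B_{n_j}(Lambda_j)."""
    lam = theta1 * math.exp(-theta_j)
    return touchard(n_j + 1, lam) / touchard(n_j, lam)
```

When no claim is observed in coverage $j$, the ratio reduces to $\Lambda_j$, which is smaller than the unconditional mean $\Theta_{1i}$; each additional observed claim in the coverage raises the expected total.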

2.1. Some Results in Risk Theory

Observe that due to the model construction, we have that $\sum_{j=2}^{J} \Theta_{ji} = 1$, i.e., every claim in coverage $j$ is a proportion of the total claims $N_{1i}$. Then, if the actuary decides to use the net premium principle, i.e., $P(X) = E(X)$, to compute the premium, then for the $i$th policyholder and coverage $s$ with $s \in \{2, \ldots, J\}$, the premium is $P_{si} = \Theta_{1i}\Theta_{si} = P_{1i}\Theta_{si}$, where $P_{1i} = \Theta_{1i}$ is the net premium for the total coverage, that is, the sum of the premiums in each of the coverages purchased. A similar result is obtained by using the expected value principle. A catalog of premium principles can be found in Young (2006).
Let us now consider that the actuary decides to use the variance premium principle, i.e., $P(X) = E(X) + \mathrm{var}(X)/E(X)$ with $E(X) > 0$, to calculate the premium. Then, in this case, we obtain that $P_{1i} = 1 + \Theta_{1i}$ and $P_{ji} = 1 + \Theta_{ji} P_{1i}$. However, in this case, we have that $\sum_{j=2}^{J} P_{ji} = J - 1 + P_{1i}$, which is different from $P_{1i}$, except for the case in which $J = 1$ and no coverages exist.
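The two premium principles discussed above can be compared with a few lines of code. This is a minimal sketch under the constraint that the coverage parameters sum to one; the function names and the numerical values in the usage note are hypothetical.

```python
def net_premiums(theta1, shares):
    """Net premium principle: total premium and one premium per coverage.

    shares = (theta_2, ..., theta_J), assumed to sum to one.
    """
    return theta1, [theta1 * s for s in shares]

def variance_premiums(theta1, shares):
    """Variance premium principle P(X) = E(X) + var(X) / E(X)."""
    p_total = 1.0 + theta1
    return p_total, [1.0 + s * p_total for s in shares]
```

With `theta1 = 0.4` and `shares = (0.5, 0.3, 0.2)`, the net coverage premiums add up to the total premium 0.4, whereas the variance-principle coverage premiums add up to $J - 1 + P_{1i} = 3 + 1.4 = 4.4$ rather than 1.4, illustrating the lack of additivity noted above.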
However, a model solely based on the number of claims is not realistic. In risk theory, it is common to incorporate the amount associated with each of the claims to build the compound model. That is, property and/or casualty ratemaking is generally based on a claim frequency distribution and a loss distribution. Due to the complex derivation of this multivariate compound model, the subscript $i$ is removed from the text in the remainder of this section. For this purpose, let us now assume that $Y_j = \sum_{i=1}^{N_j} E_i$, $j = 1, \ldots, J$, where $E_i$ is the random variable denoting the size or amount of the $i$th claim, following an exponential distribution with probability density function (pdf) $h(y_j) = \sigma^{-1} \exp(-y_j/\sigma)$. Furthermore, we assume that $E_1, E_2, \ldots$ are independent and identically distributed random variables and also independent of the number of claims $N_j$. It is well known (see for example Rolski et al. 1999) that $Y_j$ follows a piecewise distribution with pdf given by $g(y_j) = \sum_{n_j=1}^{\infty} f_{N_j}(n_j)\, h^{*n_j}(y_j)$, $y_j > 0$, and $g(0) = f_{N_j}(0)$. Then, by following the methodology given in Lee and Lin (2012), we have that $Y = (Y_1, \ldots, Y_J)$ follows a multivariate Erlang distribution with scale parameter $\sigma > 0$ and shape parameter $n_j > 0$, $j = 1, 2, \ldots, J$. Their marginal distributions are a univariate Erlang mixture.
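Then, simple computations provide,

$$
g(y_1) = \begin{cases} \sqrt{\dfrac{\Theta_1}{y_1 \sigma}} \exp\left[-\left(\Theta_1 + \dfrac{y_1}{\sigma}\right)\right] I_1\left(2\sqrt{\dfrac{\Theta_1 y_1}{\sigma}}\right), & y_1 > 0, \\[2ex] \exp(-\Theta_1), & y_1 = 0. \end{cases}
$$

Here, $I_1(\cdot)$ represents the modified Bessel function of the first kind, which admits the following series representation,

$$
I_\nu(z) = \sum_{k=0}^{\infty} \frac{1}{\Gamma(k + \nu + 1)\, k!} \left(\frac{z}{2}\right)^{2k + \nu}.
$$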
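The distribution for the coverages can be computed by using (2) in the following way,

$$
g(y_j) = \sum_{n_j=1}^{\infty} \frac{\Theta_j^{n_j} \exp(-\Theta_1)}{n_j!} \sum_{n_1=0}^{\infty} \frac{\left[\Theta_1 \exp(-\Theta_j)\right]^{n_1} n_1^{n_j}}{n_1!} \cdot \frac{y_j^{n_j - 1} \exp(-y_j/\sigma)}{\sigma^{n_j}\, \Gamma(n_j)}.
$$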
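Now, taking into account that:

$$
\sum_{n_j=1}^{\infty} \frac{(\Theta_j n_1 y_j/\sigma)^{n_j}}{n_j!\, \Gamma(n_j)} = \sqrt{\frac{n_1 y_j \Theta_j}{\sigma}}\; I_1\left(2\sqrt{\frac{n_1 y_j \Theta_j}{\sigma}}\right),
$$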
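we finally obtain the aggregate claim size pdf for the different coverages given by,

$$
g(y_j) = \begin{cases} \sqrt{\dfrac{\Theta_j}{y_j \sigma}} \exp\left[-\left(\Theta_1 + \dfrac{y_j}{\sigma}\right)\right] \displaystyle\sum_{n_1=0}^{\infty} \frac{\sqrt{n_1}\, \Lambda_j^{n_1}}{n_1!}\, I_1\left(2\sqrt{\frac{\Theta_j n_1 y_j}{\sigma}}\right), & y_j > 0, \\[2ex] \exp\left(-\Theta_1 + \Lambda_j\right), & y_j = 0, \end{cases}
$$

for $j = 2, \ldots, J$, where $\Lambda_j = \Theta_1 \exp(-\Theta_j)$. Thus, they are also given as a piecewise distribution. For practical purposes, the infinite sum that appears in this expression can be replaced by a finite sum from one to $k$, where $k$ can take values around one-hundred. From the assumption of the independence between the number of claims and the claims size, we have that:

$$
E(Y_1) = \sigma\Theta_1, \quad E(Y_j) = \sigma\Theta_1\Theta_j, \quad j = 2, \ldots, J,
$$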
which can be considered as the net premium when both the number and size are considered at the same time.
Finally, we have that:

$$
\mathrm{var}(Y_1) = 2\sigma^2\Theta_1, \quad \mathrm{var}(Y_j) = \sigma^2\Theta_1\Theta_j(2 + \Theta_j), \quad j = 2, \ldots, J,
$$
while the covariance (see Lee and Lin 2012) is given by:

$$
\mathrm{cov}(Y_j, Y_l) = \sigma^2\, \mathrm{cov}(N_j, N_l), \quad j \neq l.
$$
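The aggregate claim size pdf for a coverage can be evaluated by truncating the sum over $n_1$, as suggested in the text. The following sketch is an illustration under stated assumptions, not the authors' code: the Bessel function $I_1$ is computed from its series representation, and the names `bessel_i1`, `coverage_density`, and `coverage_mass_at_zero` are hypothetical.

```python
import math

def bessel_i1(z, tol=1e-16, max_terms=500):
    """Modified Bessel function I_1(z), z >= 0, from its power series."""
    half = z / 2.0
    term = half  # k = 0 term: (z/2)**1 / (0! * 1!)
    total = term
    for k in range(max_terms):
        # Ratio of consecutive series terms: (z/2)**2 / ((k+1) * (k+2)).
        term *= half * half / ((k + 1) * (k + 2))
        total += term
        if term <= tol * total:
            break
    return total

def coverage_density(y, theta1, theta_j, sigma, n_terms=100):
    """Continuous part of g(y_j) for y > 0, truncating the sum over n1."""
    lam = theta1 * math.exp(-theta_j)
    series = 0.0
    weight = 1.0  # running value of lam**n1 / n1!
    for n1 in range(1, n_terms + 1):  # the n1 = 0 term is zero
        weight *= lam / n1
        arg = 2.0 * math.sqrt(theta_j * n1 * y / sigma)
        series += math.sqrt(n1) * weight * bessel_i1(arg)
    front = math.sqrt(theta_j / (y * sigma)) * math.exp(-(theta1 + y / sigma))
    return front * series

def coverage_mass_at_zero(theta1, theta_j):
    """Probability that the aggregate claim amount of the coverage is zero."""
    return math.exp(-theta1 + theta1 * math.exp(-theta_j))
```

The default of 100 terms mirrors the truncation level mentioned above; for small values of $\Theta_1$ far fewer terms are usually enough, since the weights $\Lambda_j^{n_1}/n_1!$ decay quickly.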

2.2. Multivariate Zero-Inflated Model

In many automobile insurance portfolios, the claims are rarely observed as compared to the no-claims situation. Univariate and bivariate zero-inflated models have been introduced in the statistical literature in many fields. In the setting of auto insurance, we refer to Boucher et al. (2007) and Frees et al. (2016) for the univariate case and Bermúdez (2009) and Bermúdez and Karlis (2017) for the bivariate case. Multivariate ones are scarce in the general statistical literature. References in the statistical literature are Li et al. (1999) and Liu and Tian (2015). In the actuarial literature, there are no references of models of this nature that go beyond the two variables. However, multivariate zero-truncated models were considered in Zhang et al. (2020).
A multivariate zero-inflated model can be constructed as a mixture of the multivariate distribution given in (1) and a point mass at $(0, \ldots, 0) \in \mathbb{R}^{J}$ in the following way,

$$
g(n_{1i}, \ldots, n_{Ji} \mid \Theta_i) = \begin{cases} 1 - \Phi + \Phi f(0, \ldots, 0 \mid \Theta_i), & (n_{1i}, \ldots, n_{Ji}) = (0, \ldots, 0), \\ \Phi f(n_{1i}, \ldots, n_{Ji} \mid \Theta_i), & (n_{1i}, \ldots, n_{Ji}) \neq (0, \ldots, 0), \end{cases} \tag{9}
$$

where $0 \le \Phi \le 1$ is an inflation parameter. Obviously, this model reduces to (1) for $\Phi = 1$. Under this model, the marginal means and cross-moments are given by:

$$
\begin{aligned}
E(N_{1i} \mid \Theta_{1i}) &= \Phi\Theta_{1i}, \\
E(N_{ji} \mid \Theta_{1i}, \Theta_{ji}) &= \Phi\Theta_{1i}\Theta_{ji}, \quad j = 2, \ldots, J, \\
E(N_{1i} N_{ji} \mid \Theta_{1i}, \Theta_{ji}) &= \Phi\Theta_{1i}(1 + \Theta_{1i})\Theta_{ji}, \quad j = 2, \ldots, J, \\
E(N_{ji} N_{li} \mid \Theta_{1i}, \Theta_{ji}) &= \Phi\Theta_{1i}(1 + \Theta_{1i})\Theta_{ji}\Theta_{li}, \quad j, l = 2, \ldots, J, \; j \neq l,
\end{aligned}
$$
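from which the covariance between pairs of marginal random variables can be obtained. They are given by:

$$
\begin{aligned}
\mathrm{cov}(N_{1i}, N_{ji}) &= \Phi\Theta_{1i}\Theta_{ji}\left(1 + \bar{\Phi}\Theta_{1i}\right), \quad j = 2, \ldots, J, \\
\mathrm{cov}(N_{ji}, N_{li}) &= \Phi\Theta_{1i}\Theta_{ji}\Theta_{li}\left(1 + \bar{\Phi}\Theta_{1i}\right), \quad j, l = 2, \ldots, J, \; j \neq l,
\end{aligned}
$$

where $\bar{\Phi} = 1 - \Phi$.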
Again, if the actuary computes the premium by using the net premium principle, then for each coverage, the premiums are not affected by the inflation parameter $\Phi$. A complete model would allow inflating each coverage with inflation parameters $\Phi_j$; however, they are not included in this work due to the computational cost of estimating a large number of parameters.
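The marginal variances are given by:

$$
\begin{aligned}
\mathrm{var}(N_{1i} \mid \Theta_{1i}) &= \Phi\Theta_{1i}\left(1 + \bar{\Phi}\Theta_{1i}\right), \\
\mathrm{var}(N_{ji} \mid \Theta_{1i}, \Theta_{ji}) &= \Phi\Theta_{1i}\Theta_{ji}\left[1 + \Theta_{ji}\left(1 + \bar{\Phi}\Theta_{1i}\right)\right], \quad j = 2, \ldots, J.
\end{aligned}
$$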
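Finally, the correlations are:

$$
\rho(N_{1i}, N_{ji}) = \sqrt{\frac{\Theta_{ji}\left(1 + \bar{\Phi}\Theta_{1i}\right)}{1 + \Theta_{ji}\left(1 + \bar{\Phi}\Theta_{1i}\right)}}, \quad j = 2, \ldots, J, \tag{10}
$$

$$
\rho(N_{ji}, N_{li}) = \sqrt{\frac{\Theta_{ji}\Theta_{li}\left(1 + \bar{\Phi}\Theta_{1i}\right)^{2}}{\left[1 + \Theta_{ji}\left(1 + \bar{\Phi}\Theta_{1i}\right)\right]\left[1 + \Theta_{li}\left(1 + \bar{\Phi}\Theta_{1i}\right)\right]}}, \tag{11}
$$

for $j, l = 2, \ldots, J$, $j \neq l$.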
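The zero-inflated pmf (9) and the correlation (10) can be sketched as follows. The helper `base_pmf` re-implements the basic pmf (1) so that the block is self-contained; all function names here are hypothetical and chosen only for illustration.

```python
import math

def base_pmf(counts, thetas):
    """Basic joint pmf (1) for counts = (n1, ..., nJ)."""
    n1 = counts[0]
    p = math.exp(-thetas[0]) * thetas[0] ** n1 / math.factorial(n1)
    for nj, thj in zip(counts[1:], thetas[1:]):
        rate = n1 * thj
        # Python evaluates 0.0 ** 0 as 1.0, so n1 = 0 forces nj = 0.
        p *= math.exp(-rate) * rate ** nj / math.factorial(nj)
    return p

def zi_pmf(counts, thetas, phi):
    """Zero-inflated pmf (9) with inflation parameter phi in [0, 1]."""
    p = phi * base_pmf(counts, thetas)
    if all(c == 0 for c in counts):
        p += 1.0 - phi  # extra point mass at (0, ..., 0)
    return p

def zi_correlation_total(theta1, theta_j, phi):
    """Correlation (10) between the total and coverage j."""
    a = 1.0 + (1.0 - phi) * theta1
    return math.sqrt(theta_j * a / (1.0 + theta_j * a))
```

Setting `phi = 1.0` recovers the basic model, in which case `zi_correlation_total` reduces to the correlation (7).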

2.3. A Regression Model

For the sake of convenience, the model (1) can be rewritten in a different way to facilitate the implementation of covariates to determine which factors and explanatory variables have an influence on the mean of the corresponding coverage. Then, by equating $\Theta_{1i}$ to $\mu_{1i}$ and $\Theta_{ji}$ to $\mu_{ji}/\mu_{1i}$, $j = 2, \ldots, J$, $\mu_{1i} \neq 0$, we obtain the normalized joint distribution, which can be expressed as,

$$
f(n_{1i}, \ldots, n_{Ji}) = \frac{\mu_{1i}^{n_{1i}}}{n_{1i}!} \prod_{j=2}^{J} \frac{1}{n_{ji}!} \left(\frac{n_{1i}\,\mu_{ji}}{\mu_{1i}}\right)^{n_{ji}} \exp\left[-\left(\mu_{1i} + \frac{n_{1i}}{\mu_{1i}} \sum_{j=2}^{J} \mu_{ji}\right)\right], \tag{12}
$$

for $n_{1i} = 0, 1, \ldots$, $n_{ji} = 0, 1, \ldots, n_{1i}$, $j = 2, \ldots, J$.
The probability function (12) satisfies the condition that the marginal means are given by $E(N_{ji}) = \mu_{ji}$, $j = 1, 2, \ldots, J$, assuming that $\mu_{ji} \neq 0$ for all $j$. Thus, it is suitable for including covariates. Then, to carry out this regression model, we suppose that the observed counts $(N_{1i}, \ldots, N_{Ji})$ have independent distributions given by (12) with $E(N_{ji}) = \mu_{ji}$, $j = 1, 2, \ldots, J$. Now, it is assumed that a set of observable covariates useful to subdivide the portfolio into classes of risks with homogeneous characteristics are included in the linear predictor, $\eta_{ji}$. To guarantee a positive expected value of the response variables, it is reasonable and common to use a logarithmic link for this function and therefore express the mean as:

$$
\eta_{ji} = \log \mu_{ji} = \underline{x}_{ji}^{\top} \underline{\gamma}_{j}, \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, J,
$$

where $\underline{x}_{ji} = (x_{ji1}, \ldots, x_{jim})^{\top}$ is a vector of $m$ covariates for the $i$th observation $\mu_{ji}$ and $\underline{\gamma}_{j} = (\gamma_{j1}, \ldots, \gamma_{jm})^{\top}$ denotes the corresponding vector of regression coefficients to be estimated, which usually includes a constant term. Without loss of generality, it is assumed that for each $j = 1, \ldots, J$, $N_{ji}$ is related to the same set of covariates. In addition, one of the covariates may be identified as an exposure term to calibrate the size of a potential outcome variable by assuming that the mean varies proportionally with the exposure $e_{ji}$ (see Frees 2010; Frees et al. 2016),

$$
\mu_{ji} = e_{ji} \exp\{\underline{x}_{ji}^{\top} \underline{\gamma}_{j}\}, \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, J.
$$
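The logarithmic link with an exposure term, and the mapping from the means back to the original parameters, can be written compactly. The sketch below is illustrative only; `expected_claims` and `to_thetas` are hypothetical helper names, and the covariate values in the usage note are invented.

```python
import math

def expected_claims(x, gamma, exposure=1.0):
    """Mean under the log link: exposure * exp(x' gamma)."""
    eta = sum(xk * gk for xk, gk in zip(x, gamma))  # linear predictor
    return exposure * math.exp(eta)

def to_thetas(mu):
    """Map means (mu_1, ..., mu_J) to (theta_1, theta_2, ..., theta_J)."""
    mu1 = mu[0]
    return [mu1] + [m / mu1 for m in mu[1:]]
```

For example, a policy with an intercept-only predictor `x = [1.0]`, coefficient `gamma = [math.log(0.2)]`, and half a year of exposure has an expected total claims number of 0.1.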
Similarly, the covariates can be implemented in the multivariate zero-inflated regression model by simply regressing the mean value of the different coverages. It should be pointed out that the inflation parameter $\Phi$ could also be allowed to depend on certain regressors. This extension is not considered here, as it does not seem feasible in the present setting, and it merits further investigation in future research.

3. Estimation of the Parameters

In this section, we firstly describe the methodology for the maximum likelihood estimation and derivation of the entries of Fisher’s information matrix for the basic model. Next, the same development is illustrated for the associated regression model. Finally, the expression of the log-likelihood function, score equations, and the second derivative of the log-likelihood function with respect to the parameters for the zero-inflated model are exhibited.
In general, the statistical inference for multivariate models is not trivial, and the computational procedure is often expensive (see, for instance, Selch and Scherer 2010). Nevertheless, the estimation procedure for the model proposed here is straightforward. To see this, we first consider the case without covariates. Let us assume that a sample $\tilde{n} := \{(\tilde{n}_{11}, \ldots, \tilde{n}_{J1}), \ldots, (\tilde{n}_{1n}, \ldots, \tilde{n}_{Jn})\}$ that includes $n$ independent observations in each one of the $J$ types of coverage is collected. The log-likelihood function is proportional to:

$$
\ell(\Theta_1, \ldots, \Theta_J; \tilde{n}) \propto \sum_{i=1}^{n} \tilde{n}_{1i} \log \Theta_1 + \sum_{i=1}^{n} \sum_{j=2}^{J} \tilde{n}_{ji} \log \Theta_j - n\Theta_1 - \tilde{\Theta} \sum_{i=1}^{n} \tilde{n}_{1i},
$$

where $\tilde{\Theta} = \sum_{j=2}^{J} \Theta_j$. After differentiating the latter expression, it is possible to obtain the maximum likelihood estimators of the parameters in closed form. They are given by:

$$
\hat{\Theta}_1 = \bar{n}_1, \quad \hat{\Theta}_j = \frac{\bar{n}_j}{\bar{n}_1}, \quad j = 2, \ldots, J,
$$

where $\bar{n}_k$, $k = 1, \ldots, J$, represents the sample mean, i.e., $\sum_{i=1}^{n} \tilde{n}_{ki}/n$.
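The elements that provide the entries of Fisher’s information matrix are as follows:

$$
\begin{aligned}
-E\left[\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J; \tilde{n})}{\partial \Theta_1^2}\right] &= \frac{n}{\Theta_1}, \\
-E\left[\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J; \tilde{n})}{\partial \Theta_j^2}\right] &= \frac{n\Theta_1}{\Theta_j}, \quad j = 2, \ldots, J, \\
-E\left[\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J; \tilde{n})}{\partial \Theta_l\, \partial \Theta_k}\right] &= 0, \quad l, k = 1, \ldots, J, \; l \neq k.
\end{aligned}
$$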
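Because the estimators and the information matrix are available in closed form, fitting the basic model without covariates needs no numerical optimization. The sketch below assumes the sample is stored as a list of tuples `(n1, n2, ..., nJ)`; the function name `fit_basic` and the toy data in the usage note are hypothetical. Standard errors are taken from the inverse of the diagonal information matrix above.

```python
import math

def fit_basic(sample):
    """Closed-form MLEs and asymptotic standard errors for the basic model.

    sample: list of tuples (n1, n2, ..., nJ), one per policyholder.
    """
    n = len(sample)
    n_resp = len(sample[0])
    means = [sum(row[j] for row in sample) / n for j in range(n_resp)]
    theta = [means[0]] + [means[j] / means[0] for j in range(1, n_resp)]
    # Inverse Fisher information: theta1 / n and theta_j / (n * theta1).
    se = [math.sqrt(theta[0] / n)]
    se += [math.sqrt(theta[j] / (n * theta[0])) for j in range(1, n_resp)]
    return theta, se
```

For the toy sample `[(0, 0, 0), (1, 1, 0), (2, 1, 1), (1, 0, 1)]`, the sample means are 1.0, 0.5, and 0.5, so the estimates are $\hat{\Theta}_1 = 1$ and $\hat{\Theta}_2 = \hat{\Theta}_3 = 0.5$.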
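For the regression model, the log-likelihood function contains $J \times m$ parameters, and it is proportional to:

$$
\ell(\underline{\gamma}_1, \ldots, \underline{\gamma}_J; \tilde{n}) \propto \sum_{i=1}^{n} \tilde{n}_{1i} \log \mu_{1i} + \sum_{i=1}^{n} \sum_{j=2}^{J} \tilde{n}_{ji} \log \mu_{ji} - \sum_{i=1}^{n} \sum_{j=2}^{J} \tilde{n}_{ji} \log \mu_{1i} - \sum_{i=1}^{n} \mu_{1i} - \sum_{i=1}^{n} \frac{\tilde{n}_{1i}}{\mu_{1i}} \sum_{j=2}^{J} \mu_{ji},
$$

where $\mu_{ji} = \exp\{\underline{x}_{ji}^{\top} \underline{\gamma}_{j}\}$, $i = 1, 2, \ldots, n$ and $j = 1, \ldots, J$.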
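The score equations are given by:

$$
\begin{aligned}
\frac{\partial \ell(\underline{\gamma}_1, \ldots, \underline{\gamma}_J; \tilde{n})}{\partial \gamma_{1k}} &= \sum_{i=1}^{n} \left(\frac{\tilde{n}_{1i}}{\mu_{1i}} - 1\right) x_{ik} + \sum_{i=1}^{n} \sum_{j=2}^{J} \left(\frac{\tilde{n}_{1i}\,\mu_{ji}}{\mu_{1i}^{2}} - \frac{\tilde{n}_{ji}}{\mu_{1i}}\right) x_{ik}, \\
\frac{\partial \ell(\underline{\gamma}_1, \ldots, \underline{\gamma}_J; \tilde{n})}{\partial \gamma_{jk}} &= \sum_{i=1}^{n} \left(\frac{\tilde{n}_{ji}}{\mu_{ji}} - \frac{\tilde{n}_{1i}}{\mu_{1i}}\right) x_{ik},
\end{aligned}
$$

with $j = 2, \ldots, J$ and $k = 1, \ldots, m$.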
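Fisher’s information matrix is made up of four blocks, as can be seen below:

$$
-E \begin{pmatrix}
\dfrac{\partial^2 \ell}{\partial \gamma_{1k}\, \partial \gamma_{1l}} & \dfrac{\partial^2 \ell}{\partial \gamma_{1k}\, \partial \gamma_{2l}} & \cdots & \dfrac{\partial^2 \ell}{\partial \gamma_{1k}\, \partial \gamma_{Jl}} \\
\dfrac{\partial^2 \ell}{\partial \gamma_{2l}\, \partial \gamma_{1k}} & \dfrac{\partial^2 \ell}{\partial \gamma_{2k}\, \partial \gamma_{2l}} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 \ell}{\partial \gamma_{Jl}\, \partial \gamma_{1k}} & \mathbf{0} & \cdots & \dfrac{\partial^2 \ell}{\partial \gamma_{Jk}\, \partial \gamma_{Jl}}
\end{pmatrix}
$$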
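where:

$$
\begin{aligned}
-E\left[\frac{\partial^2 \ell}{\partial \gamma_{1k}\, \partial \gamma_{1l}}\right] &= \sum_{i=1}^{n} \left(\frac{1}{\mu_{1i}} + \sum_{j=2}^{J} \frac{\mu_{ji}}{\mu_{1i}^{2}}\right) x_{ik} x_{il}, \\
-E\left[\frac{\partial^2 \ell}{\partial \gamma_{1k}\, \partial \gamma_{jl}}\right] &= -\sum_{i=1}^{n} \frac{1}{\mu_{1i}}\, x_{ik} x_{il}, \\
-E\left[\frac{\partial^2 \ell}{\partial \gamma_{jk}\, \partial \gamma_{jl}}\right] &= \sum_{i=1}^{n} \frac{1}{\mu_{ji}}\, x_{ik} x_{il}, \\
-E\left[\frac{\partial^2 \ell}{\partial \gamma_{jk}\, \partial \gamma_{hl}}\right] &= 0, \quad \text{with } j \neq h,
\end{aligned}
$$

where $j, h = 2, \ldots, J$; $l, k = 1, \ldots, m$, and $\mathbf{0}$ is the zero matrix with dimension $m \times m$.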
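In practice, the regression log-likelihood is maximized numerically, so it is often enough to code the objective and hand it to an optimizer. The sketch below evaluates the log-likelihood kernel of the regression model under the log link, assuming (as in the text) that all responses share the same covariate vector; the name `regression_loglik` and the data layout are illustrative assumptions, and no optimizer is included.

```python
import math

def regression_loglik(gammas, X, counts):
    """Log-likelihood kernel of the regression model (constants dropped).

    gammas[j]: coefficient list for response j (index 0 is the total).
    X[i]:      covariate list of policy i, shared by all responses.
    counts[i]: tuple (n1, n2, ..., nJ) observed for policy i.
    """
    total = 0.0
    for x, n in zip(X, counts):
        # Means under the log link, one per response.
        mu = [math.exp(sum(xk * gk for xk, gk in zip(x, g))) for g in gammas]
        n1, mu1 = n[0], mu[0]
        total += n1 * math.log(mu1) - mu1
        for nj, muj in zip(n[1:], mu[1:]):
            total += nj * (math.log(muj) - math.log(mu1)) - n1 * muj / mu1
    return total
```

With an intercept-only design, the maximizer should coincide with the closed-form estimators of the model without covariates, i.e., $\exp(\hat{\gamma}_{j1}) = \bar{n}_j$, which provides a simple sanity check before adding covariates.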
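For the zero-inflated model, the log-likelihood is proportional to:

$$
\ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n}) \propto n^{*} \log\left(1 - \Phi + \Phi \exp(-\Theta_1)\right) + (n - n^{*})\left(\log \Phi - \Theta_1\right) + \log \Theta_1 \sum_{i=n^{*}+1}^{n} n_{1i} - \tilde{\Theta} \sum_{i=n^{*}+1}^{n} n_{1i} + \sum_{i=n^{*}+1}^{n} \sum_{j=2}^{J} n_{ji} \log(n_{1i}\Theta_j),
$$

where $n^{*}$ is the number of zeros of the random variable $N_{1i}$. The normal equations that provide the maximum likelihood estimates are given by,

$$
\frac{\partial \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Phi} = \frac{n^{*}\left(\exp(-\Theta_1) - 1\right)}{1 - \Phi + \Phi \exp(-\Theta_1)} + \frac{n - n^{*}}{\Phi} = 0, \tag{13}
$$

$$
\frac{\partial \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Theta_1} = -\frac{n^{*}\, \Phi \exp(-\Theta_1)}{1 - \Phi + \Phi \exp(-\Theta_1)} - (n - n^{*}) + \frac{1}{\Theta_1} \sum_{i=n^{*}+1}^{n} n_{1i} = 0, \tag{14}
$$

$$
\frac{\partial \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Theta_j} = -\sum_{i=n^{*}+1}^{n} n_{1i} + \frac{1}{\Theta_j} \sum_{i=n^{*}+1}^{n} n_{ji} = 0, \quad j = 2, \ldots, J. \tag{15}
$$

From (13), we obtain that $\Phi = (n^{*} - n)/\left(n\left(\exp(-\Theta_1) - 1\right)\right)$, which can be substituted into (14) to obtain the estimator of $\Theta_1$, say $\hat{\Theta}_1$. This value is then substituted into (13) to obtain the estimator of the inflation parameter, $\Phi$. Finally, from (15), the estimator of $\Theta_j$, $j = 2, \ldots, J$, is obtained in the closed-form expression given by $\hat{\Theta}_j = \sum_{i=n^{*}+1}^{n} n_{ji} \big/ \sum_{i=n^{*}+1}^{n} n_{1i}$.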
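The substitution step described above can be sketched in code. Plugging the expression for $\Phi$ from (13) into (14) leaves, after simplification, the single equation $\Theta_1/(1 - \exp(-\Theta_1)) = \sum_{i=n^{*}+1}^{n} n_{1i}/(n - n^{*})$, whose left-hand side is increasing in $\Theta_1$, so a bisection search suffices. This reduction and the function name `fit_zero_inflated` are our own illustrative assumptions, not the authors' implementation.

```python
import math

def fit_zero_inflated(sample, tol=1e-12):
    """MLEs (theta1, [theta_2..theta_J], phi) for the zero-inflated model.

    sample: list of tuples (n1, n2, ..., nJ), one per policyholder.
    """
    n = len(sample)
    positives = [row for row in sample if row[0] > 0]
    n_pos = len(positives)                     # n - n*
    s1 = sum(row[0] for row in positives)      # total claims among claimants
    target = s1 / n_pos
    if target <= 1.0:
        raise ValueError("need a mean above 1 among policies with claims")
    # Solve theta / (1 - exp(-theta)) = target by bisection.
    lo, hi = 1e-12, target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < target:
            lo = mid
        else:
            hi = mid
    theta1 = 0.5 * (lo + hi)
    phi = n_pos / (n * -math.expm1(-theta1))   # from equation (13)
    theta_rest = [sum(row[j] for row in positives) / s1
                  for j in range(1, len(sample[0]))]
    return theta1, theta_rest, phi
```

Nothing in this sketch forces the estimate of $\Phi$ to stay below one; a value above one would indicate fewer zeros than the basic model predicts, in which case zero inflation is not supported by the data.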
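The second partial derivatives are as follows,

$$
\begin{aligned}
\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Phi^2} &= -\frac{n^{*}\left(\exp(-\Theta_1) - 1\right)^{2}}{\left(1 - \Phi + \Phi \exp(-\Theta_1)\right)^{2}} - \frac{n - n^{*}}{\Phi^{2}}, \\
\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Phi\, \partial \Theta_1} &= -\frac{n^{*} \exp(-\Theta_1)}{\left(1 - \Phi + \Phi \exp(-\Theta_1)\right)^{2}}, \\
\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Theta_1^{2}} &= \frac{n^{*}\, \Phi (1 - \Phi) \exp(-\Theta_1)}{\left(1 - \Phi + \Phi \exp(-\Theta_1)\right)^{2}} - \frac{1}{\Theta_1^{2}} \sum_{i=n^{*}+1}^{n} n_{1i}, \\
\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Theta_j^{2}} &= -\frac{1}{\Theta_j^{2}} \sum_{i=n^{*}+1}^{n} n_{ji}, \quad j = 2, \ldots, J, \\
\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Phi\, \partial \Theta_j} &= 0, \quad j = 2, \ldots, J, \\
\frac{\partial^2 \ell(\Theta_1, \ldots, \Theta_J, \Phi; \tilde{n})}{\partial \Theta_1\, \partial \Theta_j} &= 0, \quad j = 2, \ldots, J.
\end{aligned}
$$

Observe that analytic expressions for the expectations of $\sum_{i=n^{*}+1}^{n} n_{1i}$ and $\sum_{i=n^{*}+1}^{n} n_{ji}$ are not feasible. For computational reasons, for large values of $n$, these expectations are evaluated by ignoring the expectation operator and replacing them by the observed sums $\sum_{i=n^{*}+1}^{n} n_{1i}$ and $\sum_{i=n^{*}+1}^{n} n_{ji}$. The asymptotic variance–covariance matrix is approximated by inverting the observed information matrix.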
When covariates are introduced under the inflated model, we proceed first by replacing in (9) the pmf $f$ by its corresponding (12), where again, $\mu_{ji} = \exp\{\underline{x}_{ji}^{\top} \underline{\gamma}_{j}\}$, $i = 1, 2, \ldots, n$, and $j = 1, \ldots, J$. In practice, as shown in the numerical applications below, the parameter estimation and computation of standard errors were carried out by the method of maximum likelihood using Mathematica® v.12.0. We directly maximized the log-likelihood function by using different maximum search methods available in the FindMaximum built-in function in the Mathematica software package. This software package also provides at least two methods of obtaining the elements of the Hessian matrix. The first one consists of retrieving them from the Cholesky factors (this package is available on the web upon request). The second one, which is faster, derives them by finite differentiation. Results were also confirmed with WinRATS v.7.0.

4. Numerical Application

For our empirical analysis, we used the French Motor Personal Line datasets available in the package “CASdatasets” in R. This is a collection of ten datasets that comes from a private motor French insurer. Each dataset includes risk features such as claim amount, risk area, gender of the policyholder, number of claims for different coverages, etc. In particular, we chose the freMPL10 dataset, which includes 32,100 policies for the year 2004. In our study, we considered six response variables, which are shown in Table 1. Note that the dependent variable Claims for each policyholder comprises the sum of the individual claims in all other variables. The details of the joint claims frequency for all types of coverage and the total number of claims are illustrated in Appendix A (Table A1). Note that the maximum number of claims reported by an insured is six. The number of policyholders who did not report a claim is 12,257 (38.18%), and the number of customers that only declared a claim in any of the coverages is 10,803 (33.65%).
Together with all the responses, this dataset includes a set of explanatory variables. Table 2 below describes the factors and explanatory variables used in the investigation. We also considered an offset variable when modeling the claims frequency, exposure, the time exposed to risk during the investigation period.
In Table 3, the parameter estimates and their corresponding p-values are provided for the basic and zero-inflated models without covariates. Some measures of model selection are also provided in the bottom part of this table. For comparison purposes, we used the multivariate negative binomial distribution (MNB) provided in (Johnson et al. 1996, chp. 36, p. 94) with the pmf given by:
$$\Pr(n_1, \dots, n_k) = \frac{\Gamma\left(\alpha + \sum_{i=1}^{k} n_i\right)}{\Gamma(\alpha)\,\prod_{i=1}^{k} n_i!}\;\Theta_1^{\alpha}\,\prod_{i=2}^{k} \Theta_i^{n_i},$$
where $\alpha > 0$, $0 < \Theta_i < 1$, $i = 1, 2, \dots, k$, and $\sum_{i=1}^{k} \Theta_i = 1$. As can be seen in Table 3, the multivariate Poisson distribution studied in this paper performs better than the MNB for this dataset. Furthermore, the zero-inflated model improves on the basic one owing to the high frequency of zeros. We also tried to fit the two multivariate Poisson distributions provided in Bermúdez and Karlis (2011); however, we were unable to obtain the maximum likelihood estimates of these models for this dataset.
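The model selection measures reported in Table 3 can be recomputed from the maximized log-likelihood, the number of parameters, and the sample size. The sketch below uses the standard definitions of AIC, BIC, and CAIC; the paper does not state its formulas in this section, so these definitions are an assumption, although they appear to reproduce the tabulated values up to rounding. The function names are illustrative.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion."""
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return n_params * math.log(n_obs) - 2.0 * loglik

def caic(loglik, n_params, n_obs):
    """Consistent AIC."""
    return n_params * (math.log(n_obs) + 1.0) - 2.0 * loglik

# Zero-inflated model without covariates (Table 3): six Theta parameters
# plus the inflation parameter, fitted to 32,100 policies.
loglik_zi, k_zi, n = -106692.0, 7, 32100

print(aic(loglik_zi, k_zi))
print(bic(loglik_zi, k_zi, n))
print(caic(loglik_zi, k_zi, n))
```

Smaller values indicate a better trade-off between fit and complexity, which is the sense in which the zero-inflated model is preferred.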
Table 4 reports Pearson's correlation between the total number of claims and the claim counts for each coverage: the empirical values (first row) and the values implied by the basic model (second row) and the zero-inflated model (third row), computed using (7) and (10), respectively. There is a positive correlation between Claims and each of the coverage-specific responses (empirical values between 0.2378 and 0.6287), and the empirical values are close to the theoretical ones. These figures were calculated before incorporating the effect of the explanatory variables for the different coverages. We also calculated the correlation coefficients among the remaining response variables. Again, the correlation is weakly positive, ranging from 0.0035 (Parking and Windscreen) to 0.0480 (Responsible and Nonresponsible).
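An empirical Pearson correlation of this kind can be computed directly from a joint frequency table such as Table A1, weighting each pair of values by its count. The following sketch shows the calculation on a small hypothetical table; the rows are invented for illustration and are not taken from freMPL10.

```python
import math

def weighted_pearson(rows):
    """Pearson correlation from a list of (x, y, count) rows of a joint frequency table."""
    n = sum(c for _, _, c in rows)
    mean_x = sum(x * c for x, _, c in rows) / n
    mean_y = sum(y * c for _, y, c in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) * c for x, y, c in rows) / n
    var_x = sum((x - mean_x) ** 2 * c for x, _, c in rows) / n
    var_y = sum((y - mean_y) ** 2 * c for _, y, c in rows) / n
    return cov / math.sqrt(var_x * var_y)

# Hypothetical joint frequencies: (total claims, claims in one coverage, number of policies).
toy_table = [(0, 0, 5), (1, 0, 2), (1, 1, 2), (2, 1, 1)]

r = weighted_pearson(toy_table)
print(r)
```

Because the total number of claims contains each coverage count as a summand, a positive correlation between the two is expected by construction.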
Empirical marginal distributions and fitted marginals under the basic model (Fit 1) and zero-inflated model (Fit 2) are illustrated below in Table 5 using the estimates computed in the previous section. Note that the total number of observations equals 32,100.
We fit the multivariate regression model in (12) and the zero-inflated regression model described in the second section. Parameter estimates and their corresponding p-values are displayed in Table 6 and Table 7, respectively. For the former regression model, the explanatory variables private 1 and risk area are statistically significant at the 5% significance level for all responses, as is the intercept term of the model, i.e., constant. Furthermore, some other covariates (private 2, professional, and has km limit) are significant at the same level for all the responses except Fire and Theft; similarly, driver age is not significant for the response Responsible. With respect to the zero-inflated regression model, risk area is statistically significant at the same level for all responses, and constant is significant for all responses except Non Responsible (p-value 0.650 in Table 7); moreover, private 1 is not significant for Parking, nor is has km limit for Fire and Theft. In terms of the four measures of model selection considered, the zero-inflated regression model is preferable to model (12).
We now compare the aggregate claims amounts, which are mixed random variables, for Claims ($Y_1$) and for the different coverages, i.e., Nonresponsible, Responsible, Parking, Windscreen, and Fire and Theft ($Y_2, \dots, Y_6$). To estimate the scale parameter $\sigma$ of the exponential distribution, we used the variable ClaimAmount available in the freMPL10 dataset. The estimate of this parameter is $\hat{\sigma} = 1.96265$. The pdf/pmf associated with the mixed random variables is displayed in Figure 1. As expected, the density of the random variable Claims decays to zero more slowly than those of the different coverages. Among the coverages, the density of Responsible approaches zero fastest.
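To make the notion of a mixed random variable concrete, the sketch below evaluates the pdf/pmf of an aggregate claims amount with exponential severities of mean $\sigma$: a point mass at zero (no claims) plus a continuous part that mixes Erlang densities over the number of claims. The Poisson claim count and its mean `lam` are assumptions made only for illustration; the paper's frequency distributions are the multivariate models fitted above, not a plain Poisson.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability of n claims (illustrative frequency model)."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def erlang_pdf(y, n, sigma):
    """Density at y > 0 of the sum of n independent exponential claims with mean sigma."""
    return math.exp((n - 1) * math.log(y) - y / sigma
                    - n * math.log(sigma) - math.lgamma(n))

def aggregate_density(y, lam, sigma, n_max=60):
    """Mixed pdf/pmf: probability mass at y == 0, density for y > 0."""
    if y < 0:
        return 0.0
    if y == 0:
        return poisson_pmf(0, lam)  # mass at zero: no claims reported
    # Truncate the infinite sum over claim counts at n_max.
    return sum(poisson_pmf(n, lam) * erlang_pdf(y, n, sigma)
               for n in range(1, n_max + 1))

sigma_hat = 1.96265  # severity estimate reported in the text
lam = 1.0            # hypothetical mean claim count

mass_at_zero = aggregate_density(0.0, lam, sigma_hat)
density_at_two = aggregate_density(2.0, lam, sigma_hat)
print(mass_at_zero, density_at_two)
```

The mass at zero and the integral of the continuous part over $y > 0$ add up to one, which is a convenient numerical check on any implementation of such a mixed distribution.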

5. Final Comments and Future Research

Nowadays, mainly because of competition, insurers commonly offer coverage of different types of claims within the same product, not only to become more competitive but also to benefit from risk diversification and volatility. At present, most insurance companies record, apart from the total number of claims, the claims for individual coverages, such as windscreen claims and theft and fire claims, among others. Therefore, it seems reasonable to assume that every policyholder generates a sequence of claim numbers, one for each coverage, together with the total claims number, which is the sum of the claims in all the coverages. In this work, we introduced a new methodology based on a multivariate discrete distribution via conditional specification to explain the claims frequency in different coverages and the total claims number. This procedure allows us to analyze the dependence structure of a finite number of coverages in motor vehicle insurance and also to include heterogeneity via explanatory variables. Closed-form expressions were given for model parameter estimates and for a priori premiums under different premium principles. The numerical application revealed that specific covariates are statistically significant in some coverages, yet not in others. In this way, the model allows us to discern how the different explanatory variables affect each coverage when calculating the corresponding premiums.
The approach introduced in this work avoids the use of copula-based modeling. The latter methodology has been very useful but, at the same time, heavily criticized in the statistical literature on multivariate data. Although a wide catalog of copulas exists, a known weakness of the copula approach is the difficulty of choosing an appropriate copula structure for the model at hand (Balakrishnan and Lai 2009, chp. 1, p. 59). Furthermore, any copula includes a parameter that controls the dependence structure, and this parameter is sometimes difficult to estimate since it must fall within the admissible support. As explored in the second section of this work, the model depends heavily on the parameter $\Theta_1$, and for that reason, a more flexible dependence structure based on multivariate subordination deserves to be studied. In this regard, it would be interesting to compare this family of distributions with the multivariate regression model based on the multivariate Sarmanov distribution, similar to the models derived in Bolancé and Vernic (2019). This model could be used to explain situations where the policyholder wishes to extend third-party motor vehicle insurance with different coverages adapted to their personal needs. Alternatively, it could be feasible to implement a multivariate version with elliptical copula-based models to accommodate a wide range of dependence. It is essential to mention that, for discrete data, the properties of the copula are not the same as for continuous random variables, since the probability of ties in the data is positive. Thus, the estimation cannot be carried out directly, and a continuous extension of the integer-valued random variables is needed, following the approach proposed by Denuit and Lambert (2005).
The purpose of this work is not to compare the proposed model with other models since, to the best of our knowledge, models of this nature are not known in the actuarial literature. However, the case with two coverages has been discussed via the bivariate Poisson model (see Bermúdez and Karlis 2017), and the case with all the coverages can be handled using the multivariate negative binomial distribution in (Johnson et al. 1996, chp. 36, p. 94). Admittedly, the fit obtained with the proposed model is not entirely satisfactory (as judged by the chi-squared test statistics, which are not shown in the paper). The model could therefore be improved by using a similar specification but assuming that the total number of claims and all the coverages follow a negative binomial distribution instead. It would also be possible to zero-inflate all the different coverages. These issues could be the subject of future research.

Author Contributions

Conceptualization, E.G.-D. and E.C.-O.; methodology, E.G.-D. and E.C.-O.; software, E.G.-D. and E.C.-O.; validation, E.G.-D. and E.C.-O.; formal analysis, E.G.-D. and E.C.-O.; investigation, E.G.-D. and E.C.-O.; resources, E.G.-D. and E.C.-O.; data curation, E.G.-D. and E.C.-O.; writing—original draft preparation, E.G.-D. and E.C.-O.; writing—review and editing, E.G.-D. and E.C.-O.; visualization, E.G.-D. and E.C.-O.; supervision, E.G.-D. and E.C.-O.; project administration, E.G.-D. and E.C.-O. All authors have read and agreed to the published version of the manuscript.

Funding

E.G.-D.’s work was partially funded by Grant ECO2017–85577–P (Ministerio de Economía, Industria y Competitividad. Agencia Estatal de Investigación).

Acknowledgments

The authors are grateful to the four anonymous referees for their constructive comments and suggestions, which greatly helped us improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1 displays the empirical joint claims frequency for all types of coverage and the total number of claims.
Table A1. Empirical joint claims frequency corresponding to the response variables Claims ($N_1$), Nonresponsible ($N_2$), Responsible ($N_3$), Parking ($N_4$), Windscreen ($N_5$), and Fire and Theft ($N_6$).
N 1 N 2 N 3 N 4 N 5 N 6 Count N 1 N 2 N 3 N 4 N 5 N 6 Count N 1 N 2 N 3 N 4 N 5 N 6 Count N 1 N 2 N 3 N 4 N 5 N 6 Count N 1 N 2 N 3 N 4 N 5 N 6 Count
00000012,2573200101474211001352210046320102
100001447320100244220001952300076410102
10001039473210001374300101153002037050201
10010059033000047430100253011027060102
101000292440002244310001253100147101501
11000028954000314440000753101047110502
2000022840004036500041453110047140201
2000111934001307500050753200037202301
2000208174003102500122254001027320202
200101344010123500131154100047322001
20011023840102115500212260006018025101
2002003340103060500221160104128140302
2010011504011021500320460105018330202
201010103840111135010311601140210130602
2011001804011202050104066020311202000436
4012013501130460204022100011384012109
5012022602121221001095440200225014001
60213012101001514020111350200316030302
2110007834020204850201216031203220000398
402101150202146040021300003240211010
502030860402023000121940220045021201
604200130002155403001350230036050102
300030136403010175030111261005023001022
403100250302056110312300111114040004
5031101061104063001204541000325040102
61113013002011410012151003146120122
3002107410021205100401361203083003004
410030505101302612120130100284101112
5110121612210130101145410120165110217
61230023010202684102011511030236130204
3011011041021075111111614100230111049
41101122511120362003113012001241102078
512002262004013020012341110155120112
620211130201021141111012512020206210301
30210020411200105121102621120130300074
412001751300126220111310002741201062
5130107622020431001152412100165131007
62211023100202574130002051400026230103
310101942000255200211623100131011059
420011552003066240002310200942002035
5210119630021331100140420101152102013
63003013110102844201102552111036301202
31110061420200252120026310202312000144
4210018522001263111023200011742101047
522010216320011

Note

1
The Stirling number of the second kind can be computed in Mathematica with the command StirlingS2[n, τ] (see, for instance, Ruskeepaa 2009; Olver et al. 2010).
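Outside Mathematica, the same quantity can be obtained from the standard recurrence $S(n, k) = k\,S(n-1, k) + S(n-1, k-1)$. The Python sketch below is an illustrative equivalent, not the code used in the paper; the function name `stirling2` is hypothetical, and non-negative integer arguments are assumed.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k) for non-negative integers n and k."""
    if n == k:
        return 1  # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Element n either joins one of the k existing blocks or forms a new block.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(5, 2))
```

Memoization with `lru_cache` keeps the recursion efficient for the moderate values of n that arise when evaluating moments.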

References

  1. Balakrishnan, Narayanaswamy, and Chin-Diew Lai. 2009. Continuous Bivariate Distributions, 2nd ed. New York and London: Springer.
  2. Bermúdez, Lluis. 2009. A priori ratemaking using bivariate Poisson regression models. Insurance: Mathematics and Economics 44: 135–41.
  3. Bermúdez, Lluis, and Dimitris Karlis. 2011. Bayesian multivariate Poisson models for insurance ratemaking. Insurance: Mathematics and Economics 48: 226–36.
  4. Bermúdez, Lluis, and Dimitris Karlis. 2017. A priori ratemaking using bivariate Poisson models. Scandinavian Actuarial Journal 2: 148–58.
  5. Bolancé, Catalina, and Raluca Vernic. 2019. Multivariate count data generalized linear models: Three approaches based on the Sarmanov distribution. Insurance: Mathematics and Economics 85: 89–103.
  6. Boucher, Jean, Michel Denuit, and Montserrat Guillén. 2007. Risk classification for claim counts: A comparative analysis of various zero-inflated mixed Poisson and hurdle models. North American Actuarial Journal 11: 110–31.
  7. Cacaoullos, Teophilos, and Haralambos Papageorgiou. 1980. On some bivariate probability models applicable to traffic accidents and fatalities. International Statistical Review 48: 345–56.
  8. Cameron, Colin, and Pravin Trivedi. 1986. Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics 1: 29–54.
  9. Cameron, Colin, and Pravin Trivedi. 1998. Regression Analysis of Count Data. Cambridge: Cambridge University Press.
  10. Cummins, David, and Laurel Wiltbank. 1983. Estimating the total claims distribution using multivariate frequency and severity. The Journal of Risk and Insurance 50: 377–403.
  11. Cummins, David, and Laurel Wiltbank. 1984. A multivariate model of the total claims process. ASTIN Bulletin 14: 45–52.
  12. Denuit, Michel, and Philippe Lambert. 2005. Constraints on concordance measures in bivariate discrete data. Journal of Multivariate Analysis 93: 40–57.
  13. Denuit, Michel, Xavier Marèchal, Sandra Pitrebois, and Jean-Francois Walhin. 2009. Actuarial Modelling of Claim Counts Risk Classification, Credibility and Bonus-Malus Systems. Hoboken: John Wiley & Sons.
  14. Douglas, Jim. 1955. Fitting the Neyman Type A (two parameter) contagious distribution. Biometrics 11: 149–173.
  15. Regression Modeling with Actuarial and Financial Applications. Cambridge: Cambridge University Press.
  16. Frees, Edward, Gee Lee, and Lu Yang. 2016. Multivariate frequency-severity regression models in insurance. Risks 4: 4.
  17. Gómez-Déniz, Emilio. 2016. Bivariate credibility bonus-malus premiums distinguishing between two types of claims. Insurance: Mathematics and Economics 70: 117–24.
  18. Gómez-Déniz, Emilio, and Enrique Calderín-Ojeda. 2018. Multivariate credibility in bonus–malus systems distinguishing between different types of claims. Risks 6: 34.
  19. Gómez-Déniz, Emilio, and Enrique Calderín-Ojeda. 2020. A survey of the individual claim size and other risk factors using credibility bonus–malus premiums. Risks 8: 20.
  20. Johnson, Norman, Adrienne Kemp, and Samuel Kotz. 2005. Univariate Discrete Distributions. Hoboken: John Wiley, Inc.
  21. Johnson, Norman, Samuel Kotz, and Narayanaswamy Balakrishnan. 1996. Discrete Multivariate Distributions. Hoboken: John Wiley, Inc.
  22. Kemp, Charles. 1967. On a contagious distribution suggested for accident data. Biometrics 23: 241–55.
  23. Lee, Simon, and Sheldon Lin. 2012. Modeling dependent risks with multivariate Erlang mixtures. ASTIN Bulletin 42: 153–80.
  24. Leiter, Robert E., and M. A. Hamdan. 1973. Some bivariate probability models applicable to traffic accidents and fatalities. International Statistical Review 41: 87–100.
  25. Li, Chin-Shang, Jye-Chyi Lu, Jinho Park, Kyungmoo Kim, Paul A. Brinkley, and John P. Peterson. 1999. Multivariate zero-inflated Poisson models and their applications. Technometrics 41: 29–38.
  26. Liu, Yin, and Guo-Liang Tian. 2015. Type I multivariate zero-inflated Poisson distribution with applications. Computational Statistics & Data Analysis 83: 200–22.
  27. Neyman, Jerzy. 1939. On a new class of "contagious" distributions, applicable in entomology and bacteriology. The Annals of Mathematical Statistics 10: 35–57.
  28. Oh, Rosy, Peng Shi, and Jae Youn Ahn. 2020. Bonus-Malus premiums under the dependent frequency-severity modeling. Scandinavian Actuarial Journal 3: 172–95.
  29. Olver, Frank, Daniel Lozier, Ronald Boisvert, and Charles Clark. 2010. NIST Handbook of Mathematical Functions. Cambridge: Cambridge University, New York, NY.
  30. Rolski, Tomasz, Hanspeter Schmidli, Volker Schmidt, and Jozef Teugel. 1999. Stochastic Processes for Insurance and Finance. Hoboken: John Wiley & Sons.
  31. Ruskeepaa, Heikki. 2009. Mathematica Navigator. Mathematics, Statistics, and Graphics, 3rd ed. Cambridge: Academic Press.
  32. Selch, Daniela, and Matthias Scherer. 2010. A Multivariate Claim Count Model for Applications in Insurance. Berlin: Springer International Publishing.
  33. Winkelmann, Rainer. 2003. Econometric Analysis of Count Data. Berlin: Springer Science & Business Media.
  34. Young, Virginia. 2006. Premium principles. In Encyclopedia of Actuarial Science. New York: John Wiley & Sons, pp. 1–14.
  35. Zhang, Pengcheng, Enrique Calderín-Ojeda, Shuanming Li, and Xueyuan Wu. 2020. On the Type I multivariate zero-truncated hurdle model with applications in health insurance. Insurance: Mathematics and Economics 90: 35–45.
Figure 1. pdf/pmf associated with the mixed random variables of the aggregate claims amount for Claims ($Y_1$) and the different coverages ($Y_2, \dots, Y_6$), obtained from the estimated value of the mean of the claims size, $\hat{\sigma} = 1.96265$.
Table 1. Description of the response variables considered.

| Variable | Description |
|---|---|
| Claims | Total number of claims made by the policyholder. |
| Nonresponsible | Number of nonresponsible claims in the four preceding years. |
| Responsible | Number of responsible claims in the four preceding years. |
| Parking | Number of parking claims in the four preceding years. |
| Windscreen | Number of windscreen claims in the four preceding years. |
| Fire and theft | Number of fire and theft claims in the four preceding years. |
Table 2. Description of factors and explanatory variables considered.

| Variable | Description |
|---|---|
| lic age | the driving license age, in months; |
| veh age | takes the value 1 if the age of the vehicle is less than or equal to five years, 0 otherwise; |
| gender | takes the value 1 if male, 0 if female; |
| status | takes the value 0 if alone, 1 if other; |
| private 1 | takes the value 1 if the usage of the vehicle is private, 0 otherwise; |
| private 2 | takes the value 1 if the usage of the vehicle is private + trip to office, 0 otherwise; |
| professional | takes the value 1 if the usage of the vehicle is professional, 0 otherwise; |
| driver age | the driver age, in years; |
| has km limit | takes the value 1 if there is a km limit, 0 otherwise; |
| risk area | unknown risk area between 1 and 13, possibly ordered; |
| bonus | takes the value 1 if the numerical value for bonus/malus is less than 100, 0 otherwise; |
| malus | takes the value 1 if the numerical value for bonus/malus is larger than 100, 0 otherwise. |
Table 3. Parameter estimates and p-values for the MNB, the basic model, and the zero-inflated model. Some measures of model selection are also given for comparison purposes.

| Parameter | MNB Estimate | MNB p-Value | Basic Model Estimate | Basic Model p-Value | Zero-Inflated Model Estimate | Zero-Inflated Model p-Value |
|---|---|---|---|---|---|---|
| $\hat{\alpha}$ | 1.043 | <0.001 | | | | |
| $\hat{\Theta}_1$ | 0.329 | <0.001 | 1.060 | <0.001 | 1.197 | <0.001 |
| $\hat{\Theta}_2$ | 0.335 | <0.001 | 0.254 | <0.001 | 0.254 | <0.001 |
| $\hat{\Theta}_3$ | 0.092 | <0.001 | 0.274 | <0.001 | 0.274 | <0.001 |
| $\hat{\Theta}_4$ | 0.085 | <0.001 | 0.057 | <0.001 | 0.057 | <0.001 |
| $\hat{\Theta}_5$ | 0.019 | <0.001 | 0.367 | <0.001 | 0.367 | <0.001 |
| $\hat{\Theta}_6$ | 0.123 | <0.001 | 0.047 | <0.001 | 0.047 | <0.001 |
| Inflation parameter, $\hat{\Phi}$ | | | | | 0.886 | <0.001 |
| $\ell_{\max}$ | −118,828.250 | | −106,895.00 | | −106,692.00 | |
| AIC | 237,671.00 | | 213,803.00 | | 213,398.00 | |
| BIC | 237,729.00 | | 213,853.00 | | 213,457.00 | |
| CAIC | 237,736.00 | | 213,859.00 | | 213,464.00 | |
Table 4. Correlation between empirical frequencies associated with the total claims number and each response variable (first row) and the correlation derived by means of the basic model (second row) and zero-inflated model (third row).

| | Nonresponsible | Responsible | Parking | Windscreen | Fire and Theft |
|---|---|---|---|---|---|
| Claims (empirical) | 0.5150 | 0.5606 | 0.2606 | 0.6287 | 0.2378 |
| Claims (basic model) | 0.4499 | 0.4637 | 0.2334 | 0.5183 | 0.2123 |
| Claims (zero-inflated model) | 0.4733 | 0.4874 | 0.2479 | 0.5428 | 0.2258 |
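Table 5. Empirical and fitted homogeneous marginal distributions using the basic model (Fit 1) and zero-inflated model (Fit 2).

| Count | Claims Obs. | Claims Fit 1 | Claims Fit 2 | Non Responsible Obs. | Non Responsible Fit 1 | Non Responsible Fit 2 | Responsible Obs. | Responsible Fit 1 | Responsible Fit 2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 12,257 | 11,121.20 | 12,257.00 | 24,694 | 25,310.10 | 25,407.1 | 24,343 | 24,898.80 | 25,008.50 |
| 1 | 10,803 | 11,788.50 | 10,279.50 | 6311 | 5283.81 | 5125.42 | 6426 | 5498.24 | 5322.11 |
| 2 | 5571 | 6247.91 | 6153.96 | 970 | 1222.22 | 1254.89 | 1124 | 1360.30 | 1392.85 |
| 3 | 2296 | 2207.59 | 2456.09 | 110 | 235.15 | 255.965 | 183 | 279.81 | 303.63 |
| 4 | 794 | 585.01 | 735.183 | 15 | 40.95 | 47.1004 | 19 | 52.13 | 59.75 |
| 5 | 274 | 124.02 | 176.05 | 0 | 6.63 | 8.02949 | 3 | 9.03 | 10.90 |
| 6 | 87 | 21.91 | 35.13 | 0 | 1.01 | 1.28651 | 2 | 1.47 | 1.87 |
| ≥7 | 18 | 3.81 | 7.03 | 0 | 0.16 | 0.22 | 0 | 0.26 | 0.35 |

| Count | Parking Obs. | Parking Fit 1 | Parking Fit 2 | Windscreen Obs. | Windscreen Fit 1 | Windscreen Fit 2 | Fire and Theft Obs. | Fire and Theft Fit 1 | Fire and Theft Fit 2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 30,287 | 30,251.40 | 30,257.80 | 22,306 | 23,173.20 | 23,345.9 | 30,595 | 30,567.70 | 30,572.00 |
| 1 | 1686 | 1743.15 | 1730.42 | 7607 | 6249.06 | 5992.86 | 1407 | 1460.07 | 1451.36 |
| 2 | 110 | 100.41 | 106.135 | 1772 | 1990.30 | 2013.38 | 93 | 69.36 | 73.43 |
| 3 | 15 | 4.82 | 5.42 | 327 | 525.77 | 562.70 | 5 | 2.74 | 3.093 |
| 4 | 1 | 0.21 | 0.25 | 72 | 126.06 | 142.33 | 0 | 0.09 | 0.11 |
| 5 | 1 | 0.01 | 0.01 | 13 | 28.19 | 33.46 | 0 | 0.00 | 0.00 |
| 6 | 0 | 0.00 | 0.00 | 3 | 5.95 | 7.41 | 0 | 0.00 | 0.00 |
| ≥7 | 0 | 0.00 | 0.00 | 0 | 1.47 | 1.94 | 0 | 0.00 | 0.00 |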
Table 6. Parameter estimates and p-values for the regression model with the density function (12). Some measures of the model selection are also exhibited.

| Parameter | Claims Estimate | Claims p-Value | Non Responsible Estimate | Non Responsible p-Value | Responsible Estimate | Responsible p-Value | Parking Estimate | Parking p-Value | Windscreen Estimate | Windscreen p-Value | Fire and Theft Estimate | Fire and Theft p-Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lic age | −0.206 | 0.000 | −0.238 | 0.000 | −0.211 | 0.000 | −0.127 | 0.279 | 0.008 | 0.874 | −0.565 | 0.000 |
| veh age | 0.021 | 0.084 | 0.047 | 0.093 | −0.003 | 0.907 | 0.279 | 0.000 | 0.004 | 0.843 | −0.034 | 0.558 |
| gender | 0.073 | 0.000 | 0.006 | 0.796 | 0.015 | 0.538 | −0.095 | 0.050 | 0.196 | 0.000 | 0.024 | 0.643 |
| status | −0.002 | 0.857 | −0.074 | 0.035 | 0.022 | 0.511 | 0.000 | 0.995 | 0.079 | 0.010 | −0.299 | 0.000 |
| private 1 | −0.592 | 0.000 | −0.615 | 0.000 | −0.579 | 0.000 | −0.493 | 0.001 | −0.584 | 0.000 | −0.422 | 0.036 |
| private 2 | −0.430 | 0.000 | −0.500 | 0.000 | −0.342 | 0.000 | −0.517 | 0.000 | −0.431 | 0.000 | −0.142 | 0.461 |
| professional | −0.380 | 0.000 | −0.397 | 0.000 | −0.394 | 0.000 | −0.556 | 0.000 | −0.322 | 0.000 | −0.204 | 0.301 |
| driver age | 0.346 | 0.000 | 1.103 | 0.000 | 0.128 | 0.263 | 0.806 | 0.000 | −0.417 | 0.000 | 0.635 | 0.006 |
| has km limit | −0.378 | 0.000 | −0.379 | 0.000 | −0.289 | 0.000 | −0.317 | 0.000 | −0.522 | 0.000 | −0.110 | 0.260 |
| risk area | 0.018 | 0.000 | 0.015 | 0.001 | 0.058 | 0.000 | 0.083 | 0.000 | −0.036 | 0.000 | 0.125 | 0.000 |
| bonus | −0.761 | 0.000 | −1.827 | 0.000 | −0.204 | 0.038 | −0.150 | 0.535 | −0.263 | 0.002 | −0.245 | 0.209 |
| malus | 0.056 | 0.230 | 0.102 | 0.190 | 0.067 | 0.571 | 0.078 | 0.788 | −0.183 | 0.096 | −0.445 | 0.086 |
| constant | 1.752 | 0.000 | −1.205 | 0.000 | 0.523 | 0.017 | −4.317 | 0.000 | 2.125 | 0.000 | −1.477 | 0.001 |

| Measure | Value |
|---|---|
| $\ell_{\max}$ | −117,736.00 |
| AIC | 235,628.00 |
| BIC | 236,281.00 |
| CAIC | 236,359.00 |
| Observations | 32,100 |
Table 7. Parameter estimates and p-values for the zero-inflated regression model. Some measures of the model selection are also exhibited.

| Parameter | Claims Estimate | Claims p-Value | Non Responsible Estimate | Non Responsible p-Value | Responsible Estimate | Responsible p-Value | Parking Estimate | Parking p-Value | Windscreen Estimate | Windscreen p-Value | Fire and Theft Estimate | Fire and Theft p-Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lic age | 0.050 | 0.091 | 0.316 | 0.000 | −0.036 | 0.527 | −0.049 | 0.663 | 0.332 | 0.000 | −0.516 | 0.000 |
| veh age | −0.011 | 0.422 | −0.045 | 0.109 | 0.007 | 0.786 | 0.168 | 0.003 | −0.029 | 0.233 | −0.105 | 0.073 |
| gender | 0.082 | 0.000 | −0.014 | 0.583 | 0.039 | 0.119 | −0.071 | 0.142 | 0.202 | 0.000 | 0.024 | 0.647 |
| status | −0.059 | 0.000 | −0.177 | 0.000 | 0.035 | 0.307 | −0.006 | 0.929 | 0.019 | 0.534 | −0.368 | 0.000 |
| private 1 | −0.175 | 0.000 | −0.299 | 0.000 | −0.180 | 0.031 | 0.048 | 0.762 | −0.170 | 0.024 | −0.434 | 0.012 |
| private 2 | −0.099 | 0.021 | −0.329 | 0.000 | −0.086 | 0.279 | −0.054 | 0.730 | −0.106 | 0.135 | −0.256 | 0.116 |
| professional | −0.048 | 0.275 | −0.218 | 0.006 | −0.117 | 0.152 | −0.061 | 0.701 | −0.012 | 0.861 | −0.265 | 0.115 |
| driver age | −0.211 | 0.000 | −0.035 | 0.771 | −0.439 | 0.000 | 0.580 | 0.009 | −1.111 | 0.000 | 0.374 | 0.099 |
| has km limit | −0.303 | 0.000 | −0.291 | 0.000 | −0.159 | 0.001 | −0.291 | 0.001 | −0.426 | 0.000 | −0.081 | 0.422 |
| risk area | 0.018 | 0.000 | 0.014 | 0.003 | 0.056 | 0.000 | 0.077 | 0.000 | −0.035 | 0.000 | 0.126 | 0.000 |
| bonus | −0.436 | 0.000 | −1.510 | 0.000 | 0.013 | 0.882 | −0.502 | 0.004 | 0.135 | 0.134 | 0.740 | 0.004 |
| malus | 0.157 | 0.001 | 0.311 | 0.000 | −0.036 | 0.757 | −0.743 | 0.003 | −0.021 | 0.849 | 0.056 | 0.859 |
| constant | 2.067 | 0.000 | −0.099 | 0.650 | 1.403 | 0.000 | −3.671 | 0.000 | 2.513 | 0.000 | −1.327 | 0.006 |

| Measure | Value | p-Value |
|---|---|---|
| Inflation parameter, $\Phi$ | 0.841 | 0.000 |
| $\ell_{\max}$ | −116,856.00 | |
| AIC | 233,870.00 | |
| BIC | 234,532.00 | |
| CAIC | 234,611.00 | |
| Observations | 32,100 | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Gómez-Déniz, E.; Calderín-Ojeda, E. A Priori Ratemaking Selection Using Multivariate Regression Models Allowing Different Coverages in Auto Insurance. Risks 2021, 9, 137. https://doi.org/10.3390/risks9070137

