A hybrid deep learning method for optimal insurance strategies: Algorithms and convergence analysis

https://doi.org/10.1016/j.insmatheco.2020.11.012

Abstract

This paper develops a hybrid deep learning approach to find optimal reinsurance, investment, and dividend strategies for an insurance company in a complex stochastic system. A jump–diffusion regime-switching model with infinite horizon subject to ruin is formulated for the surplus process. A Markov chain approximation and stochastic approximation-based iterative deep learning algorithm is developed to study this type of infinite-horizon optimal control problem. Approximations of the optimal controls are obtained using deep neural networks. The Markov chain approximation framework plays a key role in building the iterative algorithms and finding initial values. Stochastic approximation is used to search for the optimal parameters of the neural networks in a bounded region determined by the Markov chain approximation method. The convergence of the algorithm is proved and the rate of convergence is provided.

Introduction

Due to the nature of insurance products, insurers tend to accumulate relatively large amounts of cash or cash equivalents and invest the surplus in a financial market in order to pay future claims and avoid financial ruin. Meanwhile, excess surplus is paid out to policyholders before a deficit occurs. Hence, to optimize cash flow management, the decision makers of an insurance company must manage risk sharing, investment performance, and dividend payment schemes. Thus, how to design reinsurance, investment, and dividend payout strategies is crucial to the insurance industry.

Reinsurance is a standard risk-sharing tool used to reduce or eliminate risks borne by primary insurance carriers. The primary insurance carrier pays a certain part of its premiums to a reinsurance company in return for protection against adverse claim volatility. Since the pioneering work of Borch (1960) and Arrow (1963), there has been extensive research on optimal reinsurance. The recent book on reinsurance, Albrecher et al. (2017), provides an impressive list of references on the subject.

The optimal portfolio selection problem is of practical importance. Earlier work in this area can be traced back to Markowitz’s mean–variance model; see Markowitz (1952). The asset allocation problem for an insurance portfolio differs from that in finance, since an insurer needs to pay claims. Browne (1995) considers a model in which the aggregate claims are modelled by a Brownian motion with drift and the risky asset is modelled by a geometric Brownian motion. Hipp and Plum (2000) use the Cramér–Lundberg model to formulate the risk process of an insurance company and assume that the surplus of the insurance company can be invested in a risky asset (a market index) that follows a geometric Brownian motion.

A dividend payment scheme represents an important signal about a company’s financial status and future growth opportunities. Miller and Modigliani (1961) demonstrate the relationship between a company’s dividend policy and the valuation of its shares. Optimal dividend strategies for insurance companies were first studied by De Finetti (1957), who focused on economic performance rather than the safety aspect: the surplus process is modelled by a random walk, the objective is to maximize the discounted total dividend payments until ruin, and the optimal dividend payment strategy is shown to be of barrier type. Gerber (1972) provides solutions to the optimal dividend problem under both discrete and continuous models. Højgaard and Taksar (1999) study reinsurance and dividend strategies in a diffusion model and provide closed-form solutions for the optimal strategies.

In the past decades, extensive research has been devoted to finding optimal insurance strategies using analytic techniques under various discrete-time and continuous-time models. Different types of controls, such as regular, singular, or impulse controls, have been investigated under models including the random walk, the compound Poisson process, jump–diffusion models, and regime-switching models. Due to the increasing complexity of the stochastic systems, for example when multiple types of controls are considered simultaneously, nonlinear insurance/reinsurance premium principles are adopted, or multiple decision makers interact in a game-theoretical framework, closed-form solutions are not available in many cases. Recently, there has been emerging research on numerically solving insurance problems using finite difference or similar methods; see Jin et al. (2012), Jin et al. (2013a), Jin et al. (2013b), and Van Staden et al. (2018).

On the other hand, the fast development of machine learning, big data analytics, and artificial intelligence is changing our community and the insurance market in almost all aspects. There are emerging efforts to understand the impact of data science on the insurance industry and to apply novel data science approaches to problems such as loss reduction, claim reserve estimation, policy design, and key parameter estimation; see Wüthrich (2018a), Wüthrich (2018b), Hainaut (2018), and Aleandri (2018). A comprehensive summary of machine learning techniques in non-life insurance pricing and data science, such as regression trees, neural networks, and unsupervised learning, is presented in Wüthrich and Buser (2017).

When managing a portfolio with multiple insurance products, the decision maker generally faces a stochastic control problem. Depending on the structure of the insurance products, the control problem falls into one of two types: finite-time horizon and infinite-time horizon. There is some literature on applying deep learning methods to solve finite-time horizon problems. Han and E (2016) and E et al. (2017) utilize neural networks to approximate the controls. The expectation of the objective function at the terminal time is approximated by the average value over Monte Carlo paths. Hence, finding the optimal controls becomes searching for the optimal parameters of the approximating neural networks under a criterion guided by the reward function. Bachouch et al. (2018) and Huré et al. (2018) integrate deep learning methods into Monte Carlo backward optimization algorithms. Parametric neural networks are adopted, the optimization is executed backwards at discrete times, and an approximation error analysis is provided. In summary, determining the optimal controls in such finite-time horizon problems can be viewed as Monte Carlo projections starting from an initial value.
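To make the mechanics concrete, the following is a minimal sketch of this type of finite-horizon scheme (our illustration with hypothetical dynamics and reward, not the code of Han and E (2016) or E et al. (2017)): a network u_θ(t, x) parameterizes the feedback control, the terminal expectation is replaced by an average over simulated Monte Carlo paths, and the network weights are trained by gradient descent on that average.

```python
# Sketch of the finite-horizon approach: a neural network control u_theta(t, x),
# a Monte Carlo average in place of the terminal expectation, and gradient
# descent on the resulting loss. Dynamics and reward here are hypothetical:
# dX = u dt + sigma dW, objective maximize E[-(X_T - 1)^2].
import torch

T, N, n_paths = 1.0, 50, 512
dt = T / N
sigma = 0.2
control = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(control.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.zeros(n_paths, 1)
    for k in range(N):
        t = torch.full((n_paths, 1), k * dt)
        u = control(torch.cat([t, x], dim=1))       # feedback control u_theta(t, x)
        dw = torch.randn(n_paths, 1) * dt ** 0.5
        x = x + u * dt + sigma * dw                 # Euler step of the controlled SDE
    loss = ((x - 1.0) ** 2).mean()                  # Monte Carlo estimate of E[(X_T - 1)^2]
    opt.zero_grad(); loss.backward(); opt.step()
```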

For infinite-time horizon problems, since there is no fixed terminal time, one can hardly design the reward function as the maximization of a simple expectation over projected paths. There is very little literature on applying deep learning methods to find stochastic optimal controls over an infinite-time horizon. Cheng et al. (2020) develop a Markov chain approximation-based deep learning algorithm that approximates the optimal insurance strategies with neural networks, and propose the idea of using the Markov chain approximation method to find initial guesses. The reinsurance strategy and the dividend strategy, treated as regular and singular controls respectively, are approximated by two separate neural networks, and the classical gradient descent algorithm is adopted to find the weights of the two networks. Several numerical examples show that the neural-network strategies converge to the analytical solutions obtained in Højgaard and Taksar (1999). In this paper, we further modify the algorithm and replace the gradient descent method with stochastic approximation to calibrate the parameters of the neural networks. The stochastic approximation theory provides a well-established framework that guarantees the convergence of the iterations in the weak sense. A rigorous convergence proof of the algorithm is provided in this work, whereas Cheng et al. (2020) is the first work to develop a hybrid Markov chain approximation-based deep learning algorithm and presents several numerical examples.
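The core of stochastic approximation is a recursion of Robbins–Monro type, θ_{n+1} = θ_n + ε_n Y_n, where Y_n is a noisy estimate of the search direction and the step sizes ε_n decrease. A minimal sketch follows, with a hypothetical scalar objective standing in for the actual reward of the control problem:

```python
# Robbins-Monro stochastic approximation sketch (generic, not the paper's
# exact recursion): ascend a noisy gradient with decreasing step sizes.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0                       # initial guess (in the paper, from the coarse MCAM step)

def noisy_gradient(theta):
    # hypothetical objective J(theta) = -(theta - 2)^2, observed with noise
    return -2.0 * (theta - 2.0) + rng.normal(scale=1.0)

for n in range(1, 10001):
    eps_n = 1.0 / n               # satisfies sum eps_n = inf, sum eps_n^2 < inf
    theta = theta + eps_n * noisy_gradient(theta)

print(theta)                      # converges to the optimizer theta* = 2
```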

The hybrid feature of the proposed algorithm lies in the integration of neural networks, Markov chain approximation, and stochastic approximation to solve a stochastic optimization problem. The Markov chain approximation method (MCAM) and stochastic approximation (SA) are the main building blocks of the approximation procedure. Comprehensive introductions to the development of Markov chain approximation methods and stochastic approximation methods, together with the related literature, can be found in Kushner and Dupuis (2001) and Kushner and Yin (2003), respectively.

In this work, we apply our method to a complex jump–diffusion system with regime-switching. The controls are approximated by neural networks. To obtain the optimal parameters of the neural networks, we develop two major steps: (1) applying the Markov chain approximation method on a coarse scale to estimate an initial guess for the neural network parameters; (2) applying stochastic approximation on a fine scale to estimate accurate parameters within a bounded region. The convergence of the numerical scheme is proved.
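A toy rendering of the two steps above, under strong simplifications (a scalar parameter and a directly observable noisy gradient in place of the lattice dynamic programming of Sections 3 and 4):

```python
# Two-scale scheme in miniature: a coarse search supplies the initial guess
# and a bounded region; projected stochastic approximation refines within it.
import numpy as np

rng = np.random.default_rng(1)

def noisy_objective_grad(theta):
    # hypothetical concave objective J(theta) = -(theta - 1.3)^2, noisy gradient
    return -2.0 * (theta - 1.3) + rng.normal(scale=0.5)

# Step 1 (coarse scale): a crude grid search plays the role of the Markov
# chain approximation, yielding an initial guess and a bounded region.
coarse_grid = np.arange(-5.0, 5.0, 0.5)
values = [-(t - 1.3) ** 2 for t in coarse_grid]       # coarse value estimates
theta = coarse_grid[int(np.argmax(values))]
lo, hi = theta - 0.5, theta + 0.5                     # bounded region around the guess

# Step 2 (fine scale): projected stochastic approximation inside [lo, hi].
for n in range(1, 5001):
    eps_n = 0.5 / n
    theta = np.clip(theta + eps_n * noisy_objective_grad(theta), lo, hi)

print(theta)                                          # close to the optimizer 1.3
```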

Compared with existing numerical methods for stochastic control problems, our proposed deep-learning algorithm has two main advantages. First, the machine learning framework improves computational efficiency through the two-scale numerical method. It is well known that one inevitably faces the “curse of dimensionality”: the number of computation nodes grows exponentially when dealing with optimization problems with multiple control variables and states. We replace the optimization over a piecewise control grid for every state value by the search for optimal neural network parameters covering all state values. The computational complexity then mainly comes from the evaluation of gradients for every state value. By using stochastic approximation to compute the optimum, the number of computation nodes increases linearly with respect to the number of points in the state lattice. In addition, the coarse-scale Markov chain approximation provides an initial value within a small neighbourhood of the optimum, so that the fine-scale stochastic approximation can be confined to that region. Hence the computational efficiency in finding the optimal controls can be largely improved. Second, the developed algorithm improves the accuracy of the numerical results. Traditional approximation methods generally use piecewise constant controls to approximate the optimal control, so the accuracy of the control strategy is subject to the denseness of the grid, which in turn depends on the types and ranges of the controls and states. When the ranges of controls and states are not comparable, computational efficiency and accuracy suffer, since it is difficult to find a suitable stepsize for the lattice. In contrast, neural networks allow the control strategy to take values in a continuous range and easily overcome the difficulty of choosing an effective precision in control spaces with significantly different scales.
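The second advantage can be seen in a small sketch: a single network maps a state to several controls on very different continuous scales, with no control grid at all. The architecture and output ranges below (a retention level in [0, 1] and an investment amount in [−100, 100]) are hypothetical, chosen only for illustration:

```python
# One network, several controls, continuous ranges of very different scales.
import torch

class ControlNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2)
        )

    def forward(self, surplus):
        raw = self.body(surplus)
        retention = torch.sigmoid(raw[:, :1])          # reinsurance retention in [0, 1]
        investment = 100.0 * torch.tanh(raw[:, 1:])    # investment in [-100, 100]
        return retention, investment

net = ControlNet()
r, a = net(torch.tensor([[5.0]]))                      # controls at surplus level 5
```

A piecewise constant grid would need a separate stepsize for each of these ranges; the network sidesteps that choice entirely.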

The rest of the paper is organized as follows. A general formulation of the surplus, dividend, investment, and reinsurance strategies, together with the related assumptions and a complex regime-switching jump–diffusion model, is presented in Section 2. Section 3 shows the construction of an approximating Markov chain. In Section 4, the main steps of the deep learning algorithm are established and the neural networks are constructed accordingly. Convergence of the algorithm is proved in Section 5. Some concluding remarks are provided in Section 6.

Section snippets

Formulation

Let us work with a complete filtered probability space (Ω, F, {F_t}, P), where {F_t} (or simply F_t) is a filtration satisfying the usual conditions. That is, {F_t} is a family of σ-algebras such that F_s ⊆ F_t for s ≤ t and F_0 contains all null sets.

An insurance company adopts reinsurance, investment and dividend strategies to manage the insurance portfolios. The surplus process depends on regimes of the market, which is modelled by a continuous-time finite-state Markov chain. The Markov chain, α(t)

Approximating Markov chain

We will construct an approximating Markov chain for the regime-switching jump–diffusion model. The discrete-time controlled Markov chain is defined so that it is locally consistent with (2.10). First, we approximate the discrete claim terms.
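Local consistency means that the conditional mean and variance of the chain’s increments match the drift and diffusion of the continuous process to first order in the interpolation interval. For orientation, the display below recalls the standard conditions for a one-dimensional controlled diffusion dX = b(X, u)dt + σ(X)dW in the sense of Kushner and Dupuis (2001); the chain constructed in this section satisfies analogous conditions for the regime-switching jump–diffusion (2.10), which also involve the jump and switching terms.

```latex
% Standard local consistency (Kushner and Dupuis, 2001) for a controlled
% one-dimensional diffusion; a simplification of the present setting.
\begin{aligned}
\mathbb{E}^{h,u}_{x,n}\,\Delta\xi^h_n
  &= b(x,u)\,\Delta t^h(x,u) + o\!\left(\Delta t^h(x,u)\right),\\
\operatorname{Var}^{h,u}_{x,n}\,\Delta\xi^h_n
  &= \sigma^2(x)\,\Delta t^h(x,u) + o\!\left(\Delta t^h(x,u)\right),\\
\sup_{n}\,\bigl|\Delta\xi^h_n\bigr| &\to 0 \quad \text{as } h\to 0.
\end{aligned}
```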

There is an equivalent way to define the process (2.10) by working with the claim times and values. To do this, set ν_0 = 0, let ν_n, n ≥ 1, denote the time of the nth claim, and let ρ_n be the corresponding claim severity. Let {ν_{n+1} − ν_n, ρ_n, n < ∞} be mutually

Numerical algorithm

In this section, we give details of the numerical algorithm. In Section 4.1, we present the idea of approximating the controls with neural networks and introduce the Markov chain approximation method for finding the initial values on a coarse scale. In Section 4.2, details of the stochastic approximation method for finding accurate approximations on a fine scale are provided. A comprehensive description of the method is given in Section 4.3.

Convergence

In this section, we prove the convergence of the algorithm. That is, starting from an initial guess θ^h_{k,0}, the iteration leads to the optimal set of parameters θ. In particular, we will prove that (4.2) and (4.4) hold.
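The precise assumptions are stated in Section 5 itself; for readers unfamiliar with stochastic approximation, the flavour of such convergence results is captured by the classical Robbins–Monro step-size conditions below (a textbook statement in the spirit of Kushner and Yin (2003), not the paper’s exact hypotheses).

```latex
% Classical step-size conditions under which stochastic approximation
% iterates theta_{n+1} = theta_n + eps_n Y_n converge; Section 5 verifies
% conditions of this type for the algorithm.
\sum_{n} \varepsilon_n = \infty, \qquad
\sum_{n} \varepsilon_n^2 < \infty, \qquad
\text{e.g. } \varepsilon_n = O(1/n).
```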

Concluding remarks

This paper develops a hybrid Markov chain approximation and stochastic approximation-based deep learning method to find the optimal investment, reinsurance, and dividend strategies in a complex stochastic system. An infinite-horizon optimization problem subject to a random ruin time is formulated. The value function and the controls are approximated by deep neural networks. The Markov chain approximation method locates the initial guesses on a coarse scale. A stochastic approximation algorithm is

Acknowledgements

We are grateful to the editors and anonymous referees for their insightful comments and suggestions, which greatly improved the quality and readability of the paper. Z. Jin and H. Yang thank the Research Grants Council of the Hong Kong Special Administrative Region for its support (project no. 17330816). Z. Jin’s research was also supported by a Faculty Research Grant from The University of Melbourne, Australia. G. Yin’s research was supported in part by the National Science

References (31)

  • Borch, K., Reciprocal reinsurance treaties, Astin Bull. (1960)
  • Browne, S., Optimal investment policies for a firm with a random risk process: Exponential utility and minimizing the probability of ruin, Math. Oper. Res. (1995)
  • Cheng, X., et al., Optimal insurance strategies: A hybrid deep learning Markov chain approximation approach, ASTIN Bull. J. IAA (2020)
  • De Finetti, B., Su un’impostazione alternativa della teoria collettiva del rischio (1957)
  • E, W., et al., Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Statist. (2017)