Journal of Computational Physics

Volume 375, 15 December 2018, Pages 1339-1364

DGM: A deep learning algorithm for solving partial differential equations

https://doi.org/10.1016/j.jcp.2018.08.029

Highlights

  • We develop a deep learning algorithm for solving high-dimensional PDEs.

  • The algorithm is meshfree, which is key since meshes become infeasible in higher dimensions.

  • We accurately solve a class of high-dimensional free boundary PDEs in up to 200 dimensions.

  • We prove a theorem regarding the approximation power of neural networks for quasilinear PDEs.

Abstract

High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. Our algorithm is meshfree, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which we are able to accurately solve in up to 200 dimensions. The algorithm is also tested on a high-dimensional Hamilton–Jacobi–Bellman PDE and Burgers' equation. The deep learning algorithm approximates the general solution to the Burgers' equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). We call the algorithm a “Deep Galerkin Method (DGM)” since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. In addition, we prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.

Section snippets

Deep learning and high-dimensional PDEs

High-dimensional partial differential equations (PDEs) are used in physics, engineering, and finance. Their numerical solution has been a longstanding challenge. Finite difference methods become infeasible in higher dimensions due to the explosion in the number of grid points and the demand for reduced time step size. If there are d space dimensions and 1 time dimension, each discretized with N points, the mesh has $\mathcal{O}(N^{d+1})$ points. This quickly becomes computationally intractable when the dimension d becomes even moderately large.
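To make the blow-up concrete (an illustrative count; the per-dimension grid size N is a symbol introduced here, not fixed by the text): even a modest grid of N = 100 points per dimension in d = 100 spatial dimensions plus time would contain

$$N^{d+1} = 100^{101} = 10^{202}$$

grid points, far beyond what any machine could store or traverse.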

Algorithm

Consider a parabolic PDE with d spatial dimensions:

$$\frac{\partial u}{\partial t}(t,x) + \mathcal{L}u(t,x) = 0, \qquad (t,x) \in [0,T] \times \Omega,$$
$$u(t=0,x) = u_0(x),$$
$$u(t,x) = g(t,x), \qquad x \in \partial\Omega,$$

where $x \in \Omega \subset \mathbb{R}^d$. The DGM algorithm approximates $u(t,x)$ with a deep neural network $f(t,x;\theta)$, where $\theta \in \mathbb{R}^K$ are the neural network's parameters. Note that the differential operators $\frac{\partial f}{\partial t}(t,x;\theta)$ and $\mathcal{L}f(t,x;\theta)$ can be calculated analytically. Construct the objective function:

$$J(f) = \left\| \frac{\partial f}{\partial t}(t,x;\theta) + \mathcal{L}f(t,x;\theta) \right\|^2_{[0,T]\times\Omega,\,\nu_1} + \left\| f(t,x;\theta) - g(t,x) \right\|^2_{[0,T]\times\partial\Omega,\,\nu_2} + \left\| f(0,x;\theta) - u_0(x) \right\|^2_{\Omega,\,\nu_3}.$$

Here, $\|f(y)\|^2_{\mathcal{Y},\nu} = \int_{\mathcal{Y}} |f(y)|^2 \nu(y)\,dy$, where $\nu$ is a positive probability density on $\mathcal{Y}$.
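As a concrete illustration of how $J(f)$ can be estimated and minimized on randomly sampled batches, the following is a minimal PyTorch sketch. It is not the paper's implementation (the paper uses its own network architecture and sampling densities $\nu_1, \nu_2, \nu_3$); here the operator is taken to be the heat operator $\mathcal{L}u = -\Delta u$ on the unit hypercube, the sampling is uniform, and the names (`net`, `Lf`, `u0`, `g`) are illustrative placeholders.

```python
import torch

# Minimal sketch of DGM training for a generic parabolic PDE
#   u_t + L u = 0 on [0,T] x Omega,  u(0,x) = u0(x),  u = g on the boundary.

d, T = 2, 1.0                        # spatial dimension and time horizon (assumed)

net = torch.nn.Sequential(           # f(t, x; theta): R^{1+d} -> R
    torch.nn.Linear(1 + d, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def f(t, x):
    return net(torch.cat([t, x], dim=1))

def Lf(t, x):
    """Example operator: L u = -Laplacian(u) (heat equation); replace as needed."""
    u = f(t, x)
    grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = 0.0
    for i in range(d):               # sum of pure second derivatives
        lap = lap + torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i:i+1]
    return -lap

def u0(x):                           # illustrative initial condition
    return torch.sin(x).prod(dim=1, keepdim=True)

def g(t, x):                         # illustrative boundary data
    return torch.zeros(x.shape[0], 1)

for step in range(1000):
    # Randomly sampled batches replace a mesh: interior, boundary, and t=0 points.
    t_int = T * torch.rand(256, 1, requires_grad=True)
    x_int = torch.rand(256, d, requires_grad=True)      # Omega = (0,1)^d here
    t_bdy = T * torch.rand(128, 1)
    x_bdy = torch.rand(128, d)
    x_bdy[torch.arange(128), torch.randint(0, d, (128,))] = torch.randint(0, 2, (128,)).float()
    x_ini = torch.rand(128, d)

    u_int = f(t_int, x_int)
    u_t = torch.autograd.grad(u_int.sum(), t_int, create_graph=True)[0]
    residual = u_t + Lf(t_int, x_int)

    loss = (residual ** 2).mean() \
         + ((f(t_bdy, x_bdy) - g(t_bdy, x_bdy)) ** 2).mean() \
         + ((f(torch.zeros(128, 1), x_ini) - u0(x_ini)) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

Each iteration draws fresh interior, boundary, and initial-time points, so no mesh is ever formed; the three mean-squared terms are Monte Carlo estimates of the three norms in $J(f)$.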

A Monte Carlo method for fast computation of second derivatives

This section describes a modified algorithm which may be more computationally efficient in some cases. The term $\mathcal{L}f(t,x;\theta)$ contains second derivatives $\frac{\partial^2 f}{\partial x_i \partial x_j}(t,x;\theta)$, which may be expensive to compute in higher dimensions. For instance, roughly 20,000 second derivatives (d(d+1)/2 = 20,100 distinct entries of the Hessian) must be calculated in d = 200 dimensions.

The complicated architectures of neural networks can make it computationally costly to calculate the second derivatives (for example, see the neural network architecture (4.2)). The computational…
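The snippet above motivates replacing exact second derivatives with a Monte Carlo estimate whose cost does not grow with the number of Hessian entries. The sketch below uses a standard Hutchinson-type trace estimator built from Hessian–vector products; this illustrates the general idea but is not necessarily the exact estimator constructed in the paper.

```python
import torch

def laplacian_exact(f, x):
    """Exact Laplacian: d second-derivative passes per batch (expensive for large d)."""
    grad = torch.autograd.grad(f(x).sum(), x, create_graph=True)[0]
    lap = 0.0
    for i in range(x.shape[1]):
        lap = lap + torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i:i+1]
    return lap

def laplacian_hutchinson(f, x, n_samples=1):
    """Monte Carlo estimate tr(Hessian) = E[v^T H v] via Hessian-vector products.

    Each sample costs one extra backward pass, independent of the dimension d,
    instead of the d backward passes needed by the exact Laplacian above.
    """
    grad = torch.autograd.grad(f(x).sum(), x, create_graph=True)[0]
    est = 0.0
    for _ in range(n_samples):
        v = torch.randn_like(x)                                # random probe vector
        hvp = torch.autograd.grad((grad * v).sum(), x, create_graph=True)[0]
        est = est + (hvp * v).sum(dim=1, keepdim=True)
    return est / n_samples

if __name__ == "__main__":
    d = 200
    x = torch.randn(16, d, requires_grad=True)
    f = lambda z: (z ** 2).sum(dim=1, keepdim=True)            # Laplacian is exactly 2*d
    print(laplacian_exact(f, x)[0].item())                     # 400.0
    print(laplacian_hutchinson(f, x, n_samples=64)[0].item())  # approx 400, noisy
```

For the quadratic test function the exact Laplacian is 2d = 400, so the printed estimate should fluctuate around that value.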

Numerical analysis for a high-dimensional free boundary PDE

We test our algorithm on a class of high-dimensional free boundary PDEs. These free boundary PDEs are used in finance to price American options and are often referred to as "American option PDEs". An American option is a financial derivative on a portfolio of stocks. The option owner may at any time $t \in [0,T]$ choose to exercise the American option and receive a payoff which is determined by the underlying prices of the stocks in the portfolio. T is called the maturity date of the option and the…
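For orientation, the following sketch sets up an illustrative payoff and training-point sampler for a basket of d stocks. The payoff (a call on the geometric average of the basket) and all parameter values are assumptions chosen for illustration, not necessarily those used in the paper's experiments; drawing (t, x) along simulated geometric Brownian motion paths is one natural meshfree choice of sampling distribution.

```python
import torch

# Illustrative American-option setup: a payoff on a basket of d stocks and a
# sampler that draws training points (t, x) from geometric Brownian motion
# rather than from a mesh.  All parameters below are assumed values.

d, T = 200, 2.0            # number of stocks and option maturity (assumed)
K = 1.0                    # strike (assumed)
r, sigma, x0 = 0.05, 0.25, 1.0

def payoff(x):
    """Call on the geometric average of the basket: max(geo_mean(x) - K, 0)."""
    geo_mean = torch.exp(torch.log(x).mean(dim=1, keepdim=True))
    return torch.clamp(geo_mean - K, min=0.0)

def sample_paths(batch_size):
    """Draw (t, x) pairs with x distributed as GBM started from x0 and run to time t."""
    t = T * torch.rand(batch_size, 1)
    z = torch.randn(batch_size, d)
    x = x0 * torch.exp((r - 0.5 * sigma ** 2) * t + sigma * torch.sqrt(t) * z)
    return t, x

t, x = sample_paths(1024)
print(payoff(x).mean())    # average payoff over the sampled points
```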

High-dimensional Hamilton–Jacobi–Bellman PDE

We also test the deep learning algorithm on a high-dimensional Hamilton–Jacobi–Bellman (HJB) equation corresponding to the optimal control of a stochastic heat equation. Specifically, we demonstrate that the deep learning algorithm accurately solves the high-dimensional PDE (5.5). The PDE (5.5) is motivated by the problem of optimally controlling the stochastic partial differential equation (SPDE):

$$\frac{\partial v}{\partial t}(t,x) = \alpha \frac{\partial^2 v}{\partial x^2}(t,x) + u(x) + \sigma \frac{\partial^2 W}{\partial t\,\partial x}(t,x), \qquad x \in [0,L],$$
$$v(t,x=0) = \bar{v}(0), \qquad v(t,x=L) = \bar{v}(L), \qquad v(t=0,x) = v_0(x), \ldots$$
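To clarify what dynamics are being controlled, here is a small finite-difference / Euler–Maruyama simulation of the SPDE above under a fixed distributed control. This only simulates the dynamics; it is not the paper's solution method, which instead solves the associated HJB equation with the deep learning algorithm. The parameter values, initial condition, and control u(x) are all assumptions.

```python
import torch

# Finite-difference / Euler-Maruyama sketch of the controlled stochastic heat
# equation v_t = alpha * v_xx + u(x) + sigma * dW/(dt dx) on [0, L] with fixed
# boundary values.  Purely illustrative of the dynamics being controlled.

L, alpha, sigma = 1.0, 1.0, 0.1
n_x, n_t = 101, 10_000
dx, dt = L / (n_x - 1), 1e-5                 # dt small enough for explicit stability

x = torch.linspace(0.0, L, n_x)
v = torch.sin(torch.pi * x)                  # initial condition v0(x) (assumed)
v_left, v_right = 0.0, 0.0                   # boundary values v_bar(0), v_bar(L) (assumed)
u = 0.5 * torch.ones(n_x)                    # a fixed distributed control u(x) (assumed)

for _ in range(n_t):
    lap = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx ** 2            # discrete v_xx
    noise = sigma * torch.randn(n_x - 2) / (dx * dt) ** 0.5   # space-time white noise
    interior = v[1:-1] + dt * (alpha * lap + u[1:-1] + noise)
    v = torch.cat([torch.tensor([v_left]), interior, torch.tensor([v_right])])

print(v.mean())  # spatial average of the terminal state
```

The noise term is discretized as independent Gaussians scaled by $1/\sqrt{\Delta x\,\Delta t}$, the standard approximation of space–time white noise on a grid.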

Burgers' equation

It is often of interest to find the solution of a PDE over a range of problem setups (e.g., different physical conditions and boundary conditions). For example, this may be useful for the design of engineering systems or uncertainty quantification. The problem setup space may be high-dimensional and therefore may require solving many PDEs for many different problem setups, which can be computationally expensive.

Let the variable p represent the problem setup (i.e., physical conditions, boundary conditions)…
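One way to realize this idea is to make the problem-setup variable p an additional input to the network, so a single network f(t, x; p) approximates the solution for a continuum of setups. In the sketch below the parametrization p = (ν, a, b), collecting a viscosity and two boundary values for Burgers' equation, is an assumption for illustration, and the initial-condition term of the objective is omitted for brevity.

```python
import torch

# Sketch of learning Burgers solutions over a space of problem setups: the
# network takes (t, x, p) where p = (nu, a, b) is an assumed parametrization
# (viscosity plus left/right boundary values).

net = torch.nn.Sequential(
    torch.nn.Linear(5, 64), torch.nn.Tanh(),   # inputs: t, x, nu, a, b
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def f(t, x, p):
    return net(torch.cat([t, x, p], dim=1))

for step in range(1000):
    n = 256
    t = torch.rand(n, 1, requires_grad=True)
    x = torch.rand(n, 1, requires_grad=True)
    # Sample a setup p = (nu, a, b) for every training point.
    nu = 0.01 + 0.09 * torch.rand(n, 1)
    a, b = torch.rand(n, 1), -torch.rand(n, 1)
    p = torch.cat([nu, a, b], dim=1)

    u = f(t, x, p)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    residual = u_t + u * u_x - nu * u_xx        # Burgers: u_t + u u_x = nu u_xx

    x0, x1 = torch.zeros(n, 1), torch.ones(n, 1)
    loss = (residual ** 2).mean() \
         + ((f(t, x0, p) - a) ** 2).mean() \
         + ((f(t, x1, p) - b) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, evaluating f at a new value of p gives an approximate solution for that setup without solving a new PDE.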

Neural network approximation theorem for PDEs

Let the $L^2$ error $J(f)$ measure how well the neural network $f$ satisfies the differential operator, boundary condition, and initial condition. Define $\mathcal{C}^n$ as the class of neural networks with n hidden units and let $f^n$ be a neural network with n hidden units which minimizes $J(f)$. We prove that there exists $f^n \in \mathcal{C}^n$ such that $J(f^n) \to 0$ as $n \to \infty$, and $f^n \to u$ as $n \to \infty$, in the appropriate sense, for a class of quasilinear parabolic PDEs with the principal term in divergence form under certain growth and…

Conclusion

We believe that deep learning could become a valuable approach for solving high-dimensional PDEs, which are important in physics, engineering, and finance. The PDE solution can be approximated with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. We prove that the neural network converges to the solution of the partial differential equation as the number of hidden units increases.

Our deep learning algorithm for solving PDEs…

References (49)

  • L. Boccardo et al.

    Summability and existence results for nonlinear parabolic equations

    Nonlinear Anal., Theory Methods Appl.

    (2009)
  • H. Bungartz et al.

    Sparse grids

    Acta Numer.

    (2004)
  • P. Chandra et al.

    Feedforward sigmoidal networks – equicontinuity and fault-tolerance properties

    IEEE Trans. Neural Netw.

    (2004)
  • P. Chaudhari et al.

    Deep Relaxation: Partial Differential Equations for Optimizing Deep Neural Networks

    (2017)
  • S. Cerrai

    Stationary Hamilton–Jacobi equations in Hilbert spaces and applications to a stochastic optimal control problem

    SIAM J. Control Optim.

    (2001)
  • G. Cybenko

    Approximation by superposition of a sigmoidal function

    Math. Control Signals Syst.

    (1989)
  • A. Davie et al.

    Convergence of numerical schemes for the solution of parabolic stochastic partial differential equations

    Math. Comput.

    (2000)
  • J. Dean et al.

    Large scale distributed deep networks

  • A. Debussche et al.

    Optimal control of a stochastic heat equation with boundary-noise and boundary-control

    ESAIM Control Optim. Calc. Var.

    (2007)
  • W. E et al.

    Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations

    (2017)
  • M. Fujii et al.

    Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs

  • A.M. Garcia et al.

    A real variable lemma and the continuity of paths of some Gaussian processes

    Indiana Univ. Math. J.

    (December 1970)
  • X. Glorot et al.

    Understanding the difficulty of training deep feedforward neural networks

  • J. Gaines

    Numerical Experiments with SPDEs

    (1995)

Acknowledgements

The authors thank seminar participants at the JP Morgan Machine Learning and AI Forum seminar, the Imperial College London Applied Mathematics and Mathematical Physics seminar, the Department of Applied Mathematics at the University of Colorado Boulder, Princeton University, and Northwestern University for their comments. The authors would also like to thank participants at the 2017 INFORMS Applied Probability Conference, the 2017 Greek Stochastics Conference, and the 2018 SIAM Annual Meeting for their comments.

Research of K.S. supported in part by the National Science Foundation (DMS 1550918). Computations for this paper were performed using the Blue Waters supercomputer grant "Distributed Learning with Neural Networks".
