Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

Melicher, Valdemar; Haber, Tom; Vanroose, Wim

doi:10.1007/s00180-017-0765-8

Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

Original Paper
Published: 16 September 2017

Volume 32, pages 1621–1643, (2017)
Cite this article

Computational Statistics Aims and scope Submit manuscript

472 Accesses
6 Citations
1 Altmetric
Explore all metrics

Abstract

We consider time series data modeled by ordinary differential equations (ODEs), widespread models in physics, chemistry, biology and science in general. The sensitivity analysis of such dynamical systems usually requires calculation of various derivatives with respect to the model parameters. We employ the adjoint state method (ASM) for efficient computation of the first and the second derivatives of likelihood functionals constrained by ODEs with respect to the parameters of the underlying ODE model. Essentially, the gradient can be computed with a cost (measured by model evaluations) that is independent of the number of the ODE model parameters and the Hessian with a linear cost in the number of the parameters instead of the quadratic one. The sensitivity analysis becomes feasible even if the parametric space is high-dimensional. The main contributions are derivation and rigorous analysis of the ASM in the statistical context, when the discrete data are coupled with the continuous ODE model. Further, we present a highly optimized implementation of the results and its benchmarks on a number of problems. The results are directly applicable in (e.g.) maximum-likelihood estimation or Bayesian sampling of ODE based statistical models, allowing for faster, more stable estimation of parameters of the underlying ODE model.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Numerical methods for computing sensitivities for ODEs and DDEs

Article 21 September 2016

Extracting parametric dynamics from time-series data

Article 01 July 2023

A Bayesian approach to estimate parameters of ordinary differential equation

Article 10 February 2020

Notes

Considering spatial phenomena such as diffusion and(or) convection leads to PDE models, see for example (Slodička and Balážová 2010).
Other continuous “interpolation” are possible such as piecewise-linear or by cubic splines, but they are less graphic.
The variance $\sigma ^2$ has here a purely ad hoc use for the above argument. We are not interested if it is prescribed or estimated from the data.
The diagonal system (24) is the most sparse system one can think of.
An alternative argumentation could employ equivalence (9).

References

Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms, 3rd edn. John Wiley & Sons, Hoboken
Book MATH Google Scholar
Bensoussan A, Lions J, Papanicolaou G (1978) Asymptotic analysis for periodic structures. North-Holland Pub. Co, Amsterdam
MATH Google Scholar
Bertsekas DP (1999) Nonlinear programming. Athena scientific, Belmont
MATH Google Scholar
Brooks S, Gelman A, Jones G, Meng XL (2011) Handbook of markov chain monte carlo. CRC Press, Boca Raton
Book MATH Google Scholar
Cesari L (1983) Optimization-theory and applications: problems with ordinary differential equations, applications of mathematics, vol 17. Springer-Verlag, New York
Book MATH Google Scholar
Cimrák I, Melicher V (2007) Sensitivity analysis framework for micromagnetism with application to the optimal shape design of magnetic random access memories. Inverse Probl 23(2):563–588
Article MATH MathSciNet Google Scholar
Coddington EA, Levinson N (1955) Theory of ordinary differential equations. Tata McGraw-Hill Education, New York
MATH Google Scholar
Delyon B, Lavielle M, Moulines E (1999) Convergence of a stochastic approximation version of the EM algorithm. Ann Stat 27(1):94–128
Article MATH MathSciNet Google Scholar
Draelants D, Broeckhove J, Beemster GTS, Vanroose W (2012) Numerical bifurcation analysis of the pattern formation in a cell based auxin transport model. J Math Biol 67(5):1279–1305
Article MATH MathSciNet Google Scholar
Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361
Article Google Scholar
Haber T, Melicher V, Michiels N, Kovac T, Nemeth B, Claes J (2016) DiffMEM. https://bitbucket.org/tomhaber/diffmem/branch/analysis
Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS (2005) SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans Math Softw 31(3):363–396
Article MATH MathSciNet Google Scholar
Knoll D, Keyes D (2004) Jacobian-free Newton–Krylov methods: a survey of approaches and applications. J Comput Phys 193(2):357–397
Article MATH MathSciNet Google Scholar
Lavielle M, Samson A, Karina Fermin A, Mentré F (2011) Maximum likelihood estimation of long-term hiv dynamic models and antiviral response. Biometrics 67(1):250–259
Article MATH MathSciNet Google Scholar
Lewis JM, Lakshmivarahan S, Dhall S (2006) Dynamic data assimilation: a least squares approach, encyclopedia of mathematics and its applications, vol 13. Cambridge University Press, Cambridge
Book MATH Google Scholar
Lindstrom MJ, Bates DM (1990) Nonlinear mixed effects models for repeated measures data. Biometrics 46(3):673–687
Article MathSciNet Google Scholar
Lions JL (1971) Optimal control of systems governed by partial differential equations. Springer, Berlin
Book MATH Google Scholar
Martin J, Wilcox L, Burstedde C, Ghattas O (2012) A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM J Sci Comput 34(3):A1460–A1487
Article MATH MathSciNet Google Scholar
Melicher V, Vrábel’ V (2013) On a continuation approach in Tikhonov regularization and its application in piecewise-constant parameter identification. Inverse Probl 29(11):115,008
Moré JJ (1978) The Levenberg–Marquardt algorithm: implementation and theory. In: Watson G (ed) Numerical analysis, lecture notes in mathematics, vol 630, Springer, Berlin, pp 105–116
Murray JD (2002) Mathematical biology I: an introduction, interdisciplinary applied mathematics, vol 17, 3rd edn. Springer-Verlag, New York
Google Scholar
Raue A, Schilling M, Bachmann J, Matteson A, Schelke M, Kaschek D, Hug S, Kreutz C, Harms BD, Theis FJ, Klingmüller U, Timmer J (2013) Lessons learned from quantitative dynamical modeling in systems biology. Plos One 8(9):1–17
Article Google Scholar
Serban R, Hindmarsh AC (2005) CVODES: the sensitivity-enabled ODE solver in SUNDIALS. In: ASME 2005 international design engineering technical conferences and computers and information in engineering conference, American Society of Mechanical Engineers, pp 257–269
Slodička M, Balážová A (2010) Decomposition method for solving multi-species reactive transport problems coupled with first-order kinetics applicable to a chain with identical reaction rates. Journal of Computational and Applied Mathematics 234(4):1069–1077, proceedings of the Thirteenth International Congress on Computational and Applied Mathematics (ICCAM-2008), Ghent, Belgium, 7–11 July, 2008
Tornøe CW, Agersø H, Jonsson E, Madsen H, Nielsen HA (2004) Non-linear mixed-effects pharmacokinetic/pharmacodynamic modelling in NLME using differential equations. Comput Methods Programs Biomed 76(1):3–40
Article Google Scholar
Wong R (2001) Asymptotic approximations of integrals, Classics in applied mathematics, vol 34. SIAM, Boston
Book Google Scholar
Zeidler E (1985) Nonlinear functional analysis and its applications: fixed point theorems, nonlinear functional analysis and its applications. Springer-Verlag, New York
Book MATH Google Scholar

Download references

Acknowledgements

We would like to thank Xavier Woot de Trixhe from Janssen Pharmaceutica for numerous very interesting discussions on PK/PD, virology, biological pathways modeling, NLMEMs and on life in general. They were an important source of motivation and provided a view from a different perspective. And we would like to thank the anonymous reviewers for their substantial input, enhancing the quality of the paper.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium
Valdemar Melicher & Wim Vanroose
Expertise Centre for Digital Media, Hasselt University, Wetenschapspark 2, 3590, Diepenbeek, Belgium
Tom Haber

Authors

Valdemar Melicher
View author publications
You can also search for this author in PubMed Google Scholar
Tom Haber
View author publications
You can also search for this author in PubMed Google Scholar
Wim Vanroose
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Valdemar Melicher.

Additional information

The work of the first two authors was supported by IWT O&O Project 130406—ExaScience Life HPC.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 108 KB)

Supplementary material 2 (R 4 KB)

Appendix: 1 Proofs

1.1 Proof of Theorem 1

First, when compared with Theorem 4.D from Zeidler (1985) we work with $X={\mathbb {R}}^m$ and $P={\mathbb {R}}^p.$ Those are complete normed vector spaces i.e. they are Banach spaces. Second, the initial condition is dependent only on parameter $\varvec{\phi }$, not on any free parameter y as in Theorem 4.D.

Set $J = [-1, 1].$ Let us do the following rescaling: $t = sa,$ ${\varvec{z}}(s) := {{\varvec{u}}}(as) - {{\varvec{u}}}_0(\varvec{\phi })$ for all $s \in J.$ Then (1) is equivalent to

$$\begin{aligned} {\varvec{z}}'(s) - a{\varvec{f}}(as, {\varvec{z}}(s) + {{\varvec{u}}}_0(\varvec{\phi }), \varvec{\phi }) = \varvec{0}\quad \text{ for } \text{ all } s \in J, {\varvec{z}}(0) = \varvec{0}. \end{aligned}$$

This can be written as an operator equation $F({\varvec{z}}, a, \varvec{\phi }) = 0$ with the operator $F:{\varvec{Z}}\times {\varvec{A}}\rightarrow {\varvec{W}}$ and spaces ${\varvec{Z}}= \{{\varvec{z}}\in {\varvec{C}}^1(J,{\mathbb {R}}^m): {\varvec{z}}(0) = \varvec{0}\}$, ${\varvec{W}}= C(J,{\mathbb {R}}^m).$ The space ${\varvec{A}}$ contains all the parameters $(a, \varvec{\phi })$, i.e. ${\varvec{A}}= {\mathbb {R}}\times {\mathbb {R}}^p.$

Set ${\varvec{q}}= (\varvec{0}, 0, \varvec{\phi }).$ Both F and $F_z$ are continuous at ${\varvec{q}}.$ Obviously, $F({\varvec{q}})=\varvec{0}$ and $F_{{\varvec{z}}}({\varvec{q}}){\varvec{z}}= {\varvec{z}}'.$ The crucial observation is that for every ${\varvec{w}}\in {\varvec{W}},$ there exists exactly one ${\varvec{z}}\in {\varvec{Z}}$ with ${\varvec{z}}'={\varvec{w}}$, namely ${\varvec{z}}(s)=\int _0^s {\varvec{w}}(t) \,\mathrm {d}t$. Hence $F_{{\varvec{z}}}({\varvec{q}}):{\varvec{Z}}\rightarrow {\varvec{W}}$ is bijective. The implicit function theorem yields the conclusions [see e.g. Theorem 4.B in Zeidler (1985)]. $\square $

1.2 Proof of Lemma 1

After realizing that $l$ depends on $\varvec{\phi }$ only through ${{\varvec{u}}}$, (12) is formally a direct application of the chain rule ($d_{{{\varvec{u}}}}$ denotes the derivative of the metric with respect to the model state ${{\varvec{u}}}$).

The r.h.s. of (12) is a well-posed finite expression. First, we have assumed that the metric is sufficiently smooth, thus $d_{{{\varvec{u}}}}$ is continuous. Second, ${\varvec{s}}$ is continuous as well owing to Theorem 1. A distribution can be rescaled by any at least continuous function, as here $\{\delta \}$ by $d_{{{\varvec{u}}}}$.

Thus, the first differential on the l.h.s of (12) exists as well. It is moreover continuous, i.e. it is Fréchet.^{Footnote 5} $\square $

1.3 Proof of Theorem 2

First, let us assume that we have already constructed a unique solution ${\varvec{v}}$ to (15) up to a certain measurement point $t_i$. The adjoint problem is solved backwards in time. Consequently, we will construct its prolongation on $[t_i, t_{i-1})$.

Let ${\varvec{v}}_i^+$ be the ODE solution just before integrating the measurement at time $t_i$, i.e. at time $t_i^+.$ We simply stop the integration at $t_i^+,$ add $d_{{{\varvec{u}}}}({\varvec{y}}_i,{\varvec{g}}(t_i, \varvec{\phi }))$ to ${\varvec{v}}_i^+$ and solve

$$\begin{aligned} \begin{aligned} d_{t}{\varvec{v}}&= -J^t_{{{\varvec{u}}}}({\varvec{f}}){\varvec{v}}, \quad t\in (t_i,t_{i-1}), \\ {\varvec{v}}(t_{i})&= {\varvec{v}}_i^+ + d_{{{\varvec{u}}}}({\varvec{y}}_i,{\varvec{g}}(t_i, \varvec{\phi })). \end{aligned} \end{aligned}$$

(32)

This is a simple linear ODE with a continuous coefficient $J^t_{{{\varvec{u}}}}({\varvec{f}}),$ since $f\in {\varvec{C}}^1$. The classical results yield the global solution on $(t_i,t_{i-1})$ [see e.g. Theorem 5.1 and Theorem 5.2 from Coddington and Levinson (1955)]. This concludes the proof of the existence and uniqueness.

Now, we prove (14). Let us without a loss of generality assume that there are no measurements in times 0 and T. It is a well-known result of theory of distributions (in the sense of functional analysis), that the classical integration by part formula

$$\begin{aligned} \int _0^T d_{t}{\varvec{v}} {\varvec{w}}\ dt = [{\varvec{v}}{\varvec{w}}]_{0}^{T} -\int _0^T {\varvec{v}}d_{t}{\varvec{w}} \ dt \end{aligned}$$

(33)

is valid for ${\varvec{w}}\in {\varvec{C}}^1$ even if the derivative $d_{t}{\varvec{v}}$ exists on [0, T] only in a weak sense, i.e. almost everywhere. Actually, (33) is the definition of the weak derivative of ${\varvec{v}}$ taking only ${\varvec{w}}\in {\varvec{C}}^1_0([0,T]).$ Consequently, since ${\varvec{s}}\in {\varvec{C}}^1([0,T]),$ we can safely proceed as follows

$$\begin{aligned} \begin{aligned} \left<{\varvec{s}},\{\delta \}d_{{{\varvec{u}}}}({\varvec{y}},{\varvec{g}}(t, \varvec{\phi }))\right>&\mathop {=}\limits ^{(15)} \left<{\varvec{s}},d_{t}{\varvec{v}} + J^t_{{{\varvec{u}}}}({\varvec{f}}){\varvec{v}}\right>\\&\mathop {=}\limits ^{(8)} -{\varvec{v}}^t(0)J_{\varvec{\phi }}({{\varvec{u}}}_0){\varvec{h}}-\left( d_{t}{\varvec{s}} - J_{{{\varvec{u}}}}({\varvec{f}}){\varvec{s}},{\varvec{v}}\right) \\&\mathop {=}\limits ^{(8)} -{\varvec{v}}^t(0)J_{\varvec{\phi }}({{\varvec{u}}}_0){\varvec{h}}-\left( J_{\varvec{\phi }}({\varvec{f}}){{\varvec{h}}},{\varvec{v}}\right) . \end{aligned} \end{aligned}$$

(34)

$\square $

1.4 Proof of Lemma 2

The existence and uniqueness of ${\varvec{\varsigma }}$ is a direct results of Theorem 1. Now, (20) is derived as follows:

$$\begin{aligned} \begin{aligned} \left<{\varvec{\varsigma }},\{\delta \}d_{{{\varvec{u}}}}({\varvec{y}},{\varvec{g}}(t, \varvec{\phi }))\right>&\mathop {=}\limits ^{(15)} \left<{\varvec{\varsigma }},d_{t}{\varvec{v}} + J^t_{{{\varvec{u}}}}({\varvec{f}}){\varvec{v}}\right>\\&\mathop {=}\limits ^{(15), (18)} -{({{\varvec{u}}}_0)}_{\varvec{\phi }\varvec{\phi }}{\varvec{h}}_1{\varvec{h}}_2\cdot {\varvec{v}}(0)\\&\quad \quad -\left( d_{t}{\varvec{\varsigma }},{\varvec{v}}\right) +\left( J_{{{\varvec{u}}}}({\varvec{f}}){\varvec{\varsigma }}, {\varvec{v}}\right) . \end{aligned} \end{aligned}$$

(35)

This after substituting for $J_{{{\varvec{u}}}}({\varvec{f}}){\varvec{\varsigma }}$ from (18) directly yields (20). Analogically to the proof of Theorem 2, we needed ${\varvec{\varsigma }}\in {\varvec{C}}^1([0,T])$ to be able to integrate by parts.$\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Melicher, V., Haber, T. & Vanroose, W. Fast derivatives of likelihood functionals for ODE based models using adjoint-state method. Comput Stat 32, 1621–1643 (2017). https://doi.org/10.1007/s00180-017-0765-8

Download citation

Received: 12 December 2016
Accepted: 10 September 2017
Published: 16 September 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s00180-017-0765-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

Abstract

Access this article

Similar content being viewed by others

Numerical methods for computing sensitivities for ODEs and DDEs

Extracting parametric dynamics from time-series data

A Bayesian approach to estimate parameters of ordinary differential equation

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Electronic supplementary material

Supplementary material 1 (pdf 108 KB)

Supplementary material 2 (R 4 KB)

Appendix: 1 Proofs

1.1 Proof of Theorem 1

1.2 Proof of Lemma 1

1.3 Proof of Theorem 2

1.4 Proof of Lemma 2

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

Abstract

Access this article

Similar content being viewed by others

Numerical methods for computing sensitivities for ODEs and DDEs

Extracting parametric dynamics from time-series data

A Bayesian approach to estimate parameters of ordinary differential equation

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Electronic supplementary material

Supplementary material 1 (pdf 108 KB)

Supplementary material 2 (R 4 KB)

Appendix: 1 Proofs

Appendix: 1 Proofs

1.1 Proof of Theorem 1

1.2 Proof of Lemma 1

1.3 Proof of Theorem 2

1.4 Proof of Lemma 2

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation