Skip to main content
Log in

Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

  • Original Paper
  • Published:
Computational Statistics Aims and scope Submit manuscript

Abstract

We consider time series data modeled by ordinary differential equations (ODEs), widespread models in physics, chemistry, biology and science in general. The sensitivity analysis of such dynamical systems usually requires calculation of various derivatives with respect to the model parameters. We employ the adjoint state method (ASM) for efficient computation of the first and the second derivatives of likelihood functionals constrained by ODEs with respect to the parameters of the underlying ODE model. Essentially, the gradient can be computed with a cost (measured by model evaluations) that is independent of the number of the ODE model parameters and the Hessian with a linear cost in the number of the parameters instead of the quadratic one. The sensitivity analysis becomes feasible even if the parametric space is high-dimensional. The main contributions are derivation and rigorous analysis of the ASM in the statistical context, when the discrete data are coupled with the continuous ODE model. Further, we present a highly optimized implementation of the results and its benchmarks on a number of problems. The results are directly applicable in (e.g.) maximum-likelihood estimation or Bayesian sampling of ODE based statistical models, allowing for faster, more stable estimation of parameters of the underlying ODE model.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Notes

  1. Considering spatial phenomena such as diffusion and(or) convection leads to PDE models, see for example (Slodička and Balážová 2010).

  2. Other continuous “interpolation” are possible such as piecewise-linear or by cubic splines, but they are less graphic.

  3. The variance \(\sigma ^2\) has here a purely ad hoc use for the above argument. We are not interested if it is prescribed or estimated from the data.

  4. The diagonal system (24) is the most sparse system one can think of.

  5. An alternative argumentation could employ equivalence (9).

References

  • Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms, 3rd edn. John Wiley & Sons, Hoboken

    Book  MATH  Google Scholar 

  • Bensoussan A, Lions J, Papanicolaou G (1978) Asymptotic analysis for periodic structures. North-Holland Pub. Co, Amsterdam

    MATH  Google Scholar 

  • Bertsekas DP (1999) Nonlinear programming. Athena scientific, Belmont

    MATH  Google Scholar 

  • Brooks S, Gelman A, Jones G, Meng XL (2011) Handbook of markov chain monte carlo. CRC Press, Boca Raton

    Book  MATH  Google Scholar 

  • Cesari L (1983) Optimization-theory and applications: problems with ordinary differential equations, applications of mathematics, vol 17. Springer-Verlag, New York

    Book  MATH  Google Scholar 

  • Cimrák I, Melicher V (2007) Sensitivity analysis framework for micromagnetism with application to the optimal shape design of magnetic random access memories. Inverse Probl 23(2):563–588

    Article  MATH  MathSciNet  Google Scholar 

  • Coddington EA, Levinson N (1955) Theory of ordinary differential equations. Tata McGraw-Hill Education, New York

    MATH  Google Scholar 

  • Delyon B, Lavielle M, Moulines E (1999) Convergence of a stochastic approximation version of the EM algorithm. Ann Stat 27(1):94–128

    Article  MATH  MathSciNet  Google Scholar 

  • Draelants D, Broeckhove J, Beemster GTS, Vanroose W (2012) Numerical bifurcation analysis of the pattern formation in a cell based auxin transport model. J Math Biol 67(5):1279–1305

    Article  MATH  MathSciNet  Google Scholar 

  • Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361

    Article  Google Scholar 

  • Haber T, Melicher V, Michiels N, Kovac T, Nemeth B, Claes J (2016) DiffMEM. https://bitbucket.org/tomhaber/diffmem/branch/analysis

  • Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS (2005) SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans Math Softw 31(3):363–396

    Article  MATH  MathSciNet  Google Scholar 

  • Knoll D, Keyes D (2004) Jacobian-free Newton–Krylov methods: a survey of approaches and applications. J Comput Phys 193(2):357–397

    Article  MATH  MathSciNet  Google Scholar 

  • Lavielle M, Samson A, Karina Fermin A, Mentré F (2011) Maximum likelihood estimation of long-term hiv dynamic models and antiviral response. Biometrics 67(1):250–259

    Article  MATH  MathSciNet  Google Scholar 

  • Lewis JM, Lakshmivarahan S, Dhall S (2006) Dynamic data assimilation: a least squares approach, encyclopedia of mathematics and its applications, vol 13. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  • Lindstrom MJ, Bates DM (1990) Nonlinear mixed effects models for repeated measures data. Biometrics 46(3):673–687

    Article  MathSciNet  Google Scholar 

  • Lions JL (1971) Optimal control of systems governed by partial differential equations. Springer, Berlin

    Book  MATH  Google Scholar 

  • Martin J, Wilcox L, Burstedde C, Ghattas O (2012) A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM J Sci Comput 34(3):A1460–A1487

    Article  MATH  MathSciNet  Google Scholar 

  • Melicher V, Vrábel’ V (2013) On a continuation approach in Tikhonov regularization and its application in piecewise-constant parameter identification. Inverse Probl 29(11):115,008

  • Moré JJ (1978) The Levenberg–Marquardt algorithm: implementation and theory. In: Watson G (ed) Numerical analysis, lecture notes in mathematics, vol 630, Springer, Berlin, pp 105–116

  • Murray JD (2002) Mathematical biology I: an introduction, interdisciplinary applied mathematics, vol 17, 3rd edn. Springer-Verlag, New York

    Google Scholar 

  • Raue A, Schilling M, Bachmann J, Matteson A, Schelke M, Kaschek D, Hug S, Kreutz C, Harms BD, Theis FJ, Klingmüller U, Timmer J (2013) Lessons learned from quantitative dynamical modeling in systems biology. Plos One 8(9):1–17

    Article  Google Scholar 

  • Serban R, Hindmarsh AC (2005) CVODES: the sensitivity-enabled ODE solver in SUNDIALS. In: ASME 2005 international design engineering technical conferences and computers and information in engineering conference, American Society of Mechanical Engineers, pp 257–269

  • Slodička M, Balážová A (2010) Decomposition method for solving multi-species reactive transport problems coupled with first-order kinetics applicable to a chain with identical reaction rates. Journal of Computational and Applied Mathematics 234(4):1069–1077, proceedings of the Thirteenth International Congress on Computational and Applied Mathematics (ICCAM-2008), Ghent, Belgium, 7–11 July, 2008

  • Tornøe CW, Agersø H, Jonsson E, Madsen H, Nielsen HA (2004) Non-linear mixed-effects pharmacokinetic/pharmacodynamic modelling in NLME using differential equations. Comput Methods Programs Biomed 76(1):3–40

    Article  Google Scholar 

  • Wong R (2001) Asymptotic approximations of integrals, Classics in applied mathematics, vol 34. SIAM, Boston

    Book  Google Scholar 

  • Zeidler E (1985) Nonlinear functional analysis and its applications: fixed point theorems, nonlinear functional analysis and its applications. Springer-Verlag, New York

    Book  MATH  Google Scholar 

Download references

Acknowledgements

We would like to thank Xavier Woot de Trixhe from Janssen Pharmaceutica for numerous very interesting discussions on PK/PD, virology, biological pathways modeling, NLMEMs and on life in general. They were an important source of motivation and provided a view from a different perspective. And we would like to thank the anonymous reviewers for their substantial input, enhancing the quality of the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Valdemar Melicher.

Additional information

The work of the first two authors was supported by IWT O&O Project 130406—ExaScience Life HPC.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 108 KB)

Supplementary material 2 (R 4 KB)

Appendix: 1 Proofs

Appendix: 1 Proofs

1.1 Proof of Theorem 1

First, when compared with Theorem 4.D from Zeidler (1985) we work with \(X={\mathbb {R}}^m\) and \(P={\mathbb {R}}^p.\) Those are complete normed vector spaces i.e. they are Banach spaces. Second, the initial condition is dependent only on parameter \(\varvec{\phi }\), not on any free parameter y as in Theorem 4.D.

Set \(J = [-1, 1].\) Let us do the following rescaling: \(t = sa,\) \({\varvec{z}}(s) := {{\varvec{u}}}(as) - {{\varvec{u}}}_0(\varvec{\phi })\) for all \(s \in J.\) Then (1) is equivalent to

$$\begin{aligned} {\varvec{z}}'(s) - a{\varvec{f}}(as, {\varvec{z}}(s) + {{\varvec{u}}}_0(\varvec{\phi }), \varvec{\phi }) = \varvec{0}\quad \text{ for } \text{ all } s \in J, {\varvec{z}}(0) = \varvec{0}. \end{aligned}$$

This can be written as an operator equation \(F({\varvec{z}}, a, \varvec{\phi }) = 0\) with the operator \(F:{\varvec{Z}}\times {\varvec{A}}\rightarrow {\varvec{W}}\) and spaces \({\varvec{Z}}= \{{\varvec{z}}\in {\varvec{C}}^1(J,{\mathbb {R}}^m): {\varvec{z}}(0) = \varvec{0}\}\), \({\varvec{W}}= C(J,{\mathbb {R}}^m).\) The space \({\varvec{A}}\) contains all the parameters \((a, \varvec{\phi })\), i.e. \({\varvec{A}}= {\mathbb {R}}\times {\mathbb {R}}^p.\)

Set \({\varvec{q}}= (\varvec{0}, 0, \varvec{\phi }).\) Both F and \(F_z\) are continuous at \({\varvec{q}}.\) Obviously, \(F({\varvec{q}})=\varvec{0}\) and \(F_{{\varvec{z}}}({\varvec{q}}){\varvec{z}}= {\varvec{z}}'.\) The crucial observation is that for every \({\varvec{w}}\in {\varvec{W}},\) there exists exactly one \({\varvec{z}}\in {\varvec{Z}}\) with \({\varvec{z}}'={\varvec{w}}\), namely \({\varvec{z}}(s)=\int _0^s {\varvec{w}}(t) \,\mathrm {d}t\). Hence \(F_{{\varvec{z}}}({\varvec{q}}):{\varvec{Z}}\rightarrow {\varvec{W}}\) is bijective. The implicit function theorem yields the conclusions [see e.g. Theorem 4.B in Zeidler (1985)]. \(\square \)

1.2 Proof of Lemma 1

After realizing that \(l\) depends on \(\varvec{\phi }\) only through \({{\varvec{u}}}\), (12) is formally a direct application of the chain rule (\(d_{{{\varvec{u}}}}\) denotes the derivative of the metric with respect to the model state \({{\varvec{u}}}\)).

The r.h.s. of (12) is a well-posed finite expression. First, we have assumed that the metric is sufficiently smooth, thus \(d_{{{\varvec{u}}}}\) is continuous. Second, \({\varvec{s}}\) is continuous as well owing to Theorem 1. A distribution can be rescaled by any at least continuous function, as here \(\{\delta \}\) by \(d_{{{\varvec{u}}}}\).

Thus, the first differential on the l.h.s of (12) exists as well. It is moreover continuous, i.e. it is Fréchet.Footnote 5 \(\square \)

1.3 Proof of Theorem 2

First, let us assume that we have already constructed a unique solution \({\varvec{v}}\) to (15) up to a certain measurement point \(t_i\). The adjoint problem is solved backwards in time. Consequently, we will construct its prolongation on \([t_i, t_{i-1})\).

Let \({\varvec{v}}_i^+\) be the ODE solution just before integrating the measurement at time \(t_i\), i.e. at time \(t_i^+.\) We simply stop the integration at \(t_i^+,\) add \(d_{{{\varvec{u}}}}({\varvec{y}}_i,{\varvec{g}}(t_i, \varvec{\phi }))\) to \({\varvec{v}}_i^+\) and solve

$$\begin{aligned} \begin{aligned} d_{t}{\varvec{v}}&= -J^t_{{{\varvec{u}}}}({\varvec{f}}){\varvec{v}}, \quad t\in (t_i,t_{i-1}), \\ {\varvec{v}}(t_{i})&= {\varvec{v}}_i^+ + d_{{{\varvec{u}}}}({\varvec{y}}_i,{\varvec{g}}(t_i, \varvec{\phi })). \end{aligned} \end{aligned}$$
(32)

This is a simple linear ODE with a continuous coefficient \(J^t_{{{\varvec{u}}}}({\varvec{f}}),\) since \(f\in {\varvec{C}}^1\). The classical results yield the global solution on \((t_i,t_{i-1})\) [see e.g. Theorem 5.1 and Theorem 5.2 from Coddington and Levinson (1955)]. This concludes the proof of the existence and uniqueness.

Now, we prove (14). Let us without a loss of generality assume that there are no measurements in times 0 and T. It is a well-known result of theory of distributions (in the sense of functional analysis), that the classical integration by part formula

$$\begin{aligned} \int _0^T d_{t}{\varvec{v}} {\varvec{w}}\ dt = [{\varvec{v}}{\varvec{w}}]_{0}^{T} -\int _0^T {\varvec{v}}d_{t}{\varvec{w}} \ dt \end{aligned}$$
(33)

is valid for \({\varvec{w}}\in {\varvec{C}}^1\) even if the derivative \(d_{t}{\varvec{v}}\) exists on [0, T] only in a weak sense, i.e. almost everywhere. Actually, (33) is the definition of the weak derivative of \({\varvec{v}}\) taking only \({\varvec{w}}\in {\varvec{C}}^1_0([0,T]).\) Consequently, since \({\varvec{s}}\in {\varvec{C}}^1([0,T]),\) we can safely proceed as follows

$$\begin{aligned} \begin{aligned} \left<{\varvec{s}},\{\delta \}d_{{{\varvec{u}}}}({\varvec{y}},{\varvec{g}}(t, \varvec{\phi }))\right>&\mathop {=}\limits ^{(15)} \left<{\varvec{s}},d_{t}{\varvec{v}} + J^t_{{{\varvec{u}}}}({\varvec{f}}){\varvec{v}}\right>\\&\mathop {=}\limits ^{(8)} -{\varvec{v}}^t(0)J_{\varvec{\phi }}({{\varvec{u}}}_0){\varvec{h}}-\left( d_{t}{\varvec{s}} - J_{{{\varvec{u}}}}({\varvec{f}}){\varvec{s}},{\varvec{v}}\right) \\&\mathop {=}\limits ^{(8)} -{\varvec{v}}^t(0)J_{\varvec{\phi }}({{\varvec{u}}}_0){\varvec{h}}-\left( J_{\varvec{\phi }}({\varvec{f}}){{\varvec{h}}},{\varvec{v}}\right) . \end{aligned} \end{aligned}$$
(34)

\(\square \)

1.4 Proof of Lemma 2

The existence and uniqueness of \({\varvec{\varsigma }}\) is a direct results of Theorem 1. Now, (20) is derived as follows:

$$\begin{aligned} \begin{aligned} \left<{\varvec{\varsigma }},\{\delta \}d_{{{\varvec{u}}}}({\varvec{y}},{\varvec{g}}(t, \varvec{\phi }))\right>&\mathop {=}\limits ^{(15)} \left<{\varvec{\varsigma }},d_{t}{\varvec{v}} + J^t_{{{\varvec{u}}}}({\varvec{f}}){\varvec{v}}\right>\\&\mathop {=}\limits ^{(15), (18)} -{({{\varvec{u}}}_0)}_{\varvec{\phi }\varvec{\phi }}{\varvec{h}}_1{\varvec{h}}_2\cdot {\varvec{v}}(0)\\&\quad \quad -\left( d_{t}{\varvec{\varsigma }},{\varvec{v}}\right) +\left( J_{{{\varvec{u}}}}({\varvec{f}}){\varvec{\varsigma }}, {\varvec{v}}\right) . \end{aligned} \end{aligned}$$
(35)

This after substituting for \(J_{{{\varvec{u}}}}({\varvec{f}}){\varvec{\varsigma }}\) from (18) directly yields (20). Analogically to the proof of Theorem 2, we needed \({\varvec{\varsigma }}\in {\varvec{C}}^1([0,T])\) to be able to integrate by parts.\(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Melicher, V., Haber, T. & Vanroose, W. Fast derivatives of likelihood functionals for ODE based models using adjoint-state method. Comput Stat 32, 1621–1643 (2017). https://doi.org/10.1007/s00180-017-0765-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00180-017-0765-8

Keywords

Navigation