Abstract
We consider time series data modeled by ordinary differential equations (ODEs), which are widespread models in physics, chemistry, biology and the sciences in general. The sensitivity analysis of such dynamical systems usually requires the calculation of various derivatives with respect to the model parameters. We employ the adjoint state method (ASM) for efficient computation of the first and second derivatives of likelihood functionals constrained by ODEs with respect to the parameters of the underlying ODE model. Essentially, the gradient can be computed at a cost (measured in model evaluations) that is independent of the number of ODE model parameters, and the Hessian at a cost that is linear, rather than quadratic, in the number of parameters. Sensitivity analysis thus becomes feasible even when the parameter space is high-dimensional. The main contributions are the derivation and rigorous analysis of the ASM in the statistical context, where discrete data are coupled with a continuous ODE model. Further, we present a highly optimized implementation of the results and benchmark it on a number of problems. The results are directly applicable, for example, in maximum-likelihood estimation or Bayesian sampling of ODE-based statistical models, allowing for faster and more stable estimation of the parameters of the underlying ODE model.
Notes
Considering spatial phenomena such as diffusion and/or convection leads to PDE models; see for example Slodička and Balážová (2010).
Other continuous “interpolations” are possible, such as piecewise-linear or cubic splines, but they are less illustrative.
The variance \(\sigma ^2\) is used here purely ad hoc, for the above argument; we are not interested in whether it is prescribed or estimated from the data.
The diagonal system (24) is the sparsest system one can think of.
An alternative argument could employ equivalence (9).
References
Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms, 3rd edn. John Wiley & Sons, Hoboken
Bensoussan A, Lions J, Papanicolaou G (1978) Asymptotic analysis for periodic structures. North-Holland Pub. Co, Amsterdam
Bertsekas DP (1999) Nonlinear programming. Athena Scientific, Belmont
Brooks S, Gelman A, Jones G, Meng XL (2011) Handbook of Markov chain Monte Carlo. CRC Press, Boca Raton
Cesari L (1983) Optimization-theory and applications: problems with ordinary differential equations, applications of mathematics, vol 17. Springer-Verlag, New York
Cimrák I, Melicher V (2007) Sensitivity analysis framework for micromagnetism with application to the optimal shape design of magnetic random access memories. Inverse Probl 23(2):563–588
Coddington EA, Levinson N (1955) Theory of ordinary differential equations. Tata McGraw-Hill Education, New York
Delyon B, Lavielle M, Moulines E (1999) Convergence of a stochastic approximation version of the EM algorithm. Ann Stat 27(1):94–128
Draelants D, Broeckhove J, Beemster GTS, Vanroose W (2012) Numerical bifurcation analysis of the pattern formation in a cell based auxin transport model. J Math Biol 67(5):1279–1305
Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361
Haber T, Melicher V, Michiels N, Kovac T, Nemeth B, Claes J (2016) DiffMEM. https://bitbucket.org/tomhaber/diffmem/branch/analysis
Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS (2005) SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans Math Softw 31(3):363–396
Knoll D, Keyes D (2004) Jacobian-free Newton–Krylov methods: a survey of approaches and applications. J Comput Phys 193(2):357–397
Lavielle M, Samson A, Fermin AK, Mentré F (2011) Maximum likelihood estimation of long-term HIV dynamic models and antiviral response. Biometrics 67(1):250–259
Lewis JM, Lakshmivarahan S, Dhall S (2006) Dynamic data assimilation: a least squares approach, encyclopedia of mathematics and its applications, vol 13. Cambridge University Press, Cambridge
Lindstrom MJ, Bates DM (1990) Nonlinear mixed effects models for repeated measures data. Biometrics 46(3):673–687
Lions JL (1971) Optimal control of systems governed by partial differential equations. Springer, Berlin
Martin J, Wilcox L, Burstedde C, Ghattas O (2012) A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM J Sci Comput 34(3):A1460–A1487
Melicher V, Vrábel’ V (2013) On a continuation approach in Tikhonov regularization and its application in piecewise-constant parameter identification. Inverse Probl 29(11):115008
Moré JJ (1978) The Levenberg–Marquardt algorithm: implementation and theory. In: Watson G (ed) Numerical analysis, lecture notes in mathematics, vol 630, Springer, Berlin, pp 105–116
Murray JD (2002) Mathematical biology I: an introduction, interdisciplinary applied mathematics, vol 17, 3rd edn. Springer-Verlag, New York
Raue A, Schilling M, Bachmann J, Matteson A, Schelke M, Kaschek D, Hug S, Kreutz C, Harms BD, Theis FJ, Klingmüller U, Timmer J (2013) Lessons learned from quantitative dynamical modeling in systems biology. Plos One 8(9):1–17
Serban R, Hindmarsh AC (2005) CVODES: the sensitivity-enabled ODE solver in SUNDIALS. In: ASME 2005 international design engineering technical conferences and computers and information in engineering conference, American Society of Mechanical Engineers, pp 257–269
Slodička M, Balážová A (2010) Decomposition method for solving multi-species reactive transport problems coupled with first-order kinetics applicable to a chain with identical reaction rates. J Comput Appl Math 234(4):1069–1077. Proceedings of the Thirteenth International Congress on Computational and Applied Mathematics (ICCAM-2008), Ghent, Belgium, 7–11 July 2008
Tornøe CW, Agersø H, Jonsson E, Madsen H, Nielsen HA (2004) Non-linear mixed-effects pharmacokinetic/pharmacodynamic modelling in NLME using differential equations. Comput Methods Programs Biomed 76(1):3–40
Wong R (2001) Asymptotic approximations of integrals, Classics in applied mathematics, vol 34. SIAM, Boston
Zeidler E (1985) Nonlinear functional analysis and its applications I: fixed-point theorems. Springer-Verlag, New York
Acknowledgements
We would like to thank Xavier Woot de Trixhe from Janssen Pharmaceutica for numerous very interesting discussions on PK/PD, virology, biological pathway modeling, NLMEMs and life in general. They were an important source of motivation and provided a view from a different perspective. We also thank the anonymous reviewers for their substantial input, which enhanced the quality of the paper.
Additional information
The work of the first two authors was supported by IWT O&O Project 130406—ExaScience Life HPC.
Electronic supplementary material
Appendix 1: Proofs
1.1 Proof of Theorem 1
First, compared with Theorem 4.D of Zeidler (1985), we work with \(X={\mathbb {R}}^m\) and \(P={\mathbb {R}}^p.\) These are complete normed vector spaces, i.e. Banach spaces. Second, the initial condition depends only on the parameter \(\varvec{\phi }\), not on any free parameter y as in Theorem 4.D.
Set \(J = [-1, 1].\) Let us do the following rescaling: \(t = sa,\) \({\varvec{z}}(s) := {{\varvec{u}}}(as) - {{\varvec{u}}}_0(\varvec{\phi })\) for all \(s \in J.\) Then (1) is equivalent to
This can be written as an operator equation \(F({\varvec{z}}, a, \varvec{\phi }) = 0\) with the operator \(F:{\varvec{Z}}\times {\varvec{A}}\rightarrow {\varvec{W}}\) and spaces \({\varvec{Z}}= \{{\varvec{z}}\in {\varvec{C}}^1(J,{\mathbb {R}}^m): {\varvec{z}}(0) = \varvec{0}\}\), \({\varvec{W}}= C(J,{\mathbb {R}}^m).\) The space \({\varvec{A}}\) contains all the parameters \((a, \varvec{\phi })\), i.e. \({\varvec{A}}= {\mathbb {R}}\times {\mathbb {R}}^p.\)
Set \({\varvec{q}}= (\varvec{0}, 0, \varvec{\phi }).\) Both F and \(F_z\) are continuous at \({\varvec{q}}.\) Obviously, \(F({\varvec{q}})=\varvec{0}\) and \(F_{{\varvec{z}}}({\varvec{q}}){\varvec{z}}= {\varvec{z}}'.\) The crucial observation is that for every \({\varvec{w}}\in {\varvec{W}},\) there exists exactly one \({\varvec{z}}\in {\varvec{Z}}\) with \({\varvec{z}}'={\varvec{w}}\), namely \({\varvec{z}}(s)=\int _0^s {\varvec{w}}(t) \,\mathrm {d}t\). Hence \(F_{{\varvec{z}}}({\varvec{q}}):{\varvec{Z}}\rightarrow {\varvec{W}}\) is bijective. The implicit function theorem yields the conclusions [see e.g. Theorem 4.B in Zeidler (1985)]. \(\square \)
1.2 Proof of Lemma 1
Since \(l\) depends on \(\varvec{\phi }\) only through \({{\varvec{u}}}\), (12) is formally a direct application of the chain rule (\(d_{{{\varvec{u}}}}\) denotes the derivative of the metric with respect to the model state \({{\varvec{u}}}\)).
The r.h.s. of (12) is a well-posed finite expression. First, we have assumed that the metric is sufficiently smooth, so \(d_{{{\varvec{u}}}}\) is continuous. Second, \({\varvec{s}}\) is continuous as well, owing to Theorem 1. A distribution such as \(\{\delta \}\) can be rescaled by any function that is at least continuous, here by \(d_{{{\varvec{u}}}}\).
Thus, the first differential on the l.h.s. of (12) exists as well. It is moreover continuous, i.e. it is Fréchet (see footnote 5). \(\square \)
1.3 Proof of Theorem 2
First, let us assume that we have already constructed a unique solution \({\varvec{v}}\) to (15) up to a certain measurement point \(t_i\). The adjoint problem is solved backwards in time. Consequently, we will construct its prolongation on \([t_i, t_{i-1})\).
Let \({\varvec{v}}_i^+\) be the ODE solution just before integrating the measurement at time \(t_i\), i.e. at time \(t_i^+.\) We simply stop the integration at \(t_i^+,\) add \(d_{{{\varvec{u}}}}({\varvec{y}}_i,{\varvec{g}}(t_i, \varvec{\phi }))\) to \({\varvec{v}}_i^+\) and solve
This is a simple linear ODE with a continuous coefficient \(J^t_{{{\varvec{u}}}}({\varvec{f}}),\) since \({\varvec{f}}\in {\varvec{C}}^1\). Classical results yield the global solution on \((t_i,t_{i-1})\) [see e.g. Theorems 5.1 and 5.2 in Coddington and Levinson (1955)]. This concludes the proof of existence and uniqueness.
Now, we prove (14). Without loss of generality, assume that there are no measurements at times 0 and T. It is a well-known result of the theory of distributions (in the sense of functional analysis) that the classical integration by parts formula
is valid for \({\varvec{w}}\in {\varvec{C}}^1\) even if the derivative \(d_{t}{\varvec{v}}\) exists on [0, T] only in a weak sense, i.e. almost everywhere. In fact, (33) restricted to \({\varvec{w}}\in {\varvec{C}}^1_0([0,T])\) is precisely the definition of the weak derivative of \({\varvec{v}}\). Consequently, since \({\varvec{s}}\in {\varvec{C}}^1([0,T]),\) we can safely proceed as follows
\(\square \)
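The stop–jump–resume construction in the proof above maps directly onto a numerical scheme: one forward solve, then one backward adjoint solve with a residual jump at each measurement time, from which the whole gradient follows regardless of the number of parameters. Below is a minimal, self-contained Python sketch for an illustrative scalar model \(u' = -\theta u\) with a least-squares metric, using forward Euler and synthetic data; the model, step size and data values are our assumptions, and this toy is a stand-in for, not a reproduction of, the authors' optimized DiffMEM implementation.

```python
def simulate(theta, u0, h, n_steps):
    """Forward Euler trajectory of the toy model du/dt = -theta * u."""
    u = [u0]
    for _ in range(n_steps):
        u.append(u[-1] - h * theta * u[-1])
    return u

def loss(traj, meas):
    """Least-squares metric: 0.5 * sum of squared residuals at the data indices."""
    return 0.5 * sum((traj[n] - y) ** 2 for n, y in meas)

def adjoint_gradient(theta, traj, meas, h, n_steps):
    """One backward sweep: stop at each measurement time, add the residual
    (the jump of Theorem 2), then resume integrating the adjoint ODE.
    Returns (dJ/dtheta, dJ/du0); both come from this single sweep."""
    jump = dict(meas)                       # grid index -> data value
    lam, g_theta = 0.0, 0.0
    for n in range(n_steps, 0, -1):
        if n in jump:                       # integrate the Dirac source: jump in lambda
            lam += traj[n] - jump[n]
        g_theta += h * lam * (-traj[n])     # accumulate  int lambda * df/dtheta dt
        lam -= h * theta * lam              # Euler step backward for lambda' = theta*lambda
    return g_theta, lam                     # dJ/du0 equals lambda(0)
```

A single backward sweep yields both \(\partial J/\partial \theta\) (the accumulated integral of \(\lambda \,\partial _\theta f\)) and \(\partial J/\partial u_0\) (the terminal value \(\lambda (0)\)); a finite-difference gradient would instead require one extra forward solve per parameter.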
1.4 Proof of Lemma 2
The existence and uniqueness of \({\varvec{\varsigma }}\) is a direct result of Theorem 1. Now, (20) is derived as follows:
This, after substituting for \(J_{{{\varvec{u}}}}({\varvec{f}}){\varvec{\varsigma }}\) from (18), directly yields (20). Analogously to the proof of Theorem 2, we needed \({\varvec{\varsigma }}\in {\varvec{C}}^1([0,T])\) in order to integrate by parts.\(\square \)
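Lemma 2 is the ingredient behind computing the Hessian at a cost linear in \(p\): each direction \({\varvec{\varsigma }}\) costs only a constant number of additional solves, and \(p\) directional sweeps assemble the full Hessian. The sketch below illustrates only this "one product per direction" structure by central-differencing a closed-form gradient of the toy least-squares functional \(J(\theta , u_0) = \tfrac{1}{2}\sum _i (u_0 e^{-\theta t_i} - y_i)^2\); the finite-difference Hessian–vector product is our hypothetical stand-in, not the paper's exact second-order adjoint.

```python
import math

def gradient(theta, u0, data):
    """Closed-form gradient of J = 0.5 * sum (u0*exp(-theta*t) - y)^2
    with respect to the parameter pair (theta, u0)."""
    g_theta = g_u0 = 0.0
    for t, y in data:
        e = math.exp(-theta * t)
        r = u0 * e - y                  # residual at measurement time t
        g_theta += r * (-t * u0 * e)    # r * du/dtheta
        g_u0 += r * e                   # r * du/du0
    return (g_theta, g_u0)

def hessian_vector(theta, u0, data, v, eps=1e-6):
    """H @ v by central-differencing the gradient along direction v:
    one product per direction, hence p products for the full Hessian."""
    gp = gradient(theta + eps * v[0], u0 + eps * v[1], data)
    gm = gradient(theta - eps * v[0], u0 - eps * v[1], data)
    return tuple((a - b) / (2 * eps) for a, b in zip(gp, gm))

def full_hessian(theta, u0, data):
    """p = 2 directional products reconstruct the 2x2 Hessian column by column."""
    cols = [hessian_vector(theta, u0, data, e) for e in ((1.0, 0.0), (0.0, 1.0))]
    return [[cols[j][i] for j in range(2)] for i in range(2)]
```

For \(p\) parameters this needs \(p\) Hessian–vector products, i.e. \(2p\) gradient evaluations, rather than differencing each of the \(O(p^2)\) Hessian entries individually.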
Melicher, V., Haber, T. & Vanroose, W. Fast derivatives of likelihood functionals for ODE based models using adjoint-state method. Comput Stat 32, 1621–1643 (2017). https://doi.org/10.1007/s00180-017-0765-8