Appendix
Model
The idea behind the model introduced in this paper is related to the SIR model with time delay. Since we are going to implement it on available daily data, a discrete version of this model is considered with the step-size one day. Moreover, since the “recovered” population is not our focus, we will only consider equations related to susceptible (S) and infectious (I) individuals. The most commonly used SIR model1 in the literature is provided below (see, for example,8):
Here λ is the recruitment of the population; μ is the natural death rate of the population; α is the death rate due to disease; γ is the recovery rate; and τ1 is the latent period that infected individual becomes infectious.
The fraction (1–μ)τ1 represents the survival rate of population over the period of [0, τ1] (in continuous-time case it is equivalent to e–μτ1). Below we examine this model in detail and develop an improved model.
Susceptible individuals. Equation (3) describes the dynamics of susceptible individuals S(t). This equation “keeps” the number of infectious individuals I(t) bounded. For example, when the basic reproduction number is greater than 1, there exists8 an endemic equilibrium (S*, I*) and S(t) → S*, I(t) → I* as t → ∞. In the case when the birth rate is zero (λ = 0) the relation I(t) → 0 suggests that S(t) → 0.
Thus according to this model the epidemic ends because the number of susceptible individuals S(t) decreases over time and the effective reproduction number (as a function of time) becomes less than 1 at some stage; that is, the number of newly infected population F(S(t), I(t)) decreases thanks to the “enough” decrease in the number of susceptible individuals (while I(t) still increases). This might be applicable to epidemics in early 1900s but it is definitely not applicable to recent ones.
This issue significantly restricts the application of the SIR model for the study of the current Ebola virus epidemic. Below we consider 3 possibilities to overcome this difficulty.
1. The simplest way would be to use a “relatively small” number S(0) for a possible number of susceptible individuals that may become infected. This approach has been implemented in1 where the total population size in each country (Guinea, Sierra Leone and Liberia) was assumed to be 106 individuals.
2. An interesting (and most reliable in our opinion) approach would be considering “relatively small” number of population S(0) as a variable that needs to be estimated. We have implemented this approach and the results show that currently available curve/data is not “long” enough to uniquely determine S(0); that is, almost the same quality of data fit can be achieved for different numbers S(0) (we have tried 50,000, 100,000 and 200,000) leading to different numbers of “stabilized” cumulative infected cases and infection periods. Taking this factor into account, we do not consider this approach, however we note that it might be quite possible soon with the availability of more data points.
3. In this paper we adopt another approach by neglecting the compartment S completely and leaving just the compartment I. The force of infection F(S, I) in this case is the main factor to be determined. We take this function in the form
F(S, I) = β𝒟I (5)
where 𝒟 is the population density of a particular country. In a more general setting, one would involve functions nonlinear in I (like F(S, I) = β𝒟Iξ with ξ ≤ 1). However, since the infectious population I is a very small portion of the total population, function F can be assumed linear at least in early stages of epidemics. In this case equation (4) can be represented in the form
I(t + 1) = (1 – μ)τ1 β𝒟I(t – τ1) + (1– μ – α – γ)I(t). (6)
The major drawback of this model is that I may growth infinitely if the reproduction number is greater than 1; in this model there is no variable/parameter (like S(t) in SIR) that could force I to decrease. On the other hand we believe that it can better describe the behavior of an infected population in “small” time intervals and provide more accurate reproduction numbers.
Active infectious population. Now we discuss the infectious population and equation (6) in more detail. We call “active infectious populations” at time t the infected population that are infectious at that time but are not hospitalized yet. Denote by Ia(t) the number of active infectious populations at time t. We will rewrite equation (6) in terms of Ia.
Denote by τ2 the average infectious period; that is, time from onset (τ1) to hospitalization. Then, an infected person is assumed to be active infectious during the period [τ1, τ1 + τ2]. Since τ2 is relatively small, we can assume that none is recovering during that period. This means that the rate of recovery γ in (6) is no longer needed for Ia(t).
Thus, we transform equation (6) by taking into account the time delay τ2. Accordingly, the equation for Ia(t) can be represented in the form
Here ω(0) = 0 and ω(i), i ≥ 1, is a gamma cumulative distribution function for onset-to-death that well describes the current Ebola virus in West Africa7. We note that in this equation, for each i ≥ 1, the fraction (1–αω(i)) is applied to the remaining infectious (1–μ)i β𝒟Ia(t–τ1–i); that is, the death rate in (7) is slightly different from (6) (indeed, both μ and ω(i) are quite small and this leads to 1–μ–αω(i) ≈ (1–μ)(1–αω(i))).
Cumulative number of infected cases. The first term (1 – μ)τ1 β𝒟 Ia(t – τ1) in (7) describes the number of new cases at time t. The cumulative number of infectious cases at (t+1) will be denoted by C(t +1). It can be calculated as
Cumulative number of deaths. To calculate the cumulative number of deaths at time t, we consider all infectious cases (hospitalized or not) in the interval [t–τ1, t–n] where n is a sufficiently large number. In particular we assume that death may occur after the onset. As mentioned above, the distribution of death is described by a gamma distribution function ω with its p.d.f - ωp. Then, the cumulative number of deaths due to disease can be calculated as
Data fitting: Optimization Problems
Main parameters. We have formulated the dynamical system (7),(8),(9). Given the observed cumulative number of infected cases - C0(t) and cumulative number of death cases - D0(t), the parameters of the systems can be estimated by the best fit. Before formulating this problem we discuss the parameters to be estimated.
The density 𝒟 and the natural death rate of the population - μ is available for each country. We set 𝒟 = 41, 80, 36 and 50 for Guinea, Sierra Leone, Liberia and the worlwide data, respectively. The natural death rate is around 10 deaths for 1000 population per year (1 percent yearly) for all the three countries. Thus in all numerical implementations, the daily rate μ is set to be 0.01/365 = 0.0000274. It is reasonable to have the same average latency period - τ1 for infected individual to become infectious. The previous studies (e.g. [7]) suggest that it is between 2–21 days with the mean of 11.4 days. Our numerical experiments show that the values between 6–8 provide better results; we set τ1 = 6 in all cases.
Parameters of the gamma distribution can be taken from7. We set
with mean value 7.5. Note that the choice of values a and b within reasonable intervals, by keeping the mean value the same, has almost no effect on the quality of data fitting. Taking into account this fact, the parameters of the gamma distribution are chosen as in (10). In all the calculations, we set n=35 (days) in (9) (note that for large i function values ω(i) are almost zero).
Initial values Ia(t), t ≤ 1, for the equation are chosen in the form
Ia(t) = ξC0(1), for all t ≤ 1. (11)
where C0(1) is the actual cumulative infectious. Numerical experiments show that the choice of ξ in the interval 0.4–0.7 has very little impact on the quality of data fitting. We set ξ = 0.4 in all cases exept Liberia for which the value 0.7 was better. Accordingly, we do not consider ξ as a variable and set the above mentioned values for each country/data.
Therefore, the main parameters that define the dynamics of Ebola epidemics in different countries are α - the death rate due to disease, β - the coefficient of the force of infection and τ2 - the average infectious period.
Data fitting. We consider data collected till 11 November 2014 for the cumulative number of infectious (confirmed, probable and suspected) and death individuals; they will be denoted by C0(t) and D0(t), respectively. We will use the root mean square error. Given time interval [T1, T2] and data points C0(ti) and D0(ti), i ≥ 1, we define
According to this formula, we fit the second half of given data in order to decrease the choice of initial values Ia(t), t ≤ 1, defined by (11).
Basic reproduction number
R0To calculate the basic reproduction number, the above model is considered on the whole interval. The corresponding data fitting problem is:
Problem (DF1): Given data C0(ti) and D0(ti), i ≥ 1, and time interval [1, T2]:
The reproduction numbers
Rk,
k = 1, 2, 3 for different time sections
The reproduction number is mainly determined by β and τ2. Since in our model parameter τ2 takes discrete values (days) it would be interesting to study the change of this parameter over time while keeping β the same for a whole period.
We consider three consequent time intervals Δk = [tk, tk+1] (k = 1, 2, 3) for each country and find optimal values α, β and (k = 1, 2, 3). The last time point t4 is T2 = 11-Nov-2014. Corresponding data fitting problem is
Problem (DF2): Given data C0(ti) and D0(ti), i ≥ 1, and time interval [t1, t4]:
1Continuous time version of this model is
Comments on this article Comments (0)