Covid-19 Modelling A set of operational parameters for the SEIR model of the coronavirus for the world and some of the more heavily affected countries

## Abstract

In this paper the unknown operational parameters of the SEIR model for the current coronavirus pandemic for the world, as a whole, and some of the more heavily affected countries, individually, are presented. Simulated annealing was used to derive and optimize the operational parameters through comparison with recorded regional data.

## Introduction

A simple SEIR[1][2] epidemic model is considered for the simulation of the dynamics of the current coronavirus. In this model individuals are assigned to one of the following disease states: susceptible (S), exposed (E), infectious (I) or recovered (R). The aggregate numbers of individuals in these states are related in the model through a set of coupled first order differential equations.

The use of the simulated annealing technique to determine the operational parameters for the evolution of the current coronavirus was explored in an earlier paper by this author[3].

This paper presents the results of the use of this technique for an expanded set of countries, it also relies on improved algorithms used in the simulated annealing algorithm in order to derive the parameter sets for each region.

## The SEIR Model

The model, with a constant population is:

$\frac{dS}{dt}=-\beta SI$

$\frac{dE}{dt}=\beta SI- \sigma E$

$\frac{dI}{dt}=\sigma E-\gamma I$

$\frac{dR}{dt}=\gamma I$

where:

$\beta$: the normalized rate at which an infected contact results in a new exposure;

$\sigma$: the rate at which an exposed person becomes infected, and;

$\gamma$: the rate an infected person recovers and moves into the resistant phase or dies.

Of note is that the model is conservative in that if $S_0$ represents the initial susceptible population then:

$S_0=S+E+I+R,$

$dS_0=dS+dE+dI+dR=0,$

consequently, all susceptible individuals will transition through all of the other disease states.

An alternate form, in parameter names only, is also presented in the literature[4]:

$\frac{dS}{dt}=-\frac{R}{S_0T_{inf}} SI$

$\frac{dE}{dt}=\frac{R}{S_0T_{inf}}SI-\frac{E}{T_{inc}}$

$\frac{dI}{dt}=\frac{E}{T_{inc}} -\frac{I}{T_{inf}}$

$\frac{dR}{dt}=\frac{I}{T_{inf}}$

where:

R: the basic reproduction rate of the virus i.e. the number of secondary infections each infected individual creates, with;

Transmission Times:

$T_{inc}$: the length of incubation period, and;

$T_{inf}$: the duration for which the patient is infectious.

With respect to the currently available data sets, what is known with some degree of accuracy are the number of infectious cases, which for the purpose of this paper are taken to be the number of reported active cases, and the number of those cases that have been closed, either through being cured or the patient dying. In the SEIR model there is no distinction between the patient being cured or dying, both are viewed as recovered.

In the current literature the term active is alternately used for infectious and the term closed is an alternate for recovered (cured or dead).

The available data for active (infectious) cases and those that resulted in being cured or dying from worldometers.com [5], provided case data over the, almost, full history of the coronavirus pandemic. This source provides data for active (infectious) cases and closed (recovered) cases, which is the simple sum of cured and dead cases, on a daily and regional basis and does so uniformly across countries.

## Methodology

By evolving the SEIR model using a 4th order Runge-Kutta predictor corrector scheme [6], a comparison was made between the resulting evolution of the SEIR curves over time and the reported data.

The initial conditions were taken as an initial guess at the susceptible population, $S_0$, with the initial exposed population $E_0$ being the sum of reported number of infected cases $I_0$ and the recovered cases $R_0$ for that day and the initial number of infected cases and recovered cases being taken from the reported data. With each iteration of the model the previously derived initial value for the susceptible population was used.

An error measure for the comparison was the square root of the sum of the squares of the root mean square differences of the data points for active cases versus the model's active (infectious) data and the root mean square of the data points for the closed cases versus the model's recovered data:

$\epsilon= \sqrt{\frac{1}{N}\Biggl(\sum_{(k=1)}^N (A_k-a_k )^2+\sum_{(k=1)}^N(C_k-c_k )^2 \Biggl)}$

where:

$\epsilon$: is the error measure;

$A_k$ and $a_k$: are the empirical data point and the equivalent model data point for time step k (day k) for the active (infectious) cases;

$C_k$ and $c_k$: are the empirical data point and the equivalent model data point for time step k (day k) for the closed (recovered) cases;

N: is the number of time steps (days) in the empirical data set.

Using this simulated annealing algorithm, the parameter space of $S_0$, the initial size of the susceptible population, $\beta$, $\sigma$ and $\gamma$ was explored, stochastically, accepting unconditionally those parameter sets that yielded a lower error than the previous error and accepting conditionally, with diminishing likelihood, parameter sets that yielded a higher error than the previous set.

The simulated annealing algorithm searches were run for several regional data sets; the world, the United States of America, Sweden, South Korea as initial examples and then, once the process was understood, a larger set of other countries.

## Discussion

Finding those operational parameters that fit the SEIR model to the data, allows for the extrapolation of the data into the future. Also, and perhaps more importantly, it allows for the estimation of the size of the population, the $S_0$ parameter, that is going to be affected to the level of being considered an active case and then by correlation the number of those cases that will be cured and the number that will die.

Using, as input data, the known active cases and the known closed cases, to within the accuracy of the reporting of the various health authorities, means that the results derived for the number of susceptible cases and the number of exposed cases is limited to those in either group in the population that will end up as reported active cases. This explicitly excludes from the modelling those individuals that might be susceptible and hence exposed but would suffer only mild effects for which external medical care was not required and thus not reported.

This narrowing of scope is both beneficial and lacking. Beneficial: in that it explicitly filters out those cases that would have an impact on the medical system and that the numbers presented are those that policy makers need to be concerned with when looking to project estimates for the number of hospital beds or respirators that will be needed and when. Lacking: in that it does not provide for the full scope of the epidemic i.e. it will not provide estimates for the numbers of individuals that are susceptible and exposed that will suffer only mild symptoms.

The convergence of the simulated annealing algorithm occurred in relatively short order, however, the parameter space in certain instances gave the appearance of being quite flat. Consequently, the algorithm would wander somewhat aimlessly for many iterations. It also appeared from watching the output of the algorithm - updated, modelled, curves in relation to the recorded data - that there were blind alleys that the algorithm could get trapped in if the perturbations to the parameters used in the simulated annealing process were not sufficiently large.

A few alternate error measures for the simulated annealing algorithm were tested to get an understanding of whether or not improvements in the convergence could be found. This exercise was specifically done with the data for New Zealand, which, as depicted in the Results section, failed to converge as well as some of the other countries' models e.g. the US, France, etc. One change was to limit the error measure to the newest 20 data points. Since the initial values were set, in essence, pinning the data at the beginning, it was thought that matching on the newest data points would force the convergence of the SEIR model to the recorded data more effectively than if intervening errors were included. Another change was to use an error measure based on trying to minimize the difference between the slope at the newest data points for the active and closed curves to the SEIR model's active and closed curve's slopes at those points in time.

Neither of these alternate measures gave, qualitatively, better results than the error measure denoted.

Determining the slope of the reported active and closed data created issues of its own due to the data's fluctuations from day to day. The slope of the exponentially moving average, weighted by 1:5, was also tried but this caused its own distortions making this variant of error metric difficult to use.

As will be seen in the Results section different countries report data with varying degrees of timeliness and, maybe, consistency. Some countries, Brazil and Ireland for instance, present data for which it is difficult to match the SEIR model to, although some degree of matching is achieved, but with relatively large error. Others regions with better quality data, converged readily.

It also appears that for countries with smaller sample sizes, Korea, Iceland for example,  obtaining convergence is less attainable than for countries with larger sample sizes, the US, France, Spain etc.

The modelling indicates a large variation in the behaviour of the exposed curve from country to country. Countries such as South Korea, Germany and Austria exhibited very brief exposure curves, whereas the Sweden and Peru are at the opposite end of the scale with very extended exposure periods. This could be a function of the differences in the lock-down policies imposed. With severe lock-down policies having the effect of limiting the likelihood of exposure.

A related effect that this long exposure period has is that the associate infection curve is commensurately flattened, but extended in time.

A final feature of this analysis is that as new, daily, data is obtained each of the models can be updated and the corresponding operational parameters derived. The evolution of those parameters over time could be used to determine if the model for a country is static or not. That is, the gradient of the operational parameters can be used as a measure of whether the country's model is evolving over time and if that gradient is significant, in its nature and scale, it could be used as a signal for policy makers. This could be especially true if there are significant changes in the initial value of the susceptible population.  Conversely, if the gradient is negligible, it could be inferred that the model is stable over time and in conjunction with the error metric could provide a measure of reliability of the model.

## Results

Due to the use of simulated annealing to determine the operational parameters for the various models presented it should be clear that the results are approximate and subject to variations.

For the following figures, the curves in magenta are the empirical data for the number of active (infectious) cases (I), rendered as a solid line, and the number of closed (recovered) cases (R), rendered as a dashed line. These data curves are noted in the figure legends by "active emp." and "closed emp." respectively.

The remaining curves are derived from the SEIR model as evolved using the operational parameters detailed below each of the figures. The number of susceptible cases (S) is rendered in black as a dashed line. The number of exposed cases (E) is rendered in blue as a dashed line. The number of active (infectious) case (I) is rendered in red, also lightly dashed. The number of closed (recovered) cases (R) is rendered in green.

 Date Time $S_0$ $\beta$ $\sigma$ $\gamma$ $R$ $T_{inc}$ $T_{inf}$ $\epsilon$ $\epsilon/S_0$ % Australia 2020-07-09 08:00 7.20e+03 5.23e-05 1.94e-01 4.61e-02 8.17 5.14 21.67 6.43e+02 8.93e-02 8.9 Austria 2020-07-09 08:00 1.68e+04 1.91e-05 9.39e+00 5.44e-02 5.88 0.11 18.38 1.08e+03 6.46e-02 6.5 Belarus 2020-07-09 08:01 7.62e+04 1.86e-05 2.59e-02 3.42e-02 41.38 38.63 29.21 2.68e+03 3.52e-02 3.5 Belgium 2020-07-09 08:01 9.59e+04 1.76e-04 1.09e-02 1.22e-02 1385.15 91.61 82.09 5.63e+03 5.86e-02 5.9 Brazil 2020-07-09 08:01 1.29e+08 1.16e-06 2.22e-04 5.20e-02 2866.53 4499.78 19.22 4.36e+04 3.38e-04 0.0 Canada 2020-07-09 08:02 1.64e+05 2.35e-05 1.11e-02 3.09e-02 124.37 90.25 32.33 3.36e+03 2.05e-02 2.0 Chile 2020-07-09 08:02 7.17e+05 5.09e-06 1.14e-02 1.12e-01 32.57 87.73 8.92 1.45e+04 2.03e-02 2.0 China 2020-07-09 08:02 8.36e+04 6.36e-06 3.55e-01 4.79e-02 11.09 2.82 20.86 5.50e+03 6.58e-02 6.6 Ecuador 2020-07-09 08:02 7.85e+04 5.25e-05 1.43e-02 2.04e-02 202.07 69.85 48.99 4.55e+03 5.79e-02 5.8 France 2020-07-09 08:02 2.23e+05 5.27e-05 1.22e-02 2.27e-02 517.18 81.81 44.03 1.50e+04 6.75e-02 6.8 Germany 2020-07-09 08:02 1.87e+05 6.79e-06 6.25e-02 6.39e-02 19.89 16.00 15.64 6.04e+03 3.22e-02 3.2 Iceland 2020-07-09 08:03 1.86e+03 2.96e-04 3.35e-01 6.14e-02 8.97 2.99 16.29 1.32e+02 7.07e-02 7.1 Iran 2020-07-09 08:03 6.31e+06 3.37e-04 3.08e-04 7.86e-02 27028.84 3248.80 12.71 7.91e+03 1.25e-03 0.1 Ireland 2020-07-09 08:03 2.53e+04 1.13e-05 3.86e+00 6.43e-02 4.46 0.26 15.56 2.11e+03 8.33e-02 8.3 Italy 2020-07-09 08:04 2.48e+05 9.94e-06 4.33e-02 3.17e-02 77.63 23.12 31.53 9.42e+03 3.80e-02 3.8 Japan 2020-07-09 08:04 1.90e+04 1.18e-05 1.21e-01 3.54e-02 6.38 8.25 28.26 2.41e+03 1.26e-01 12.6 Korea 2020-07-09 08:04 1.13e+04 1.72e-04 1.16e-01 3.22e-02 60.41 8.61 31.10 8.76e+02 7.75e-02 7.8 Mexico 2020-07-09 08:04 1.42e+06 4.70e-05 8.68e-04 1.49e-01 449.28 1152.67 6.71 1.17e+03 8.20e-04 0.1 Netherlands 2020-07-09 08:04 5.14e+04 3.86e-03 3.55e-02 1.45e+01 13.73 28.18 0.07 8.28e+02 1.61e-02 1.6 New_Zealand 2020-07-09 08:04 1.54e+03 2.94e-04 3.15e-01 5.85e-02 7.73 3.18 17.09 9.68e+01 6.29e-02 6.3 Pakistan 2020-07-09 08:04 2.04e+06 4.38e-06 2.32e-03 3.31e-02 269.83 431.45 30.22 7.82e+03 3.84e-03 0.4 Peru 2020-07-09 08:05 1.95e+06 1.01e-05 2.45e-03 3.73e-02 529.80 408.65 26.82 6.79e+03 3.48e-03 0.3 Portugal 2020-07-09 08:05 3.85e+04 5.47e-05 4.61e-02 1.74e-02 121.15 21.70 57.48 4.62e+03 1.20e-01 12.0 Qatar 2020-07-09 08:05 1.00e+05 6.63e-06 6.10e-02 5.92e-02 11.23 16.40 16.89 6.61e+03 6.59e-02 6.6 Saudi_Arabia 2020-07-09 08:05 2.28e+06 1.58e-05 1.21e-03 5.57e-02 647.84 826.15 17.94 8.68e+03 3.81e-03 0.4 Singapore 2020-07-09 08:05 4.52e+04 4.84e-06 8.37e-02 3.43e-02 6.37 11.94 29.12 4.02e+03 8.89e-02 8.9 South_Africa 2020-07-09 08:05 3.67e+06 3.18e-06 1.84e-03 5.97e-02 195.25 544.44 16.75 9.23e+03 2.52e-03 0.3 Spain 2020-07-09 08:06 3.01e+05 2.58e-05 2.78e-02 3.78e-02 205.10 36.03 26.46 2.76e+04 9.17e-02 9.2 Sweden 2020-07-09 08:06 7.21e+06 4.20e-05 1.04e-04 6.85e-03 44254.68 9635.60 146.06 4.79e+03 6.64e-04 0.1 Switzerland 2020-07-09 08:06 3.12e+04 6.11e-05 9.78e-02 6.16e-02 30.98 10.22 16.24 1.48e+03 4.73e-02 4.7 Taiwan 2020-07-09 08:06 4.62e+02 4.90e-04 8.56e-02 3.21e-02 7.05 11.69 31.16 6.55e+01 1.42e-01 14.2 Turkey 2020-07-09 08:06 1.93e+05 2.70e-05 3.91e-02 4.76e-02 109.60 25.57 21.00 1.14e+04 5.89e-02 5.9 US 2020-07-09 08:06 1.05e+07 2.06e-06 2.99e-03 1.50e-02 1443.66 334.16 66.83 7.35e+04 7.01e-03 0.7 World 2020-07-09 08:07 1.44e+08 1.32e-07 7.98e-04 2.65e-02 717.59 1253.47 37.77 2.82e+05 1.97e-03 0.2

## Conclusion

The use of simulated annealing to derive the size of the susceptible population for various jurisdictions and the other operational parameters for the SEIR model appears to be a useful tool to determine these values as they relate to a specific region or to the global population.

It is clear that the values determined for the reproduction rate, incubation period and the infectious period lie outside of the ranges normally expected from the literature. This is likely due to the way in which the SEIR model is used, that is, by starting with data for the number of recorded infected patients and the number of recorded closed cases as opposed to addressing the full population. Running the model this way infers the susceptible pool from the infected pool and, as such, there is an expectation that R would be larger than otherwise and that the incubation and infectious periods longer.

The correlation between the recorded data and the model, with the exception of that for South Korea and other countries that have a decreasing infectious population in the long term and other countries with poor data, is relatively good, with an approximately 1\% or better relative error.

Although the majority of models approximate the data well, it is clear that:

• better models are needed to capture the behaviour of epidemic evolution more accurately, especially for longer term behaviour, and
• alternate matching criteria could be use to gain a better fit of the operational parameters to the data.

However the, demonstrated, correlation between the recorded data and the SEIR model, given the derived operational parameters, could prove to provide a useful measure, over time, to allow forward looking decisions to be made regarding the state of the pandemic globally and regionally. The parameter sets detailed also provide a useful starting point for other researchers to use as a baseline.

## References

[1] W. O. Kermack, A. G. McKendrick, A contribution to the mathematical theory of epidemics, Proceedings of the Royal Society A. 115 (1927) 700-721. doi:10.1098/rspa.1927.0118.

[2] H. Hethcote, The mathematics of infectious diseases, SIAM Review 42 (2000) 599-653. doi:10.1137/s0036144500371907.

[3] L. Mulder, Use of simulated annealing to determine the operational parameters of the SEIR model for the coronavirus for various jurisdictions, Bull. World Health Organ. E-pub 20 April 2020, [Submitted]. (2020) doi:http://dx.doi.org/10.2471/BLT.20.260513.

[4] G. Goh, Epidemic calculator (2020) URL http://gabgoh.github.io/COVID/index.html

[5] Worldometers.info (2020) URL https://www.worldometers.info/coronavirus/

[6] M. Abramowitz, I. Stegun, Handbook of Mathematical Functions, United States Department of Commerce, National Bureau of Standards (NBS),1964.