What is the most effective policy response to the new coronavirus pandemic?

Disclaimer: I am not an epidemiologist, but there is an interesting, potentially important pattern in the data that seems worth understanding.

World healthcare authorities appear to be primarily shifting towards Social Distancing. However, there is potential to pursue a different strategy in the medium term that exploits a vulnerability of this disease: the roughly 5-day incubation time is much longer than a roughly 4-hour detection time. This vulnerability is real: it has proved exploitable at scale in South Korea and in China outside of Hubei.

Exploiting this vulnerability requires:

  1. A sufficient capacity of rapid tests must be available. Sufficient here is perhaps 30 times the number of true new cases per day, based on South Korea’s testing rate; for example, a country with 1,000 true new cases per day needs roughly 30,000 tests per day.
  2. The capacity to rapidly trace the contacts of confirmed positive cases. This is highly labor-intensive, yet absurdly cheap compared to shutting down the economy.
  3. Effective quarantining of positive and suspect cases. This could be at home, with the quarantine extended to the entire family. It could also be done in a hotel (which are pretty empty these days) or in a hospital.

Where Test/Trace/Quarantine is working, the number of cases per day has declined empirically. Furthermore, this appears to be a radically superior strategy where it can be deployed. I’ll review the evidence, discuss the other strategies and their consequences, and then discuss what can be done.

Evidence for Test/Trace/Quarantine
The TTQ strategy works when it effectively catches at least a 1 - 1/(reproduction number) fraction of cases. The reproduction number is not precisely known, although based on public data, discovering 90% of cases seems likely to be effective and discovering 50% seems likely to be ineffective.
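
As a minimal sanity check (an idealized model, assuming every detected case is isolated before it transmits further, with R the reproduction number and f the detected fraction):

```latex
R_{\mathrm{eff}} = R\,(1 - f) < 1
\quad\Longleftrightarrow\quad
f > 1 - \frac{1}{R},
\qquad
R = 2 \Rightarrow f > 0.5,
\quad
R = 3 \Rightarrow f > 0.67.
```

With reproduction number estimates in the commonly cited range of roughly 2 to 3, detecting 90% of cases clears this threshold comfortably, while detecting 50% does not.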

How do you know what fraction of cases are detected? A crude measure can be formed by comparing the ratio of detected cases to deaths across different countries. Anyone who dies from pneumonia these days should be tested for COVID-19, so the number of deaths is a relatively trustworthy statistic. If we suppose the ratio of true cases to deaths is fixed, then the ratio of observed cases to deaths allows us to estimate the fraction of detected cases. For example, if the true ratio between infections and fatalities is 100 while we observe 30, then the detection rate is 30%.
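
A minimal sketch of this estimate, assuming a true infections-per-death ratio of 100 (i.e. a 1% fatality rate with adequate care); the function name and the example numbers are hypothetical:

```cpp
// Sketch of the detection-rate estimate described above.  The assumed
// ratio of true infections to deaths (here 100, i.e. a 1% fatality rate
// with adequate care) is an input assumption, not a measured quantity.
#include <iostream>

// Estimate the fraction of true cases that were detected, given cumulative
// observed cases and deaths and an assumed true infections-per-death ratio.
double detection_rate(double observed_cases, double observed_deaths,
                      double true_cases_per_death = 100.0) {
  if (observed_deaths <= 0.0) return -1.0;  // not enough signal yet
  double observed_ratio = observed_cases / observed_deaths;
  return observed_ratio / true_cases_per_death;
}

int main() {
  // Hypothetical numbers: 3000 observed cases and 100 deaths give an
  // observed ratio of 30, i.e. an estimated 30% detection rate.
  std::cout << detection_rate(3000.0, 100.0) << "\n";  // prints 0.3
}
```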

There are many caveats to this analysis (see below). Nevertheless, this ratio seems to provide real information which is useful in thinking about the future. Drawing data from the Johns Hopkins COVID-19 time series and plotting, we see:

The arrows here represent the progression of time in days, with time starting at the first recorded death. The X axis is the ratio between cumulative observed cases and cumulative observed deaths. Countries that are able and willing to test widely have progressions on the right, while those that are unable or unwilling to test widely are on the left. Note that the X axis is on a log scale, allowing us to see small variations in the ratio when the ratio is small and large variations when the ratio is large.

The Y axis here is the number of cases/day. For a country to engage in effective Test/Trace/Quarantine, it must effectively test, which the X axis is measuring. Intuitively, we expect countries that test effectively to follow up with Trace and Quarantine, and we expect this to result in a reduced number of cases per day. This is exactly what is observed. Note that we again use a log scale for the Y axis due to the enormous differences in numbers.
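
For concreteness, here is a sketch of how the two plotted quantities can be computed. It assumes the cumulative case and death counts have already been parsed from the Johns Hopkins time series into per-day vectors; the actual repository code may differ:

```cpp
// Compute the (x, y) points traced by one country's arrow in the plot:
// x is the ratio of cumulative cases to cumulative deaths, y is the number
// of new cases that day.  Days before the first recorded death are skipped,
// matching the plot description above.  Both axes are plotted on log scales.
#include <cstddef>
#include <utility>
#include <vector>

std::vector<std::pair<double, double>> trajectory(
    const std::vector<double>& cumulative_cases,
    const std::vector<double>& cumulative_deaths) {
  std::vector<std::pair<double, double>> points;
  for (std::size_t day = 1; day < cumulative_cases.size(); ++day) {
    if (cumulative_deaths[day] <= 0.0) continue;  // start at the first death
    double case_death_ratio = cumulative_cases[day] / cumulative_deaths[day];
    double new_cases = cumulative_cases[day] - cumulative_cases[day - 1];
    points.emplace_back(case_death_ratio, new_cases);
  }
  return points;
}
```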

There are several things you can read from this graph that make sense when you consider the dynamics.

  1. China excluding Hubei and South Korea had outbreaks which did not exceed hospital capacity, since the arrows start moving up and then loop back down around a ratio of 100 (roughly a 1% fatality rate).
  2. The United States has a growing outbreak and a growing testing capacity. Comparing with the outbreaks in China-excluding-Hubei and South Korea, only perhaps 1/10 to 1/4 of cases are likely being detected. Can the United States expand capacity fast enough to keep up with the growth of the epidemic?
  3. Looking at Italy, you can see evidence of an overwhelmed healthcare system as the fatality rate escalates. There is also some hope here, since the effects of the Italian lockdown are possibly starting to show in the new daily cases.
  4. Germany is a strange case with an extremely large ratio. It looks like there is evidence that Germany is starting to control their outbreak, which is hopeful and aligned with our expectations.

The creation of this graph is fully automated and it’s easy to graph things for any country in the Johns Hopkins dataset. I created a github repository with the code. Feel free to make fun of me for using C++ as a scripting language 🙂

You can also understand some of the limitations of this graph by thinking through the statistics and generation process.

  1. Mortality is a delayed statistic. Apparently, it’s about a week delayed in the case of COVID-19. Given this, you expect the ratio to generate loops when an outbreak occurs and is then controlled. South Korea and China-excluding-Hubei show this looping structure, returning to a ratio near 100. (A lag-adjusted variant of the ratio is sketched after this list.)
  2. Mortality is a small statistic, and a small statistic in the denominator can make the ratio unstable. When mortality is relatively low, we expect to see quite a bit of variation. Checking each progression, you see wide ratio variations initially, particularly in the case of the United States.
  3. Mortality may vary from population to population. It’s almost surely dependent on the age distribution and health characteristics of the population and possibly other factors as well. Germany’s ratio is notably large here.
  4. Mortality is not a fixed variable, but rather dependent on the quality of care. A reasonable approximation of this is that every “critical” case dies without intensive care support. Hence, we definitely do not expect this statistic to hold up when/where the healthcare system is overwhelmed, as it is in Italy. This is also the reason why I excluded Hubei from the China data.

Lockdown
The only other strategy known to work is a “lockdown” where nearly everyone stays home nearly all the time, as first used in Hubei. This characterization is simplistic; in practice such a quarantine comes with many other measures as well. This can work very effectively: today the number of new cases in Hubei is in the tens.

The lockdown approach shuts down the economy fast and hard. Most people can’t work, so they can’t make money, so they can’t buy things, so the people who make things can’t make money, so they go broke, and so on. This is strongly reflected in the stock market’s reaction to the escalating pandemic. If the lockdown approach is used for long, most people and companies are destined for bankruptcy. If a lockdown approach costs 50% of GDP, then a Test/Trace/Quarantine approach costing only a few percent of GDP seems incredibly cheap in comparison.

The lockdown approach is also extremely intrusive. It’s akin to collective punishment in that it harms the welfare of everyone, regardless of their disease status. Many people’s daily lives fundamentally depend on moving around; for example, people on dialysis.

Despite this, the lockdown approach is being taken up everywhere that cases are overwhelming or threaten to overwhelm hospitals, because the alternative (discussed next) is even worse. One advantage of the lockdown approach is that it can be used immediately, while the Test/Trace/Quarantine approach requires more organizing. It’s the best bad option when the Test/Trace/Quarantine capacity is exceeded, or to bridge the time until that capacity becomes available.

If/when/where Test/Trace/Quarantine becomes available, I expect it to be rapidly adopted. This new study (page 11) points out that repeated lockdowns are close to permanent lockdowns in effect.

Herd Immunity
Some countries have considered skipping measures to control the virus, on the theory that the population eventually contains enough people with individual immunity after recovery that the disease dies out. This approach invites severe consequences.

A key issue here is: how bad is the virus? The mortality rate in China excluding Hubei and in South Korea is only about 1%. From this, some people appear to erroneously reason that the impact of the virus is “only” having 1% of (say) 50% of the population die, heavily weighted towards older people. This reasoning is fundamentally flawed.

The mortality rate is not a fixed number, but rather dependent on the quality of care. In particular, because most countries have very few intensive care units, an uncontrolled epidemic effectively implies that all but a vanishing fraction of sick people receive only home-stay quality of care. How many people could die with home-stay quality of care? Essentially everyone who would otherwise require intensive care at a hospital. In China, that meant 6.1% (see page 12). Given this, the sound understanding is that an uncontrolled COVID-19 epidemic generates mortality a factor of 2-3 worse than the 1918 influenza pandemic (commonly estimated at roughly a 2.5% fatality rate), while modern healthcare might instead make it half as bad where hospitals are not overwhelmed. Note that the fatality rate in Hubei (4.6% of known cases, which might be 3% of total cases) does not fully express how bad this would be, because the fraction of people infected remained low and because of a surge of healthcare support from the rest of China.

The herd immunity approach also does not cause the disease to die out; instead, it continues to linger in the population for a long time. This means that people traveling from such a country will be effectively ostracized by every country (like China or South Korea) that has effectively implemented a Test/Trace/Quarantine approach.

I’ve avoided discussing the ethics here since people making this kind of argument may not care about ethics. For everyone else it’s fair to say that letting part of the population die to keep the economy going is anathema. My overall expectation is that governments pursuing this approach are at serious risk of revolt.

Vaccine

Vaccines are extremely attractive because they are a very low cost way to end the pandemic. They are however uncertain and take time to develop and test, so they are not a viable strategy for the next few months.

What can be done?

Public health authorities are generally talking about Social Distancing. This is plausibly the best general-public message because everyone can do something to help here.

It’s also clear that healthcare workers, vaccines makers, and everyone supporting them have a critical role to play.

But, perhaps there’s a third group that can really help? Perhaps there are people who can help scale up the Test/Trace/Quarantine approach so it can be rapidly adopted? Natural questions here are:

  1. How can testing be scaled up rapidly—more rapidly than the disease? This question is already getting quite a bit of attention, and deservedly so.
  2. How can tracing be scaled up rapidly and efficiently? Hiring many people who are freshly out of work is the most obvious solution. That could make good sense given the situation. However, automated or partially automated approaches have the potential to greatly assist as well. I hesitate to mention cell phone tracking because of the potential for abuse, but can that be avoided while still gaining the potential public health benefits?
  3. How can quarantining be made highly precise and effective? Can you estimate the risk of infection with high precision? What support can safely be put in place to help those who are quarantined? Can we avoid the situation where the government says “you should quarantine” and “people in quarantine can’t vote”?

Some countries started this pandemic set up for a relatively quick scale-up of the Test/Trace/Quarantine approach. Others, including the United States, seem to have been unprepared. Nevertheless, I am still holding out hope that the worst-case scenarios (high mortality or months-long lockdowns) can be largely avoided, as the available evidence suggests this is possible. Can we manage to get the number of true cases down (via a short lockdown if necessary) to the point where an escalating Test/Trace/Quarantine approach can take over?

Edit: I found myself remaking the graph for myself personally so I made it update hourly and added New York (where I live).

Coronavirus and Machine Learning Conferences

I’ve been following the renamed COVID-19 epidemic closely since potential exponentials deserve that kind of attention.

The last few days have convinced me it’s a good idea to start making contingency plans for machine learning conferences like ICML. The plausible options happen to be structurally aligned with calls to enable reduced travel to machine learning conferences, but of course the need is much more immediate.

I’ll discuss relevant observations about COVID-19 and then the impact on machine learning conferences.

COVID-19 observations

  1. COVID-19 is capable of exponential growth with a base estimated at 2.13-3.11 and a doubling time of around a week when unchecked. (A rough projection from the doubling time is sketched after this list.)
  2. COVID-19 is far more deadly than the seasonal flu, with estimates of a 2-3% fatality rate, but also much milder than SARS or MERS. Indeed, part of what makes COVID-19 so significant is the fact that it is mild for many people, leading to a lack of diagnosis, more spread, and ultimately more illness and death.
  3. COVID-19 can be controlled at a large scale via draconian travel restrictions. The number of new observed cases per day peaked about 2 weeks after China’s lockdown and has been declining for the last week.
  4. COVID-19 can be controlled at a small scale by careful contact tracing and isolation. There have been hundreds of cases spread across the world over the last month which have not created new uncontrolled outbreaks.
  5. New significant uncontrolled outbreaks in Italy, Iran, and South Korea have been revealed over the last few days. Some details:
    1. The 8 COVID-19 deaths in Iran suggest that the few reported cases (as of 2/23) are only the tip of the iceberg.
    2. The fact that South Korea and Italy can suddenly discover a large outbreak despite heavy news coverage suggests that it can really happen anywhere.
    3. These new outbreaks suggest that in a few days COVID-19 is likely to become a world-problem with a declining China aspect rather than a China-problem with ramifications for the rest of the world.
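
To see why a week-scale doubling time deserves attention, here is a rough projection of unchecked growth; the starting count of 1,000 cases and the 8-week horizon are hypothetical:

```cpp
// Rough projection of unchecked exponential growth from a doubling time,
// printed at weekly intervals.  The starting count is hypothetical; the
// doubling time of about a week comes from the estimate above.
#include <cmath>
#include <iostream>

int main() {
  const double doubling_time_days = 7.0;  // "around a week when unchecked"
  const double starting_cases = 1000.0;   // hypothetical starting point
  for (int day = 0; day <= 56; day += 7) {
    double projected = starting_cases * std::pow(2.0, day / doubling_time_days);
    std::cout << "day " << day << ": ~" << projected << " cases\n";
  }
  // After 8 weeks of unchecked doubling, 1,000 cases becomes ~256,000.
}
```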

There remains quite a bit of uncertainty about COVID-19, of course. The plausible bet is that the known control measures remain effective when and where they can be exercised with new ones (like a vaccine) eventually reducing it to a non-problem.

Conferences
The plausible scenario leaves conferences still in a delicate position, because they require many things to go right to function. We can easily envision 3 quite different futures here, all consistent with the plausible case.

  1. Good case: New COVID-19 outbreaks are systematically controlled via proven measures, with the overall number of daily cases declining steadily as they are right now. The impact on conferences is marginal, with lingering travel restrictions affecting some (<10%) potential attendees.
  2. Poor case: Multiple COVID-19 outbreaks turn into a pandemic (a multi-continent epidemic) in regions unable to effectively exercise either control measure. Outbreaks in other regions occur, but they are effectively controlled. The impact on conferences is significant, with many (50%?) avoiding travel due to either restrictions or uncertainty about restrictions.
  3. Bad case: The same as (2), except that an outbreak occurs in the area of the conference. This makes the conference nonviable due to travel restrictions alone. It’s notable here that Italy’s new outbreak involves travel lockdowns a few hundred miles/kilometers from Vienna, where ICML 2020 is planned.

Even the first outcome could benefit from some planning, while gracefully handling the last outcome requires it.

The obvious response to these plausible scenarios is to reduce the dependence of a successful conference on travel. To do this we need to think about what a conference is in terms of the roles that it fulfills. The quick breakdown I see is:

  1. Distilling knowledge. Luckily, our review process is already distributed.
  2. Passing on knowledge.
  3. Meeting people, both old friends and discovering new ones.
  4. Finding a job / employee.

Which of these can be effectively supported remotely, and how?

I’m planning to have discussions over the next few weeks about this to distill out some plans. If you have good ideas, let’s discuss. Unlike most contingency planning, it seems likely that efforts are not wasted no matter what the outcome 🙂

Updates for the new decade

This blog has been quiet for the last year. I have quite a bit to write about but found myself often out of time between work at Microsoft, ICML duties, and family life. Nevertheless, I expect to get back to more substantive discussions as I adjust to the new load.

In the meantime, I’ve updated the site in various ways: SSL now works, and mail for people registering new accounts should work again.

I also set up a twitter account, as I’ve often had things left unsaid. I’m not a fan of blog-by-twitter (which seems artificially disjointed), so I expect to use twitter for shorter things and hunch.net for longer things.

ICML has 3(!) Real World Reinforcement Learning Workshops

The first is Sunday afternoon during the Industry Expo day. This one is meant to be quite practical, starting with an overview of Contextual Bandits and leading into how to apply the new Personalizer service, the first service in the world functionally supporting general contextual bandit learning.

The second is Friday morning. This one is more academic with many topics. I’ll personally be discussing research questions for real world RL.

The third one is Friday afternoon, with more emphasis on sequences of decisions. I expect to hear “imitation learning” multiple times 🙂

I’m planning to attend all 3. It’s great to see interest building in this direction, because Real World RL seems like the most promising direction for fruitfully expanding the scope of solvable machine learning problems.

Code submission should be encouraged but not compulsory

ICML, ICLR, and NeurIPS are all considering or experimenting with code and data submission as a part of the review or publication process, with the hypothesis that it aids reproducibility of results. Reproducibility has been a rising concern, with discussions in papers, workshops, and invited talks.

The fundamental driver is of course lack of reproducibility. Lack of reproducibility is an inherently serious and valid concern for any kind of publishing process where people rely on prior work to compare with and to do new things. Lack of reproducibility (due to random initialization, for example) was one of the things leading to a period of unpopularity for neural networks when I was a graduate student. That unpopularity has proved unwarranted (Surprise! Learning circuits is important!), but the reproducibility issue remains. Furthermore, there is always an opportunity and a latent suspicion that authors ‘cheat’ in reporting results, which could be allayed by a reproducible approach.

With the above said, I think the reproducibility proponents should understand that reproducibility is a value, but not an absolute value. As an example, I believe it’s quite worthwhile for the community to see AlphaGoZero published even if the results are not easily reproduced. There is real value for the community in showing what is possible, irrespective of whether another match with a master of Go is feasible, and there is real value in having an algorithm like this be public even if the code is not. Treating reproducibility as an absolute value could exclude results like this.

An essential understanding here is that machine learning is (at least) 3 different kinds of research.

  • Algorithms: The goal is coming up with a better algorithm for solving some category of learning problems. This is the most typical viewpoint at these conferences.
  • Theory: The goal is generally understanding what is possible or not possible for learning algorithms. Although these papers may have algorithms, they are often not the point and demanding an implementation of them is a waste of time for author, reviewer, and reader.
  • Applications: The goal is solving some particular task. AlphaGoZero is a reasonable example of this: it was about beating the world champion in Go, with algorithmic development in service of that. For this kind of research, perfect programmatic reproducibility may be infeasible because the computation is too extreme, the data is proprietary, and so on.

Using a one-size-fits-all approach where you demand that every paper “is” a programmatically reproducible implementation is a mistake that would create a division that reduces our community. Keeping this three-fold focus fundamentally enriches the community both literally and ontologically.

Another view here is provided by considering the argument at a wider scope. Would you prefer that health regulations/treatments be based on all scientific studies, including those where the data is not fully released to the public (i.e. almost all of them, for privacy reasons)? Or would you prefer that health regulations/treatments be based only on studies where the data is fully released to the public? Preferring the latter is equivalent to ignoring most scientific studies in making decisions.

The alternative to a compulsory approach is to take an additive view. The additive approach has a good track record amongst reviewing process changes.

  • When I was a graduate student, papers were not double blind. The community switched to double blind because it adds an opportunity for reviewers to review fairly and it gives authors a chance to have their work reviewed fairly whether they are junior or senior. As a community we also do not restrict posting on arxiv or talks about a paper before publication, because that would subtract from what authors can do. Double blind reviewing could be divisive, but it is not when used in this fashion.
  • When I was a graduate student, there was also a hard limit on the number of pages in submissions. For theory papers this meant that proofs were not included. We changed the review process to allow (but not require) submission of an appendix which could optionally be used by reviewers. This again adds to the options available to authors/reviewers and is generally viewed as positive by everyone involved.

What can we add to the community in terms of reproducibility?

  1. Can reviewers do a better job of reviewing if they have access to the underlying code or data?
  2. Can authors benefit from releasing code?
  3. Can readers of a paper benefit from an accompanying code release?

The answer to each of these questions is a clear ‘yes’ if done right.

For reviewers, it’s important to not overburden them. They may lack the computational resources, platform, or personal time to do a full reproduction of results even if that is possible. Hence, we should view code (and data) submission in the same way as an appendix which reviewers may delve into and use if they so desire.

For authors, code release has two benefits: it provides an additional avenue for convincing reviewers who default to skeptical, and it makes followup work significantly more likely. My most cited paper was Isomap, which did indeed come with a code release. Of course, this is not possible or beneficial for authors in many cases. Maybe it’s a theory paper where the algorithm isn’t the point? Maybe either the data or the code can’t be fully released since it’s proprietary? There are a variety of reasons. From this viewpoint, we see that releasing code should be supported and encouraged but optional.

For readers, having code (and data) available obviously adds to the depth of value that a paper has. Not every reader will take advantage of that but some will and it enormously reduces the barrier to using a paper in many cases.

Let’s assume we do all of these additive and enabling things, which is about where Kamalika and Russ aimed the ICML policy this year.

Is there a need to go further towards compulsory code submission? I don’t yet see evidence that default-skeptical reviewers aren’t capable of weighing the value of reproducibility against other values in considering whether a paper should be published.

Should we do less than the additive and enabling things? I don’t see why—the additive approach provides pure improvements to the author/review/publish process. Not everyone is able to take advantage of this, but that seems like a poor reason to restrict others from taking advantage when they can.

One last thing to note is that this year’s code submission process is an experiment. We should all want program chairs to be able to experiment, because that is how improvements happen. We should do our best to work with such experiments, try to make a real assessment of success/failure, and expect adjustments for next year.