India in Transition

India’s COVID-19 Data and the Public Interest

Gautam I. Menon
July 5, 2021

In mid-May 2021, as India’s second COVID-19 wave was reaching its peak, more than 410,000 cases were being recorded each day—the largest daily numbers for any country, at any time in the pandemic. The speed with which case numbers increased led to unprecedented strains on the health care system, both public and private. The media preoccupation with urban India served to hide the more substantial tragedy that was unfolding across rural parts of the nation.

To understand how an epidemic unfolds, one must follow the numbers. The number of people who test positive for COVID-19 on a given day at a specific location is a useful indicator of the spread of the disease. Does, for example, the increase in daily case numbers signal the exponential rise that marks an epidemic? Is this increase a peculiarity of that location or is it shared by nearby locations? Are public health interventions being reflected in a slowing of the pace of increase?
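
As a rough illustration of the first question, one common check for exponential growth is to fit a straight line to the logarithm of daily case counts and convert the slope into a doubling time. The sketch below is a minimal example under that assumption; the function name and the idea of using a plain least-squares fit are illustrative choices, not a method described by any agency mentioned here.

```python
import math


def doubling_time_days(daily_cases):
    """Estimate a doubling time from daily case counts.

    Fits a least-squares line to log(cases) against the day index.
    Returns the doubling time in days, or None if cases are not growing.
    Days with zero reported cases are skipped, since log(0) is undefined.
    """
    points = [(day, math.log(count))
              for day, count in enumerate(daily_cases) if count > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with non-zero case counts")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    growth_rate = sxy / sxx  # slope of log(cases) per day
    if growth_rate <= 0:
        return None
    return math.log(2) / growth_rate
```

Applied to the daily counts of two neighbouring districts, similar doubling times would hint that the rise is shared rather than local, while a lengthening doubling time after an intervention would be consistent with a slowing epidemic. A good fit alone does not prove exponential growth, and changes in testing volume can distort the estimate.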

Such data is not easily available on government websites; the most trusted resource on COVID-19 in India is a crowd-sourced database manned by unpaid volunteers with no government support. That this data exists within the Indian government, however, is undisputed. The Indian Council of Medical Research (ICMR) possesses valuable data on every COVID-19 test conducted in India since the very first reported case, yet this data has never been the subject of any publicly available systematic analysis.

Even before cases in the second wave peaked, a search began for explanations, as well as possible scapegoats. There were discussions of failures of governance, with one example being the inexplicable delay in initiating the tendering processes required to set up new oxygen plants. Whether or not the government should have approved the holding of massive religious events and political rallies, even as cases were increasing, was one theme. That Indian scientists had failed to warn the government in advance of a possible second wave was another.

On April 29th, more than 300 Indian scientists took the unusual step of addressing a public letter directly to Prime Minister Modi. The letter pointed out that scientists lacked “wide access to the granular testing data that ICMR has been collating since the beginning of the pandemic” and that the “ICMR database is inaccessible to anyone outside of the government as well as to many within the government.” The letter went on to say that “Most scientists—including several identified by DST and NITI Aayog to develop new prediction models for India—do not have access to these data.”

The scientists’ letter spoke of the need for the collection and timely release of large-scale genomic surveillance data and for testing and clinical data to be made available. It also asked for more data to be collected and made available on the clinical outcomes of hospitalized patients and on the Indian population’s immune response to vaccination.

Such an open appeal was unusual, especially since virtually all of these scientists were from government-funded institutions. As Nature pointed out, “By identifying themselves, the signatories took a risk: in the past, the Modi government has not reacted well to researchers organizing to question its policies.”

Shortly thereafter, the government’s principal scientific adviser (PSA), Prof. K. Vijay Raghavan, acknowledged the concerns outlined in the letter and signaled the government’s openness to sharing data from multiple sources. A press release from his office listed specific government agencies, together with points of contact, that would give interested scientists access to available datasets.

However, Prof. Raghavan did not specify a timeline for the availability of this data, nor did the press release mention to whom one could appeal should a reasonable request be denied by a specific agency. Also, a core issue remained unaddressed: the idea that scientists should have to “request” data and that these requests would have to be “granted,” rather than data being made freely available as a public good.

As Indian data journalist S. Rukmini points out, “[In India] there is this attitude that if you want information, you should put an RTI request. It should be the other way round. Give it to us without us asking. We have failed to create a culture where citizens know that access to data is their right and not having it is a violation of their rights.” Along these lines, health journalist Maitri Porecha recently stated that “government institutes like ICMR and INSACOG expect scientists to write to them with requests for data. But without looking at the dataset, scientists won't know what to ask for.”

Consider the ICMR data on testing. If the same individual tested positive on successive tests, provided the tests were spaced well apart, that would suggest a reinfection. With a careful analysis, the likelihood of reinfections could be extracted, a number central to the question of whether COVID-19 might become an endemic disease. By combining different databases, even more useful information could be revealed. Comparing reports of one individual’s positive tests with prior vaccinations could indicate the probability of that person getting infected after vaccination (a so-called “vaccine breakthrough” event). By checking to see whether the positive test happened after the first but before the second dose of vaccine, or after the second dose (with appropriate statistical adjustments), the relative efficacy of a single vaccine dose at preventing disease could have been derived.
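
The two checks described above can be sketched in a few lines. This is a minimal illustration only: the 90-day gap, the record layout (a list of positive test dates per person, plus optional dose dates), and the function names are all assumptions made for the example, not details of the ICMR or COWIN databases.

```python
from datetime import date

# Assumed threshold for "spaced well apart"; the right value is an
# epidemiological judgment and is not specified in the text.
MIN_GAP_DAYS = 90


def possible_reinfections(positive_dates, min_gap_days=MIN_GAP_DAYS):
    """Return pairs of successive positive tests at least min_gap_days apart."""
    dates = sorted(positive_dates)
    return [(earlier, later)
            for earlier, later in zip(dates, dates[1:])
            if (later - earlier).days >= min_gap_days]


def classify_positive(test_date, dose1=None, dose2=None):
    """Label a positive test by vaccination status on the day of the test.

    dose1 and dose2 are the dates of each dose, or None if not given.
    """
    if dose1 is None or test_date < dose1:
        return "unvaccinated"
    if dose2 is None or test_date < dose2:
        return "after_dose_1"
    return "after_dose_2"
```

For a hypothetical person with positive tests on 1 August 2020, 10 August 2020, and 20 April 2021, the first function flags only the last pair, since the first two tests almost certainly belong to the same infection. Counting the labels from the second function across the population is only a starting point; a real estimate of single-dose efficacy would also need the lag before a dose takes effect, the time each person spent in each category, and the statistical adjustments mentioned above.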

By examining symptoms reported after a vaccine breakthrough event, one could infer the extent to which vaccines help to reduce disease severity. Combining testing and vaccination data with genomic sequencing from specific geographic regions, one could look for correlations between the potential emergence of new variants, local spikes in test positivity, and potential increases in breakthrough infections.

Even more delicate questions, such as the impact of altering the spacing of doses, could be addressed with such a large data set. Such an analysis could have taken advantage of changes in government policy regarding the interval between vaccine shots, since those changes created groups of people who received their doses at different intervals.

These are simple questions of data analysis. Some of them can be answered, albeit at a much smaller scale, at the level of single hospitals or large organizations, where a cohort of employees is tested regularly and can be followed up. However, such studies are limited by issues of cohort construction and cohort size. For example, the well-studied CSIR cohort is largely urban and more educated than one that would be obtained by sampling at random from the general population. By contrast, the data collected by the ICMR and its sister agencies is uniquely comprehensive. It contains data from individuals across the length and breadth of India, every slice of society, every level of income, and every possible pre-existing condition.

There are many loopholes that aid those in government who would wish to limit public access to data. One loophole is the need to ensure the privacy of personal data. The claim is that data privacy cannot be guaranteed if outsiders are given access. Another is that expertise in the government system is more than sufficient to analyze available data without requiring help from outside. For some data, national security can be invoked.

These are tenuous claims. Making databases accessible so that information can be extracted without compromising personal details is central to those parts of modern computer science and data science that deal with privacy. Expertise in data analysis in the government sphere is undeniable, but the fact remains that no analysis of the ICMR database has appeared in any public form so as to be appraised and critiqued.

Perhaps the data itself is so flawed—due to problems with variability in how the data was entered—as to be useless. Given pressures on laboratories to conduct a huge number of tests at the peak of the second wave, this is a reasonable possibility. However, the quality of the data can only be assessed if it is available to be examined in the first place. And well-studied methods do exist to extract some information from corrupted data.

A final problem is that the ownership of Indian COVID-19 data is scattered across multiple agencies. The ICMR holds testing data, the National Centre for Disease Control (NCDC) is responsible for monitoring and responding to outbreaks, the Ministry of Health consolidates information related to hospitals and health-care logistics, and the COWIN platform contains vaccine-related information. The Indian SARS-CoV-2 Genomics Consortium (INSACOG) is responsible for genomic sequencing.

As a result, any study that draws on more than one database can proceed only if multiple agencies agree to cooperate in releasing data. The approach indicated by the PSA is still piecemeal, since each agency must agree separately to make its data available.

With many eyes focused on the numbers, and with statisticians, epidemiologists, data scientists, and modelers combining their skills to analyze them, it might have been possible to extract unusual patterns predictive of a new wave well before cases rose exponentially. Indeed, how to examine such data, in real time, for signals of an incipient outbreak is an important question for the future.

Pandemics don’t respect borders and even island countries cannot afford to insulate themselves forever. A priority now is to reduce the number of infections worldwide, even while allowing economic activity to return, so that the scope for new variants to emerge is reduced as much as possible.

The Delta variant that was first reported in India has now been seen in more than ninety countries around the world. Given the data it has accumulated, India can address detailed questions of vaccine efficacy against this and other variants of concern. Making such data more freely available to those who can study it in novel ways should be a first step toward readying India for its uncertain future.

Gautam I. Menon is a Professor at Ashoka University, Sonepat and at the Institute of Mathematical Sciences, Chennai. He can be reached at

India in Transition (IiT) is published by the Center for the Advanced Study of India (CASI) of the University of Pennsylvania. All viewpoints, positions, and conclusions expressed in IiT are solely those of the author(s) and not specifically those of CASI.

© 2021 Center for the Advanced Study of India and the Trustees of the University of Pennsylvania. All rights reserved.