1 Introduction

Two key components in containing the COVID-19 pandemic are public awareness and public trust in the government. These components critically depend on timely and accessible dissemination of COVID-19 data by the government [1]. While there are studies showing disparities in personal healthcare access in India, very little was known about the quality of access to public health data across India, especially during the early months of the COVID-19 pandemic [2,3]. To address this problem, we developed a semi-quantitative framework to assess the quality of COVID-19 data reporting, and used it to calculate a COVID-19 Data Reporting Score (CDRS) for 29 state and union territory (UT) governments of India [4]. This assessment was done during the 2-week period from May 19 to June 1, 2020. The study showed a strong disparity in the quality of COVID-19 data reporting across India: CDRS varied from 0.61 (good) to 0.0 (poor) across the country, with a median value of 0.26.

In this communication, we present the findings from a second assessment of the quality of COVID-19 data reporting across India. This study was done during the 2-week period from July 12 to July 25, 2020, and includes 35 states (Footnote 1) and UTs of India. Hereafter, this 2-week period is referred to as the scoring period. Lakshadweep was excluded from the study as it did not have any COVID-19-positive cases as of July 12, 2020. Hereafter, the first assessment done during May is referred to as study-1 and the second assessment from July is referred to as study-2.

Table 1: CDRS scoring table. Each “Metric-Report Item” pair is an indicator. Overall there are 45 indicators. The scores that an indicator can take are listed in the table. NA denotes not applicable. This table is filled for each state by inspecting the COVID-19 data reported by that state

2 Methods

Our scoring framework consists of 45 indicators spanning four key dimensions of public health data reporting: availability, accessibility, granularity, and privacy [4,5]. These indicators capture the presence or absence of a piece of information in the reported data and the format in which it is reported. We would like to emphasize that our framework does not assess the “accuracy of the reported data.”

In the availability dimension, we check the availability of basic data such as the daily and cumulative number of confirmed cases, deaths, and recoveries in the state [5]. To assess the accessibility of data, we check for the presence of trend graphics, availability of data in English, and the ease of getting to the web page where data are reported. Trend graphics are important because they make it easier to see patterns in the data. To evaluate the granularity of data, we check whether the state is reporting cumulative data stratified by age, gender, comorbidity, and districts. Granular data helps a layperson connect with the data at a personal level. To assess if a state is ensuring privacy while reporting data, we check if any personally identifiable information of COVID-19 suspects or patients is made publicly available on the state’s COVID-19 data reporting page. The report items shown as column headers in Table 1 represent five possible stages in which an individual can find themselves during the pandemic.

Each “Metric-Report Item” pair shown in Table 1 is an indicator. The entries in the table represent the possible scores an indicator can earn [4]. This table is filled for each state during the scoring period by checking the data reported by that state. For example, if a state is reporting total confirmed COVID-19 cases then a score of 1 is assigned to that indicator. The scores recorded in the table are collectively referred to as the scoring data.

Using the scoring data, four categorical scores, one for each dimension, and an overall score are calculated for each state. The categorical scores are obtained by summing the scores earned by the indicators in that dimension. The overall score is the normalized sum of the four categorical scores, and is referred to as the COVID-19 Data Reporting Score (CDRS). For further details on the scoring metrics, scoring process, and score calculation, refer to our article introducing the CDRS framework [4].
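The score calculation described above can be illustrated with a minimal Python sketch. The indicator names, the five-indicator subset, and the per-indicator score ranges below are hypothetical placeholders, not the 45 indicators of Table 1. The sketch also assumes that normalization means dividing a sum by the difference between the maximum and minimum possible score, and that an unreported indicator contributes 0. The exact rules are given in the article introducing the framework [4].

```python
from collections import defaultdict

# Hypothetical subset of indicators (NOT the actual Table 1):
# (dimension, indicator) -> (minimum possible score, maximum possible score)
INDICATOR_RANGES = {
    ("availability", "total_confirmed"): (0, 1),
    ("availability", "daily_confirmed"): (0, 1),
    ("accessibility", "english"): (0, 1),
    ("granularity", "confirmed_by_age"): (0, 1),
    ("privacy", "no_personal_info"): (-1, 1),
}


def score_state(scoring_data):
    """Return (categorical sums, normalized categorical scores, overall score).

    scoring_data maps (dimension, indicator) to the score earned by a state.
    Indicators missing from scoring_data are assumed to earn 0.
    """
    sums = defaultdict(float)   # sum of earned scores per dimension
    spans = defaultdict(float)  # sum of (max - min) per dimension
    for (dimension, indicator), (lowest, highest) in INDICATOR_RANGES.items():
        sums[dimension] += scoring_data.get((dimension, indicator), 0)
        spans[dimension] += highest - lowest

    # Assumed normalization: divide by the range of possible scores.
    normalized = {d: sums[d] / spans[d] for d in sums}
    overall = sum(sums.values()) / sum(spans.values())
    return dict(sums), normalized, overall


# Usage with a made-up state that reports totals in English without
# age-stratified data and without exposing personal information.
example_state = {
    ("availability", "total_confirmed"): 1,
    ("accessibility", "english"): 1,
    ("privacy", "no_personal_info"): 1,
}
sums, normalized, overall = score_state(example_state)
print(sums, normalized, overall)
```

Under these assumptions, a state that publishes no data earns 0 on every indicator and therefore an overall score of 0.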

Figure 1: Filled map showing CDRS across India. The map represents the disparity in the quality of COVID-19 data reporting across India. Dark green (red) indicates states that have high (low) quality data reporting.

Figure 2: Left: A dot plot showing the spread of CDRS values. States are sorted in decreasing order of CDRS. Right: The incremental change in CDRS since study-1. Incremental change is not shown for states (marked by an *) that were excluded in study-1.

3 Results and Discussion

CDRS and the normalized categorical scores for the states in India are tabulated in Table 2. The categorical scores are normalized by the difference between the maximum and minimum score possible in that category. The value of CDRS across states indicates a strong disparity in the quality of COVID-19 data reporting in India. The five-number summary of CDRS is: minimum = 0.00, first quartile = 0.20, median = 0.30, third quartile = 0.35, and maximum = 0.63. The geographical disparity in CDRS is evident from the map (Footnote 2) shown in Fig. 1.
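As an illustration of how a five-number summary like the one above is obtained, the following Python sketch uses the standard library. The input values are made up for the example and are not the CDRS values of Table 2. The quartile convention (`method="inclusive"`) is also an assumption, since the convention used for the reported quartiles is not stated here.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population (an assumed convention).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical scores for five states, for illustration only.
hypothetical_scores = [0.5, 0.1, 0.3, 0.2, 0.4]
print(five_number_summary(hypothetical_scores))
```

Different quartile conventions can give slightly different first and third quartiles on small samples, so the convention should be stated when comparing summaries.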

Table 2: CDRS and the normalized categorical scores for the states in India. States are listed in alphabetical order.

Figure 2 lists states in decreasing order of CDRS. As seen in the figure, Karnataka is at the top, while Bihar and Uttar Pradesh are at the bottom. Bihar and Uttar Pradesh get a CDRS of 0 because they do not release any COVID-19 data on their government or health department websites. Figure 2 also shows the incremental change in CDRS from its previous value calculated during study-1, conducted between May 19 and June 1, 2020. As seen in Fig. 2, CDRS has increased in 12 states and decreased in 5 states since the previous study. Figure 3 presents boxplots showing CDRS across India from study-1 and study-2. As seen in the figure, the median value has increased slightly from 0.26 to 0.30.
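The incremental change between the two studies can be computed as in the following Python sketch. The state labels and scores are hypothetical and serve only to show the bookkeeping: states scored in only one study are left out of the comparison, mirroring the treatment of states excluded in study-1.

```python
def compare_studies(study1, study2):
    """Compare two mappings of state -> CDRS.

    Returns (change per common state, states that increased,
    states that decreased). States absent from either study are skipped.
    """
    common = study1.keys() & study2.keys()
    # Round to two decimals, the precision at which scores are reported.
    change = {state: round(study2[state] - study1[state], 2) for state in common}
    increased = sorted(s for s, d in change.items() if d > 0)
    decreased = sorted(s for s, d in change.items() if d < 0)
    return change, increased, decreased


# Hypothetical scores; "State D" was not assessed in the first study.
study1 = {"State A": 0.20, "State B": 0.50, "State C": 0.30}
study2 = {"State A": 0.30, "State B": 0.40, "State C": 0.30, "State D": 0.10}

change, increased, decreased = compare_studies(study1, study2)
print(change, increased, decreased)
```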

Figure 4 shows the number of states that get a non-zero score on an indicator in our framework. Among the 35 states assessed in this study, 33 states report some data on the COVID-19 situation in the state. Bihar and Uttar Pradesh continue to not publish any data on their government or health department website. The remaining 33 states report the total deaths and recovered cases, while only 32 of them report the total confirmed cases. Gujarat does not report the total confirmed cases but reports the number of active cases.

The CDRS of 12 states has improved in study-2 as compared to study-1. Nine of the 12 states, namely, Andhra Pradesh, Chhattisgarh, Goa, Haryana, Karnataka, Kerala, Ladakh, Uttarakhand, and West Bengal, have started reporting more granular data. This is encouraging and is a step in the right direction.

In general, the states continue to score the lowest in the granularity dimension. Jharkhand, which had the highest granularity score in study-1, stopped reporting age- and gender-stratified data for the total confirmed cases, deaths, and recoveries on June 8, 2020. Hence, its normalized granularity score dropped from 0.50 to 0.17 in this study. It might be worthwhile to investigate what led the Jharkhand government to stop reporting age- and gender-stratified data.

Punjab and Chandigarh compromised the privacy of individuals under quarantine by releasing personally identifiable information on their official websites. Chandigarh releases the name and address of people under home quarantine on a daily basis. Punjab released the name, age, gender, and mobile number of persons inbound to the state from New Delhi on May 10, 2020 [4]. As of July 25, 2020, the document is still present on the Punjab government’s health department website.

Figure 3: Boxplots showing CDRS across India from the assessments conducted during May (study-1) and July 2020 (study-2). In the boxplot for July the outlier denotes Karnataka.

Figure 4: The table shows the number of states that get a non-zero score on an indicator. For example, (1) total confirmed is 32, indicating that 32 states report total confirmed COVID-19 cases, and (2) availability in English is 29, indicating that 29 states report data in English. The privacy indicator is not shown in this table.

3.1 Additional Comments

Testing: The strategy recommended by ICMR for COVID-19 testing in India has evolved over time [6,7,8]. The degree of relevance of testing data in understanding the spread of COVID-19 within a state depends on the testing strategy (e.g., how people are chosen for testing). Therefore, we did not include an indicator in our framework to score the reporting of testing data. However, we note that all the states in India report some data on testing. But the reported testing data in most states do not distinguish total samples tested from total persons tested. In other words, most states report total samples tested without specifying how many of them come from unique individuals. This is an important limitation of the data available to track testing in a state [9]. For instance, in the case of Tamil Nadu, which reports both total samples and total persons tested, the difference between those two numbers was more than one lakh (100,000) as of August 7, 2020 [10].

Age brackets: Karnataka, Odisha, and Tamil Nadu report total number of confirmed cases stratified by age. Karnataka and Kerala report the total number of deaths stratified by age. However, the number of age brackets used by each of these states is different, making it difficult to compare the age distribution of confirmed and deceased individuals across states. For example, Karnataka, Odisha, and Tamil Nadu use eight, four, and three age brackets, respectively, to report the total number of confirmed cases stratified by age.

Aarogya Setu mobile app: On April 2, 2020, the Indian government launched the Aarogya Setu mobile app with the objective of enabling Bluetooth-based contact tracing, mapping of likely hotspots, and dissemination of relevant COVID-19 information [11]. To use the app, one has to register with a mobile number, agree to its data sharing policy, and give it access to Bluetooth and location information. While access to phone number, Bluetooth, and location information might be necessary for contact tracing, we believe that expecting people to provide such information just to access critical COVID-19 data is unreasonable. Therefore, we did not consider data reported through the Aarogya Setu app while scoring the states. However, we would like to mention that the app reports cumulative and daily data for confirmed cases, deaths, and recoveries, both as text and trend graphics, for all states.

Data aggregation platforms: covid19india.org is a volunteer-driven nationwide COVID-19 data aggregation initiative that collects and reports COVID-19 data from across the country. While the initiative is noteworthy, it does not replace the need for high-quality data reporting on official government websites, for the following reason. The initiative can fill in gaps in the accessibility dimension described in our framework. However, it cannot fill in the gaps along the availability and granularity dimensions that result from the lack of corresponding data released by the government.

4 Conclusion

Our assessment informs the public health efforts in India about the disparity in the quality of COVID-19 data reporting across the country. The available evidence shows that an improvement in the quality of data reporting is required all across India. The disparity in CDRS shows the lack of a unified framework for reporting COVID-19 data in India, and highlights the need for a national agency like the Indian Council of Medical Research (ICMR) to monitor or audit the quality of data reporting done by the states. The disparate reporting score also reflects inequality in individual access to public health information and privacy protection based on the state of residence [4].

Overall, there is an urgent need to fill the gaps in COVID-19 data reporting across the states. There has been only a marginal improvement in the quality of COVID-19 data reporting done by the states between May and July. With the pandemic being far from over, it is imperative that the states continue to learn from each other and improve their data reporting. We conclude this communication by quoting the following from the Economic Survey of India, “Given that sophisticated technologies already exist to protect privacy and share confidential information, governments can create data as a public good within the legal framework of data privacy. In the spirit of the Constitution of India, data should be ‘of the people, by the people, for the people’.” [12]