This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

China has stopped reporting cumulative cases - alternatives being explored but thus far not identified #6414

Open
CSSEGISandData opened this issue Dec 29, 2022 · 41 comments

Comments

@CSSEGISandData
Owner

Hello all,

On December 25, the National Health Commission of the People's Republic of China posted an announcement that translates to the following when using Google Translate:

From now on, daily epidemic information will no longer be released, and relevant epidemic information will be released by the China Centers for Disease Control and Prevention for reference and research.

We are working to identify what 'relevant epidemic information' will continue to be released. At present, we have not identified any official or media sources that are publishing cumulative cases or deaths at the national or provincial level. Until a reasonable alternative is found, our data will remain stale.

@dlluskin

The data is now being published by China's CDC. Apparently it will be published monthly. Have you looked into this source? https://www.chinadaily.com.cn/a/202212/27/WS63aabdeba31057c47eba6737.html

@CSSEGISandData
Owner Author

@dlluskin, as per the article, the China CDC will not begin publishing this data until January 8. Until we are able to evaluate the source URL, we can't be certain that it will contain information that can be included in this repository. For example, it is unclear how to interpret the following statement with regard to cumulative COVID-19 cases: "The reporting of COVID-19 cases will be based on the number of hospitalized patients, the positive results of nucleic acid tests and antigen tests that are reported, and the monitoring of key groups, key venues and the mutation of variants, he said." At present, the last report on the China CDC website includes data through November and is based on reports published by the National Health Commission of the People's Republic of China, which is the entity that has ceased reporting.

@dlluskin

dlluskin commented Dec 29, 2022 via email

@eleanorlutz

Hello! I think that the latest report from the China CDC website is actually at this link, which is updated daily and contains data up to today.

@dlluskin

dlluskin commented Dec 30, 2022 via email

@ZauberViolino

As the author of #6398, I commented on Dec. 25 afternoon (UTC+8) that China CDC had reported Dec. 24 data.

As a matter of fact, China CDC is still reporting COVID-19 cases at https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_11809/ on a daily basis. I'm sorry I didn't make it clearer.

I also heard on the news that the reporting frequency will be gradually reduced to once a month. We will see what happens next.

@CSSEGISandData

@pwdel

pwdel commented Jan 5, 2023

Just chiming in here as an interested party, trying to be helpful.

So from what I am reading about Johns Hopkins' health data reporting system on their website I see:

Building modern data management systems for local, state, federal, and global public health agencies is critical to preparing the nation and the world for the next major public health crisis.

https://coronavirus.jhu.edu/pandemic-data-initiative

I understand that there is a discrepancy in official government reporting on a national level in China and on a municipal level. For example this article here from National Business Daily, which is a state-owned newspaper:

https://www.nbd.com.cn/articles/2022-12-24/2607048.html

We have the following being reported:

据青岛日报,23日,市卫生健康委主任薄涛介绍,当前青岛的新冠感染发病高峰期还未到来,正处于高峰来临前的快速传播阶段。按照监测数据推测,青岛目前每日新增感染量为49万人-53万人,明后天会在此基础上以10%增速增加。

Which translates to:

According to Qingdao Daily, on the 23rd, Bo Tao, director of the Municipal Health Commission, said that the peak of COVID-19 infections in Qingdao has not yet arrived, and that the city is in the stage of rapid transmission before the peak. Based on the monitoring data, the current daily number of new infections in Qingdao is estimated at 490,000 to 530,000 people, and it will increase at a rate of 10% tomorrow and the day after tomorrow.

So basically, what I'm reading is that a health official, albeit a municipal one, is going on the official record via a newspaper with an estimated count in the hundreds of thousands per day. Meanwhile, Johns Hopkins is relaying national-level reporting, which is in the low thousands, based upon updated testing criteria.

So essentially, we have at least one report of record, a data point, at one point in time, reporting hundreds of thousands of cases, while another data point is reporting thousands of cases. The data notes on the Johns Hopkins website do not create any clarity around the large discrepancies that have been reported between local officials and the national level.

https://coronavirus.jhu.edu/region-data-notes

I'm wondering if another data note could be released which reflects and tries to give some kind of rough estimate to the public on what the cumulative official numbers have been found from municipalities on a lower bound, based upon a best effort search from Johns Hopkins.

Currently, the China CDC and NHC data feeds do not seem to line up with what Johns Hopkins sees as being reflective of actual case counts. I wonder if at least a tally of case counts from municipal health sources would help clarify that case counts are thought to be higher than what is being reported by the CDC and NHC?

@CSSEGISandData
Owner Author

Thanks all - we have swapped to using the source provided by @eleanorlutz. Historical data has been corrected.

edomt added a commit to owid/covid-19-data that referenced this issue Jan 6, 2023
@dlluskin

dlluskin commented Jan 6, 2023

Thanks for all your hard work. It looks to me like today's data, on a cumulative basis, shows a big negative versus yesterday. Both infections and cases. All provinces (except Hong Kong).

@pwdel

pwdel commented Jan 6, 2023

So to clarify what @dlluskin pointed out, I double checked this file, exported the raw, plugged it into Pandas and saw the following:

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

image
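
The check described in this comment can be sketched roughly as follows. This is a minimal sketch, not the exact code behind the screenshot: it assumes the raw CSV has the identifier columns `Province/State`, `Country/Region`, `Lat`, `Long` followed by one cumulative-count column per date, and the function name `negative_daily_changes` is made up for illustration.

```python
import pandas as pd

# Assumed identifier columns of the global time series CSV; everything else is a date column.
ID_COLS = ["Province/State", "Country/Region", "Lat", "Long"]


def negative_daily_changes(ts: pd.DataFrame, country: str = "China") -> pd.DataFrame:
    """Return (province, date, change) rows where a cumulative count went down."""
    rows = ts[ts["Country/Region"] == country].set_index("Province/State")
    cumulative = rows.drop(columns=[c for c in ID_COLS if c in rows.columns])
    # Day-over-day difference along the date axis; the first date has no predecessor (NaN).
    daily = cumulative.diff(axis=1)
    tidy = daily.stack().rename("change").reset_index()
    tidy.columns = ["province", "date", "change"]
    # A cumulative series should never decrease, so any negative change is suspect.
    return tidy[tidy["change"] < 0].reset_index(drop=True)
```

To run it against the real file, load the raw export of the CSV linked above with `pd.read_csv(...)` and pass the resulting frame to `negative_daily_changes`.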

@pwdel

pwdel commented Jan 6, 2023

Looking at the source data from @eleanorlutz , translated using Google Translate:

https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_11809/202301/t20230106_263220.html

  From 00:00 to 24:00 on January 5, 31 provinces (autonomous regions and municipalities) and the Xinjiang Production and Construction Corps reported 9,548 new confirmed cases. Among them, 5 were imported cases (4 in Guangdong, 1 in Heilongjiang) and 9,543 were local cases (3,143 in Guangdong, 1,807 in Guangxi, 924 in Fujian, 840 in Beijing, 413 in Hunan, 252 in Shanxi, 252 in Hubei, 235 in Chongqing, 185 in Sichuan, 169 in Yunnan, 162 in Jiangxi, 154 in Tianjin, 132 in Shaanxi, 125 in Shanghai, 125 in Ningxia, 104 in Heilongjiang, 84 in Shandong, 69 in Guizhou, 68 in Zhejiang, 52 in Xinjiang, 48 in Henan, 37 in Inner Mongolia, 37 in Jilin, 34 in Liaoning, 30 in the Corps, 29 in Hebei, 23 in Jiangsu, 4 in Gansu, 3 in Tibet, 2 in Hainan, and 1 in Anhui). 5 new deaths, all local cases (1 in Heilongjiang, 1 in Jiangxi, 1 in Shandong, 1 in Chongqing, 1 in Sichuan); 25 new suspected cases, all local cases (16 in Hunan, 5 in Guangxi, 3 in Fujian, 1 in Guizhou).

So just as a quick check on two provinces, I'm reading that as...there were from the source material:

+840 new cases in Beijing,
+924 new cases in Fujian

whereas the Johns Hopkins feed linked above shows:

-62 new cases in Beijing,
-3773 new cases in Fujian

Going back to 1/4/2023 and comparing to 1/3/2023 as a sanity check:

https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_11809/202301/t20230105_263213.html

The deltas from source are:

+798 new cases in Beijing,
+807 new cases in Fujian

The deltas from Johns Hopkins feed are:

+798 new cases in Beijing,
+807 new cases in Fujian
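
The same spot check could be scripted so it extends to every province. The sketch below uses only the four figures quoted in this comment; the helper name `compare_deltas` and the dictionary layout are assumptions made for illustration.

```python
def compare_deltas(source: dict[str, int], feed: dict[str, int]) -> dict[str, int]:
    """Return feed-minus-source differences for provinces where the two disagree."""
    return {
        province: feed[province] - source[province]
        for province in source
        if province in feed and feed[province] != source[province]
    }


# Daily new cases for January 5 (China CDC report vs. Johns Hopkins feed), as quoted above.
source_jan5 = {"Beijing": 840, "Fujian": 924}
feed_jan5 = {"Beijing": -62, "Fujian": -3773}

# Daily new cases for January 4, where both sources agree.
source_jan4 = {"Beijing": 798, "Fujian": 807}
feed_jan4 = {"Beijing": 798, "Fujian": 807}

print(compare_deltas(source_jan5, feed_jan5))  # provinces that disagree on Jan 5
print(compare_deltas(source_jan4, feed_jan4))  # empty when the feed matches the source
```

An empty result for a given day means the feed matches the source for every province checked.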

@pwdel

pwdel commented Jan 6, 2023

Sorry to be so persnickety about this.

The above analysis has to do with the, "Global Time Series" document.

Looking at this commit here for the individual date structured csv files, "Daily Reports":

da1a04b

It appears there was no update for 2022-12-19.csv, whereas, if I'm transposing things correctly, there was an update given for that date here:

https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_11809/202212/t20221220_263064.html

@dlluskin

dlluskin commented Jan 7, 2023

It looks like yesterday's error has not been addressed, and there has been no update at all for the following day.

@dlluskin

dlluskin commented Jan 8, 2023

Still no new updates. Can someone tell us what is going on? Thanks as always.

@dlluskin

dlluskin commented Jan 9, 2023

Again today no new updates. PLEASE SOMEONE TELL US WHAT IS GOING ON!

@pwdel

pwdel commented Jan 11, 2023

I see the data has been reconciled and is no longer showing negative numbers.

That being said, I wonder how misleading this data is, and whether another data note is warranted?

https://www.bbc.com/news/world-asia-china-64208127

Provincial official Kan Quancheng revealed the figure - amounting to about 88.5 million people - at a press conference. Mr Kan did not specify a timeline for when all the infections happened - but as China's previous zero-Covid policy kept cases to a minimum, it's likely the vast majority of Henan's infections occurred in the past few weeks.

So basically, this is one government report from a provincial-level health official giving a rough estimate of cases in the millions. Meanwhile, this Johns Hopkins data feed seems to be showing a 7-day average of around 25k or so.

This is a difference of several orders of magnitude, and I think it justifies further research and explanation.

I understand from reading how the WHO seems to report data that there are differences in how cases get reconciled over time, and from Johns Hopkins notes, China changed how it reports cases in January and then in December 2022 to alternately include/exclude asymptomatic cases.

I'm just curious at what point this huge discrepancy starts to become a real risk to global health. Is an accurate picture, with no notes, being presented by the raw Johns Hopkins data feed? E.g. is the note that was given on December 20th sufficient and should we try to compile provincial level reports and let people know that there is a large margin of error?

@dlluskin

dlluskin commented Jan 11, 2023 via email

@ZauberViolino

Yeah as you all have seen, China CDC stopped reporting COVID-19 data a few days ago. I checked websites of the Health Commissions of the 31 provinces (cities, regions) and Xinjiang Production and Construction Corps, and none of them is reporting data. I wonder now what source we can rely on...

@beansrowning

> Again today no new updates. PLEASE SOMEONE TELL US WHAT IS GOING ON!

Hey, just a note from a lurker from US CDC. I think this was already mentioned, but the cessation of daily COVID-19 reporting by China CDC was planned based on the previous pronouncement of the downgrading of COVID-19 to a "Category B" disease. 1/8 was the final daily report.

There's probably a worthwhile conversation to be had about how to proceed with most of the publicly available data sources drying up. I'll mention that, even in the absence of updated data from Mainland China, the ongoing collection of disaggregated data on HK, Macau, and Taiwan is of great utility for us and others, as these data are submitted in the aggregate to WHO via WPRO office.

@pwdel

pwdel commented Jan 13, 2023

@beansrowning what about disaggregated reports from municipal and provincial level health officials? Would they hypothetically be taken into account as a part of the CDC's analysis?

I guess I'm getting into the, "data notes," discussion rather than the, "data feed and visualization," discussion, but per my comment above:

https://www.bbc.com/news/world-asia-china-64208127

Provincial official Kan Quancheng revealed the figure - amounting to about 88.5 million people - at a press conference. Mr Kan did not specify a timeline for when all the infections happened - but as China's previous zero-Covid policy kept cases to a minimum, it's likely the vast majority of Henan's infections occurred in the past few weeks.

https://www.theguardian.com/world/2022/dec/24/chinese-city-seeing-half-a-million-covid-cases-a-day-local-health-chief

Half a million people a day are being infected with Covid-19 in a single Chinese city, a senior health official has said, in a rare and quickly censored acknowledgment that the country’s wave of infections is not being reflected in official statistics.

A news outlet operated by the ruling Communist party in Qingdao reported the municipal health chief as saying that the eastern city was seeing “between 490,000 and 530,000” new Covid cases a day.

There may be more like this.

@beansrowning

> @beansrowning what about disaggregated reports from municipal and provincial level health officials? Would they hypothetically be taken into account as a part of the CDC's analysis?

Yes, if they're vetted, though my impression is that reporting at this level has also ceased due to the change in disease classification. Qualitative data, Cables, News reports, etc. are always summarized, but one-off reports are less useful for longitudinal tracking.

We've relied on other unofficial sources also (DXY, Baidu, Baidu Search Index) but most based in China have ended operation as of a few weeks ago. As all jurisdictions have stopped mandatory PCR testing and case tracking, I don't see that changing.

> I guess I'm getting into the, "data notes," discussion rather than the, "data feed and visualization," discussion, but per my comment above:

It's a valid concern, but unfortunately not unique to China at this stage. Case ascertainment has dropped across the board as testing rates decline and resources shift. The decision to caveat those data is up to you; these might be helpful key dates, in addition to the 1/8 re-classification, that help explain the context:

My point in bringing up HK/Macau/Taiwan is that those streams are still valuable to the international community irrespective of China Mainland + provincial counts. If the decision is made to switch instead to WHO data, I could see questions arising about duplicative capture. Just voicing my personal support for keeping a disaggregated stream so I don't have to reinvent the wheel in-house.

I don't want to go too off-topic here, but just my 2 cents.

@pwdel

pwdel commented Jan 13, 2023

I'm not part of Johns Hopkins, I'm just participating in the discussion as an interested member of the public, so I think your comments are likely well-appreciated. Github is an open-source platform and I think as long as you are abiding by general etiquette, Johns Hopkins likely put this Github issue together because they are looking for solutions, and their staff doesn't have all the time in the world.

So that being said, I guess it seems like the exercise of gathering this data is to reduce the variables down into either a chart or something more univariate that people can understand and make decisions on.

At this point, an update is warranted mentioning that case ascertainment has dropped across the board in China and that longitudinal tracking is not available at the moment. I would offer for the record that unvetted official sources within China gave one-off reports of case counts orders of magnitude above what the chart shows.

I have no idea what the ramifications of having hundreds of millions of cases in China will be, whether there are now discussions about additional variants being more probable, whether case loads will grow globally again even with vaccinations and immunities due to the sheer magnitude of possible infections, or whether basically nothing big will happen, or somewhere in between. But I do think from a data science perspective that at least a note explaining the discrepancy, or stating that there is actually not a discrepancy, or we don't know, is warranted.

@pwdel

pwdel commented Jan 15, 2023

https://www.reuters.com/world/china/air-travel-recovers-china-amid-covid-infection-worries-2023-01-14/

China reports huge rise in COVID-related deaths after data criticism
Between Dec. 8 and Jan. 12, the number of COVID-related deaths in Chinese hospitals totalled 59,938, Jiao Yahui, head of the Bureau of Medical Administration under the National Health Commission (NHC), told a media briefing.
Of those fatalities, 5,503 were caused by respiratory failure due to COVID and the remainder resulted from a combination of COVID and other diseases, she said.

https://www.cnn.com/2023/01/14/china/china-covid-deaths-intl/index.html

Jiao, the medical official, said fever clinic visits and Covid hospitalizations in China have already peaked.

According to the NHC, fever clinic visits – both in cities and rural areas – have been declining since the peak when more than 2.86 million people visited them on December 23, 2022.

On January 12, 477,000 people visited fever clinics across China, Jiao said Saturday.

The NHC said hospitalizations of Covid-19 patients also peaked on January 5, 2023, when 1.63 million people were hospitalized, and 1.27 million Covid patients were still in hospital as of January 12, Jiao added.

@ZauberViolino

I remember the news said something about reducing the COVID-19 data release frequency to once a month. Maybe we should just wait until 02-09 and see what happens next.

@beansrowning

ICYMI, the first update was this weekend, the figures match what was presented to media:

https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_13141/202301/t20230115_263381.html

(machine translated)

  1. Cases
      On January 12, 2023, 31 provinces (autonomous regions, municipalities directly under the Central Government) and the Xinjiang Production and Construction Corps had 1.27 million hospitalized cases of new coronavirus infection and 104,018 severe cases (including critical cases), of which 7,357 were severe cases of new coronavirus infection and 96,661 were severe cases of underlying diseases combined with new coronavirus infection. From December 8, 2022 to January 12, 2023, medical institutions in 31 provinces (autonomous regions and municipalities) and the Xinjiang Production and Construction Corps cumulatively had 59,938 in-hospital deaths related to COVID-19 infection: 5,503 cases died of respiratory failure caused by new coronavirus infection, and 54,435 cases died of underlying diseases combined with new coronavirus infection.

  2. Vaccination status
      As of January 12, 2023, 31 provinces (autonomous regions, municipalities directly under the Central Government) and the Xinjiang Production and Construction Corps have reported a total of 3,487,638,000 doses of COVID-19 vaccine administered, with a total of 1,310,096,000 people vaccinated, 1,276,302,000 people who have completed the full course of vaccination, and 825.855 million people who have completed the first booster dose. Among people over the age of 60, a total of 676.75 million doses have been reported, with 241.542 million people vaccinated, 229.985 million people who have completed the full course of vaccination, and 191.445 million people who have completed the first booster dose.

@dlluskin

dlluskin commented Jan 18, 2023 via email

@pwdel

pwdel commented Jan 23, 2023

China CDC's chief epidemiologist stated on Weibo this weekend that 80% of the population has already been infected with COVID.

https://m.weibo.cn/status/4860354171766047

https://weekly.chinacdc.cn/en/article/doi/10.46234/ccdcw2020.175

@eleanorlutz

hello again - I think the updated new China CDC data can be found at this link, and it appears to be updated weekly.

@dlluskin

dlluskin commented Jan 23, 2023 via email

@dlluskin

dlluskin commented Jan 23, 2023 via email

@beansrowning

> Thanks. This is helpful. Do you have a procedure for finding the updated page each week? I mean for those who can't surf the web in Chinese.

I would scrape the links at this page, but that's just a hunch. https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_13141/
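
A minimal sketch of that idea, using only the Python standard library. The structure of the index page has not been verified here: the code assumes each report appears as an `<a href="...html">` link that resolves under the index URL, and `LinkCollector` and `report_links` are hypothetical names. Fetching the page (for example with `urllib.request`) is left out so the parsing step can be tested offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX_URL = "https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_13141/"


class LinkCollector(HTMLParser):
    """Collect absolute URLs of .html links that live under the index URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or not href.endswith(".html"):
            return
        absolute = urljoin(self.base_url, href)
        # Keep only report pages under the index, skipping site navigation and duplicates.
        if absolute.startswith(self.base_url) and absolute not in self.links:
            self.links.append(absolute)


def report_links(index_html: str, base_url: str = INDEX_URL) -> list[str]:
    """Return report URLs in the order they appear in the index page HTML."""
    parser = LinkCollector(base_url)
    parser.feed(index_html)
    return parser.links
```

Comparing the result against the list saved from the previous run would show whether a new report has been posted.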

@dlluskin

dlluskin commented Jan 23, 2023 via email

@pwdel

pwdel commented Jan 23, 2023

OK so if I understand the chain of links correctly, the bottom, first category B report on 1/15/2023 states:

On January 12, 2023, there were 1.27 million hospitalized cases of COVID-19 infection in 31 provinces

Then, the top, second category B report says:

On January 19, there were 471,739 hospitalized cases of COVID-19 infection in 31 provinces

So this means there was a drop of ~798k hospitalized cases in the last week?

Does this also mean, that from this link on 1/8/2023:

https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_11809/202301/t20230109_263283.html

on January 8, 31 provinces (autonomous regions and municipalities directly under the Central Government) and Xinjiang Production and Construction Corps reported 14,171 new confirmed cases

This implies that the reported new daily cases went something like this, assuming linear expansion ...? :

  • 1/8/23 +14,171
  • 1/9/23 +313,957
  • 1/10/23 +313,957
  • 1/11/23 +313,957
  • 1/12/23 +313,957

Or would that be modeled out exponentially to fill in the gaps? Or how would the data be filled in longitudinally? Would you just have 0 for 1/9, 1/10, 1/11 and then +1.27M on 1/12/23? What's the standard way of reporting this longitudinally?
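
The two simplest options raised here can be worked out as follows. This is only an arithmetic sketch: it follows this comment in treating the 1.27 million hospitalized cases on 1/12 and the 14,171 new confirmed cases on 1/8 as points on one series, which is an assumption of the comment and not something the reports state. The function names are made up for illustration.

```python
def linear_fill(total: int, days: int) -> list[float]:
    """Spread the gap evenly across the missing days."""
    return [total / days] * days


def lump_sum(total: int, days: int) -> list[float]:
    """Report zero on the missing days and the whole gap on the last day."""
    return [0.0] * (days - 1) + [float(total)]


# Gap between the 1/12 figure and the 1/8 figure, to be spread over 1/9 through 1/12.
gap = 1_270_000 - 14_171
missing_days = 4

print(gap)                             # 1255829
print(linear_fill(gap, missing_days))  # [313957.25, 313957.25, 313957.25, 313957.25]
print(lump_sum(gap, missing_days))     # [0.0, 0.0, 0.0, 1255829.0]
```

Both options sum to the same total over the gap; they differ only in how the daily curve looks, which is why the choice needs an accompanying data note.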

@dlluskin

dlluskin commented Jan 23, 2023 via email

@ZauberViolino

> This implies that the reported new daily cases went something like this, assuming linear expansion[...]

Yeah, I think this way is acceptable, though I prefer +1,255,829 on 01-12. (Either way feels like original research to me /shrug)

@ZauberViolino

ZauberViolino commented Jan 25, 2023

Update on China data

The China CDC released some data today. See https://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/jszl_13141/202301/t20230125_263519.html

Sadly there are no clean and clear numbers about the confirmed/deaths/recoveries.

Note: The report includes data of COVID-19 deaths in hospital (在院新冠病毒感染死亡病例). It seems like the figure includes "die of respiratory failure caused by Covid" and "die of underlying diseases combined with Covid" from the Category B Reports.

Useful links

@pwdel

pwdel commented Jan 25, 2023

> Yeah, I think this way is acceptable, though I prefer +1,255,829 on 01-12. (Either way feels like original research to me /shrug)

True that. Regardless of whether the quality of the new data lines up with the previously collected data, some update of the data seems warranted to help other data scientists understand the raw time series feed. Johns Hopkins seems to be able to post data notes on its website, so updating the data along with a thorough explanatory note, perhaps even linking to the original sources, sounds like a way to keep the data open to other scientists while also being as responsible as possible and explaining the incongruence in quality from past to future.

@pwdel

pwdel commented Jan 31, 2023

Just updating anyone who may be watching:

#6520 (comment)

We are still working to understand how we may incorporate the "hospitalized cases of new coronavirus infection" into our dataset. We appreciate your patience.

I'm not sure if that will be the new issue to track this issue and if this one gets closed?

@CSSEGISandData
Owner Author

All involved in this message thread may be interested in #6543. WHO has published updated case and death totals for China.

@pwdel

pwdel commented Feb 4, 2023

For anyone still following this issue, a great breakdown of how the data is being reconciled was posted on the new issue:

#6543 (comment)

@pwdel

pwdel commented Feb 7, 2023
