STATISTICIAN: “I have never in my years seen an r-squared of 0.99 ... As a statistician, it makes me question the data.”

via Barron’s:

“China’s economic data have always been fraught. Now, all eyes are on the coronavirus numbers, which economists and investors are using to estimate the outbreak’s toll—and they are too perfect to mean much.

A statistical analysis of China’s coronavirus casualty data shows a near-perfect prediction model that data analysts say isn’t likely to naturally occur, casting doubt over the reliability of the numbers being reported to the World Health Organization. That’s aside from news on Thursday that health officials in the epicenter of the outbreak reported a surge in new infections after changing how they diagnose the illness.”

Later in the article:

“Barron’s re-created the regression analysis of total deaths caused by the virus, which first emerged in the central Chinese city of Wuhan at the end of last year, and found similarly high variance. We ran it by Melody Goodman, associate professor of biostatistics at New York University’s School of Global Public Health.

“I have never in my years seen an r-squared of 0.99,” Goodman says. “As a statistician, it makes me question the data.”

Real human data are never perfectly predictive when it comes to something like an epidemic, Goodman says, since there are countless ways that a person could come into contact with the virus.

For context, Goodman says a “really good” r-squared, in terms of public health data, would be a 0.7. “Anything like 0.99,” she said, “would make me think that someone is simulating data. It would mean you already know what is going to happen.””
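
For readers who want to see what an r-squared actually measures, here is a minimal sketch in Python. It is not the analysis Barron’s ran: the excerpt does not say what form the model took, so the straight-line fit, the helper names `fit_line` and `r_squared`, and every number below are assumptions made up purely for illustration. None of them are reported case or death figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def fitted_r_squared(xs, ys):
    """Fit a line to (xs, ys) and return the r-squared of that fit."""
    slope, intercept = fit_line(xs, ys)
    predicted = [slope * x + intercept for x in xs]
    return r_squared(ys, predicted)


# Made-up illustrative series, NOT real outbreak data.
days = [1, 2, 3, 4, 5, 6]
exact = [2 * d + 1 for d in days]   # every point sits exactly on a line
noisy = [3, 6, 6, 10, 10, 14]       # same general trend, with scatter

r2_exact = fitted_r_squared(days, exact)
r2_noisy = fitted_r_squared(days, noisy)

print(f"exact series: r-squared = {r2_exact:.4f}")
print(f"noisy series: r-squared = {r2_noisy:.4f}")
```

An r-squared of 1 means the model accounts for all of the variation in the observations, which is what happens when the points are generated by the formula itself. Any scatter around the fitted line pulls the value below 1.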

 

 

h/t BFD