(Scroll past the Writing Sample to understand why you should do this. I came up with the aphorism “significance of evidence is not evidence of significance” by 25 June 2021.)
If your model isn’t performing well in prod on new data, untracked HARKing might be why.
Imagine calling your shot in pool — after you made it! That’s HARKing, a bad research habit. Preregistration is when you call each shot even before stepping up to the table. You win some, you lose some, but you play fair — and that’s how you improve your game.
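To see why calling the shot afterwards is a problem, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data are pure noise, each "study" has 50 rows and 20 candidate features, and the cutoff of 0.279 is the approximate two-sided 5% threshold for a correlation at that sample size. The preregistered analyst tests only the first feature; the HARKing analyst reports whichever feature looks best.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_false_positive_rates(n_studies=300, n_rows=50, n_features=20, seed=0):
    """Return (preregistered, HARKed) false-positive rates on pure-noise data."""
    rng = random.Random(seed)
    r_crit = 0.279  # assumed: approx. two-sided 5% cutoff for |r| when n_rows = 50
    prereg_hits = 0
    harked_hits = 0
    for _ in range(n_studies):
        outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
        abs_rs = []
        for _ in range(n_features):
            feature = [rng.gauss(0, 1) for _ in range(n_rows)]
            abs_rs.append(abs(pearson_r(feature, outcome)))
        # Preregistered: the hypothesis (feature 0) was fixed before seeing data.
        prereg_hits += abs_rs[0] > r_crit
        # HARKed: the "hypothesis" is whichever feature happened to look best.
        harked_hits += max(abs_rs) > r_crit
    return prereg_hits / n_studies, harked_hits / n_studies


prereg_rate, harked_rate = simulate_false_positive_rates()
print(f"pre-registered: {prereg_rate:.2f}, HARKed: {harked_rate:.2f}")
```

Under these assumptions the preregistered rate should sit near the nominal 5%, while the HARKed rate should be many times higher, even though no feature has any real relationship with the outcome.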
(NOTE: On 13 October 2020, I changed “detect” to “discern” to avoid confusion in analyses related to fields like signal detection or anomaly detection.)
Statistical significance was never meant to imply scientific importance. … In sum, “statistically significant” — don’t say it and don’t use it. (Wasserstein et al, 2019, in The American Statistical Association’s publication “The American Statistician”)
You might see me twitch whenever I hear a lecturer, presenter, or researcher say “significant” when they clearly mean “statistically significant” (which in fact has nothing to do with being “clinically significant” or “scientifically significant”; see Wasserstein et al., 2019, and…
My Filipino American journey as a privileged Brown Asian immigrant
(This essay has been updated as “How A ‘Secret Asian Man’ Embraced Anti-Racism”, which can be found at https://laist.com/2020/09/25/race-in_la_how_a_secret_asian_man_embraced_anti-racism.php.)
Why do you notice the splinter in your brother’s eye, but do not perceive the wooden beam in your own eye? How can you say to your brother, ‘Let me remove that splinter from your eye,’ while the wooden beam is in your eye? You hypocrite, remove the wooden beam from your eye first; then you will see clearly to remove the splinter from your brother’s eye. (Matthew 7:3–5)
Disclaimer: This post is not an argument for or against the recent remdesivir findings. Rather, it’s meant to help you better distinguish the importance of clinical findings from the quality of the evidence for or against those findings. These two often get conflated in the news — even by medical doctors and health experts!
Technical Disclaimer: We’ll analyze the time to recovery as a continuous variable for simplicity, though a time-to-event / survival analysis is more appropriate.
We would overstate our health app’s effectiveness by claiming it reduces the risk of new coronavirus infections by 16.9%, when in fact it reduces this risk by only 3.1%.
But we can re-weight our real-world evidence results to provide more accurate risk-reduction estimates of either 2.3% or 2.2%.
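As a rough illustration of what re-weighting can mean, here is a minimal sketch of one common approach: standardizing stratum-specific risks to a target population over a single confounder. It is not necessarily the method used in the tutorial, and every number and name in it (the two risk strata, their risks, and the population mixes) is hypothetical and unrelated to the 16.9%, 3.1%, 2.3%, or 2.2% figures above.

```python
# All numbers below are hypothetical, chosen only to illustrate the mechanics.
STRATA = {
    "low_risk": {"risk_no_app": 0.10, "risk_app": 0.08},
    "high_risk": {"risk_no_app": 0.40, "risk_app": 0.35},
}

# Assumed share of each stratum among app users, non-users, and the
# population we actually care about.
APP_USER_MIX = {"low_risk": 0.9, "high_risk": 0.1}
NON_USER_MIX = {"low_risk": 0.5, "high_risk": 0.5}
TARGET_MIX = {"low_risk": 0.7, "high_risk": 0.3}


def average_risk(mix, risk_key):
    """Average the stratum-specific risks using the given stratum shares."""
    return sum(mix[name] * STRATA[name][risk_key] for name in STRATA)


# Crude comparison: each group keeps its own (different) mix of strata,
# so the difference blends the app's effect with who chose to use it.
crude_rd = average_risk(NON_USER_MIX, "risk_no_app") - average_risk(APP_USER_MIX, "risk_app")

# Standardized comparison: both risks are averaged over the same target mix.
standardized_rd = average_risk(TARGET_MIX, "risk_no_app") - average_risk(TARGET_MIX, "risk_app")

print(f"crude risk reduction:        {crude_rd:.3f}")
print(f"standardized risk reduction: {standardized_rd:.3f}")
```

In this toy setup the crude comparison overstates the benefit because app users are disproportionately low-risk to begin with. Averaging both arms over the same mix removes that imbalance, at least for the one confounder we stratified on.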
To review from Part 1 of this two-part tutorial using synthetic data:
Our analysis goal will be to help public health authorities in our simulated world reduce SARS-CoV-2 (“coronavirus”) infections. We believe our digital health or telemedicine app can help prevent new infections; e.g., by promoting healthy lifestyle choices —…
We would overstate our telemedicine app’s effectiveness by claiming it reduces the risk of new coronavirus infections by 16.9%, when in fact it reduces this risk by only 3.1%.
Consider a timely hypothetical twist on a classic example of spurious correlation: Recently, ice cream sales have been dropping — along with the number of homicides. But this isn’t because eating ice cream drives people to murder. It’s because a community-wide shelter-in-place mandate was enacted to prevent the spread of a novel infectious agent. …
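The ice cream example can be sketched as a small simulation. The setup is entirely hypothetical: a shelter-in-place indicator lowers both a daily ice cream sales index and a daily homicide index, and neither index affects the other. The effect sizes and noise levels are made up for illustration.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
days = []
for _ in range(1000):
    lockdown = rng.random() < 0.5  # the common cause (confounder)
    # Both indices drop under lockdown; neither depends on the other.
    ice_cream = 100 - 40 * lockdown + rng.gauss(0, 10)
    homicides = 10 - 4 * lockdown + rng.gauss(0, 2)
    days.append((lockdown, ice_cream, homicides))


def corr_for(rows):
    """Correlation between ice cream sales and homicides for the given days."""
    return pearson_r([r[1] for r in rows], [r[2] for r in rows])


overall_r = corr_for(days)
lockdown_r = corr_for([d for d in days if d[0]])
open_r = corr_for([d for d in days if not d[0]])

print(f"all days:      r = {overall_r:.2f}")
print(f"lockdown days: r = {lockdown_r:.2f}")
print(f"open days:     r = {open_r:.2f}")
```

Pooled across all days, the two indices should look clearly correlated. Within lockdown days or within open days, the correlation should be close to zero, which is what we would expect if the mandate, and not ice cream, is driving both.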
This is exactly the time to temper the sprinting agility of data science with the scientifically rigorous methodology of biostatistics.
Epidemiologists, and infectious-disease specialists in particular, are the ultimate domain experts for guiding data science solutions that model or otherwise analyze population-level aspects of SARS-CoV-2 (“coronavirus”) and its health impacts (i.e., COVID-19 characteristics and effects). As such, it is encouraging to see more and more data science hackathons and projects recognize the need to recruit epidemiologists to guide solution development. …