The following is a guest blog post by Andy Oram, writer and editor at O’Reilly Media.
For more than a century, doctors have put their faith in randomized, double-blind clinical trials. But this temple is being shaken to its foundations while radical sects of “big data” analysts challenge its orthodoxy. The schism came to a head earlier this month at the Health Datapalooza, the main conference covering the use of data in health care.
The themes of the conference–open data sets, statistical analysis, data sharing, and patient control over research–represent an implicit challenge to double-blind trials at every step of the way. Whereas trials recruit individuals using stringent criteria, ensuring proper matches, big data slurps in characteristics from everybody. Whereas trials march through rigid stages with niggling oversight, big data shoots files through a Hadoop computing cluster and spits out claims. Whereas trials scrupulously separate patients, big data analysis often draws on communities of people sharing ideas freely.
This year, the tension between clinical trials and big data was unmistakable. One session was even called “Is the Randomized Clinical Trial (RCT) Dead?”
The background to the session is just as important as the points raised during the session. Basically, randomized trials have taken it on the chin for the past few years. Most have been shown to be unreproducible. Others have been repressed because they don’t show the results that their funders (usually pharmaceutical companies) would like to see. Scandals sometimes reach heights of absurdity that even a satirical novelist would have trouble matching.
We know that the subjects recruited to RCTs are unrepresentative of most people who receive treatments based on the results. The subjects tend to be healthier (no comorbidities), younger, whiter, and more male than the general population. At the Datapalooza session, Robert Kaplan of NIH pointed out that a large number of clinical trials recruit patients from academic settings, even though only 1 in 100 people suffering from a condition gets treated in such settings. He also pointed out that, since the federal government began requiring clinical trials to register a few years ago, it has become clear that most don’t produce statistically significant results.
Two speakers from the Oak Ridge National Laboratory pushed the benefits of big data even further. Georgia Tourassi claimed that so far as data is concerned, “bigger can be better” even if the data is “unusual, noisy, or sparse.” She suggested, however, that data analysis has roles to play before and after RCTs–on the one side, for instance, to generate hypotheses, and on the other to conduct longitudinal studies. Mallikarjun Shankar pointed out that we use big data successfully in areas where randomized trials aren’t available, notably in enforcing test ban treaties and modeling climate change.
Robert Temple of the FDA came to the podium to defend RCTs. He opined that trials are required for clinical effectiveness–although I thought one of his examples undermined his claim–and pointed out that big data can have trouble finding important but small differences in populations. For example, an analysis of widely varying patients might miss the difference between two drugs, which may cause adverse effects in only 3% versus 4% of the population respectively. But for the people who suffer the adverse effects, that’s a 25% difference–something they’d like to know about.
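To make the arithmetic behind that 3% versus 4% example concrete, here is a minimal sketch in Python. The function name `compare_adverse_rates` and the 10,000-patient scale are my own illustrative assumptions, not anything presented at the session. It shows how a gap of one percentage point looks small in absolute terms but large in relative terms, and how the relative figure depends on which drug is taken as the baseline.

```python
from fractions import Fraction


def compare_adverse_rates(rate_a: Fraction, rate_b: Fraction) -> dict:
    """Compare two adverse-effect rates, where rate_a is the lower one.

    Hypothetical helper for illustration only. Rates are exact fractions
    (Fraction(3, 100) means 3%) so the results carry no rounding error.
    """
    absolute = rate_b - rate_a
    return {
        # Gap in percentage points, expressed as a fraction of all patients.
        "absolute_difference": absolute,
        # How much lower drug A's rate is, measured against drug B's rate.
        "reduction_relative_to_b": absolute / rate_b,
        # How much higher drug B's rate is, measured against drug A's rate.
        "increase_relative_to_a": absolute / rate_a,
        # Extra patients harmed per 10,000 treated with drug B instead of A.
        "extra_cases_per_10000": absolute * 10_000,
    }


result = compare_adverse_rates(Fraction(3, 100), Fraction(4, 100))
for name, value in result.items():
    print(f"{name}: {value} (about {float(value):.4g})")
```

Under these assumptions the absolute difference is one percentage point, the lower rate is 25% below the higher one, and the higher rate is roughly 33% above the lower one. The same gap can be described with either relative figure, which is one reason small differences like this are easy to lose in a heterogeneous data set.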
RCTs received a battering in other parts of the Datapalooza as well, particularly in the keynote by Vinod Khosla, who has famously suggested that computing can replace doctors. While repeating the familiar statistics about the failures of RCTs, he waxed enthusiastic about the potential of big data to fix our ills. In his scenario, we will all collect large data sets about ourselves and compare them to other people to self-diagnose. Kathleen Sebelius, keynoting at the Datapalooza in one of her last acts as Secretary of Health and Human Services, said “We’ve been making health policy in this country for years based on anecdote, not information.”
Less present at the Datapalooza was the idea that there are ways to improve clinical trials. I have reported extensively on efforts at reform, which include getting patients involved in the goals and planning of trials, sharing raw data sets as well as published results, and creating teams that cross multiple organizations. The NIH is rightly proud of its open access policy, which requires publicly funded research to be published for free download at PubMed. But this policy doesn’t go far enough: it leaves a one-year gap after publication, which may itself take place a year after the paper was written, and the policy says nothing about the data used by the researcher.
I believe data analysis has many secrets to unlock in the universe, but its effectiveness in many areas is unproven. One may find a correlation between a certain gene and an effective treatment, but we still don’t know what other elements of the body have an impact. RCTs also have well-tested rules for protecting patients that we need to explore and adapt to statistical analysis. It will be a long time before we know who is right, and I hope for a reconciliation along the way.