Correlations and Research Results: Do They Match Up? (Part 2 of 2)

Posted on May 27, 2016 | Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

The previous part of this article described the benefits of big data analysis, along with some of the formal, inherent risks of using it. We’ll go even more into the problems of real-life use now.

More hidden bias

Jeffrey Skopek pointed out that correlations can perpetuate bias as much as they undermine it. Everything in data analysis is affected by bias, ranging from what we choose to examine and what data we collect to who participates, what tests we run, and how we interpret results.

The potential for seemingly objective data analysis to create (or at least perpetuate) discrimination on the basis of race and other criteria was highlighted recently by a Bloomberg article on Amazon Prime deliveries. Nobody thinks that any Amazon.com manager anywhere said, “Let’s not deliver Amazon Prime packages to black neighborhoods.” But that was the natural outcome of depending on data about purchases, incomes, or whatever other data was crunched by the company to produce decisions about deliveries. (Amazon.com quickly promised to eliminate the disparity.)

At the conference, Sarah Malanga went over the comparable disparities and harms that big data can cause in health care. Think of all the ways modern researchers interact with potential subjects over mobile devices, and how much data is collected from such devices for data analytics. Such data is used to recruit subjects, to design studies, to check compliance with treatment, and for epidemiology and the new Precision Medicine movement.

In all the same ways that the old, the young, the poor, the rural, ethnic minorities, and women can be left out of commerce, they can be left out of health data as well–with even worse impacts on their lives. Malanga reeled out some statistics:

  • 20% of Americans don’t go on the Internet at all.

  • 57% of African-Americans don’t have Internet connections at home.

  • 70% of Americans over 65 don’t have a smart phone.

Those are just examples of ways that collecting data may miss important populations. Often, those populations are sicker than the people we reach with big data, so they need more help while receiving less.
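To see why missing a sicker population matters, here is a back-of-the-envelope sketch in Python. The 80% reach figure follows from the first statistic above; the two illness rates are invented purely for illustration and come from no study or talk.

```python
def coverage_bias(share_reached, rate_reached, rate_missed):
    """Compare the illness rate seen in collected data with the true rate.

    share_reached: fraction of the population our data collection reaches
    rate_reached:  illness rate among the people we reach
    rate_missed:   illness rate among the people we miss
    Returns (true_rate, bias), where bias = observed rate - true rate.
    """
    true_rate = share_reached * rate_reached + (1 - share_reached) * rate_missed
    return true_rate, rate_reached - true_rate


# 0.80 reflects the "20% of Americans don't go on the Internet" statistic.
# 0.10 and 0.20 are HYPOTHETICAL illness rates chosen only to show the effect.
true_rate, bias = coverage_bias(0.80, 0.10, 0.20)
print(f"observed: 10%, true: {true_rate:.0%}, bias: {bias:+.0%}")
```

With these made-up rates, the data we can collect shows 10% while the whole population sits at about 12%. The gap widens as the unreached group gets larger or sicker.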

The use of electronic health records, too, is still limited to certain populations in certain regions. Thus, some patients may take a lot of medications but not have “medication histories” available to research. Ameet Sarpatwari said that the exclusion of some populations from research makes post-approval research even more important; there we can find correlations that were missed during trials.

A crucial source of well-balanced health data is the All Payer Claims Databases (APCDs) that 18 states have set up to collect data across the board. But a glitch in employment law, highlighted by Carmel Shachar, releases self-funding employers from sending their health data to the databases. Closing that gap will most likely take a fix from Congress. Unless Congress acts, researchers and public health agencies will lack the comprehensive data they need to improve health outcomes, and the 12 states that have started their own APCD projects may abandon them.

Other rectifications cited by Malanga include an NIH requirement for studies funded by it to include women and minorities–a requirement Malanga would like other funders to adopt–and the FCC’s Lifeline program, which helps more low-income people get phone and Internet connections.

A recent article at the popular TechCrunch technology site suggests that the inscrutability of big data analytics is intrinsic to artificial intelligence. We must understand where computers outstrip our intuitive ability to understand correlations.

Correlations and Research Results: Do They Match Up? (Part 1 of 2)

Posted on May 26, 2016 | Written By

Andy Oram

Eight years ago, a widely discussed issue of WIRED Magazine proclaimed cockily that current methods of scientific inquiry, dating back to Galileo, were becoming obsolete in the age of big data. Running controlled experiments on limited samples just has too many limitations and takes too long. Instead, we will take any data we have conveniently at hand–purchasing habits for consumers, cell phone records for everybody, Internet-of-Things data generated in the natural world–and run statistical methods over them to find correlations.

Correlations were spotlighted at the annual conference of the Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School. Although the speakers expressed a healthy respect for big data techniques, they pinpointed their limitations and affirmed the need for human intelligence in choosing what to research, as well as how to use the results.

Petrie-Flom annual 2016 conference

A word from our administration

A new White House report also warns that “it is a mistake to assume [that big data techniques] are objective simply because they are data-driven.” The report highlights the risks of inherent discrimination in the use of big data, including:

  • Incomplete and incorrect data (particularly common in credit rating scores)

  • “Unintentional perpetuation and promotion of historical biases,”

  • Poorly designed algorithmic matches

  • “Personalization and recommendation services that narrow instead of expand user options”

  • Assuming that correlation means causation

The report recommends “bias mitigation” (page 10) and “algorithmic systems accountability” (page 23) to overcome some of these distortions, and refers to a larger FTC report that lays out the legal terrain.

Like the WIRED issue mentioned earlier, this gives us some background for discussions of big data in health care.

Putting the promise of analytical research under the microscope

Conference speaker Tal Zarsky offered both fulsome praise and specific cautions regarding correlations. As the WIRED Magazine issue suggested, modern big data analysis can find new correlations between genetics, disease, cures, and side effects. The analysis can find them much more cheaply and quickly than randomized clinical trials. This can lead to more cures, and has the other salutary effect of opening a way for small, minimally funded start-up companies to enter health care. Jeffrey Senger even suggested that, if analytics such as those used by IBM Watson are good enough, doing diagnoses without them may constitute malpractice.

W. Nicholson Price, II focused on the danger of the FDA placing too many strict limits on the use of big data for developing drugs and other treatments. Instead of making data analysts back up everything with expensive, time-consuming clinical trials, he suggested that the FDA could set up models for the proper use of analytics and check that tools and practices meet requirements.

One of the exciting impacts of correlations is that they bypass our assumptions and can uncover associations we never would have expected. The poster child for this effect is the notorious beer-and-diapers connection found by one retailer. This story has many nuances that tend to get lost in the retelling, but perhaps the most important point to note is that a retailer can depend on a correlation without having to ascertain the cause. In health, we feel much more comfortable knowing the cause of the correlation. Price called this aspect of big data search “black box medicine.” Saying that something works, without knowing why, raises a whole list of ethical concerns.

A correlation between stomach pain and disease can’t tell us whether the stomach pain led to the disease, the disease caused the stomach pain, or both are symptoms of a third underlying condition. Causation can make a big difference in health care. It can warn us to avoid a treatment that works 90% of the time (we’d like to know who the other 10% of patients are before they get a treatment that fails). It can help uncover side effects and other long-term effects–and perhaps valuable off-label uses as well.
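The “third underlying condition” case is easy to sketch in Python. The patient counts below are hypothetical, constructed so that stomach pain and disease are independent once you know whether the underlying condition is present; they are not drawn from any study or from the conference.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def expand(counts):
    """Turn {(pain, disease): number_of_patients} into two parallel 0/1 lists."""
    pain, disease = [], []
    for (p, d), n in counts.items():
        pain += [p] * n
        disease += [d] * n
    return pain, disease


# HYPOTHETICAL counts. With the underlying condition, 80% have pain and
# 80% have the disease; without it, 20% and 20%. Within each group the
# two are independent (e.g. 0.8 * 0.8 * 100 = 64 patients have both).
with_condition = {(1, 1): 64, (1, 0): 16, (0, 1): 16, (0, 0): 4}
without_condition = {(1, 1): 4, (1, 0): 16, (0, 1): 16, (0, 0): 64}

pain_w, disease_w = expand(with_condition)
pain_wo, disease_wo = expand(without_condition)

r_pooled = pearson(pain_w + pain_wo, disease_w + disease_wo)
r_with = pearson(pain_w, disease_w)
r_without = pearson(pain_wo, disease_wo)

print(f"pooled: {r_pooled:.2f}")
print(f"within groups: {abs(r_with):.2f}, {abs(r_without):.2f}")
```

Pooled across all 200 patients, pain and disease show a correlation of about 0.36. Split by the underlying condition, the correlation in each group is essentially zero, so neither one is driving the other in this toy data.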

Zarsky laid out several reasons why a correlation might be wrong.

  • It may reflect errors in the collected data. Good statisticians control for error through techniques such as discarding outliers, but if the original data contains enough bad apples, the barrel will go rotten.

  • Even if the correlation is accurate for the collected data, it may not be accurate in the larger population. The correlation could be a fluke, or the statistical sample could be unrepresentative of the larger world.
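The “fluke” problem in the second point can be sketched with pure noise. The Python example below is a minimal illustration under assumed numbers (200 variable pairs, 10 versus 1,000 patients, a fixed random seed), not anything Zarsky presented. It generates pairs of unrelated variables and reports the strongest correlation that turns up by chance.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_patients, n_pairs, seed=0):
    """Largest |r| found among n_pairs pairs of independent noise variables."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    strongest = 0.0
    for _ in range(n_pairs):
        xs = [rng.gauss(0, 1) for _ in range(n_patients)]
        ys = [rng.gauss(0, 1) for _ in range(n_patients)]
        strongest = max(strongest, abs(pearson(xs, ys)))
    return strongest


# Illustrative sizes only: the same search run on a small and a large sample.
small_sample = strongest_chance_correlation(n_patients=10, n_pairs=200)
large_sample = strongest_chance_correlation(n_patients=1000, n_pairs=200)
print(f"strongest chance correlation, 10 patients:   {small_sample:.2f}")
print(f"strongest chance correlation, 1000 patients: {large_sample:.2f}")
```

None of these variables has anything to do with any other, yet searching enough pairs in a small sample reliably turns up a correlation that looks impressive. The same search over a large sample finds only weak ones, which is one reason to follow up a surprising correlation with more data or a trial.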

Zarsky suggests using correlations as a starting point for research, but backing them up by further randomized trials or by mathematical proofs that the correlation is correct.

Isaac Kohane described, from the clinical side, some of the pros and cons of using big data. For instance, data collection helps us see that choosing a gender for intersex patients right after birth produces a huge amount of misery, because the doctor guesses wrong half the time. However, he also cited times when data collection can be confusing for the reasons listed by Zarsky and others.

Senger pointed out that after drugs and medical devices are released into the field, data collected on patients can teach developers more about risks and benefits. But this also runs into the classic risks of big data. For instance, if a patient dies, did the drug or device contribute to death? Or did the patient just succumb to other causes?

We already have enough to make us puzzle over whether we can use big data at all–but there’s still more, as the next part of this article will describe.

MU Stats, “Cadillac” EMR, EHR Patent, and Big Data

Posted on November 4, 2012 | Written By

John Lynn is the Founder of the HealthcareScene.com blog network which currently consists of 10 blogs containing over 8000 articles with John having written over 4000 of the articles himself. These EMR and Healthcare IT related articles have been viewed over 16 million times. John also manages Healthcare IT Central and Healthcare IT Today, the leading career Health IT job board and blog. John is co-founder of InfluentialNetworks.com and Physia.com. John is highly involved in social media, and in addition to his blogs can also be found on Twitter: @techguy and @ehrandhit and LinkedIn.


Some really interesting stats for meaningful use. I think I’d seen most of them in one place or another, but it was great to see them all in one place. Nice work, Fred.


Cadillac of EHR. Very interesting.


This is really annoying for me. I haven’t written much on these blogs about why I don’t like software patents, but I’ll have to in the future. You can also read this piece by Anne Zieger about mHealth patents. Software patents are such terrible innovation inhibitors, which is ironic since it’s the opposite of what they were designed to accomplish.


I can’t wait until this convergence is normal. It will usher in the start of Smart EMR.

EMR Interfaces Gone Wrong, Or The Tale Of The Albanian Patient

Posted on October 16, 2012 | Written By

Anne Zieger is a veteran healthcare consultant and analyst with 20 years of industry experience. Zieger formerly served as editor-in-chief of FierceHealthcare.com and her commentaries have appeared in dozens of international business publications, including Forbes, Business Week and Information Week. She has also contributed content to hundreds of healthcare and health IT organizations, including several Fortune 500 companies. Contact her at @ziegerhealth on Twitter or visit her site at Zieger Healthcare.

Today, for your consideration, we have the tale of the Albanian patient who wasn’t Albanian.  More broadly, I’m here to discuss the perils of adding an extra interface consideration to the workflow of busy EMR users, and the impact that has on data quality.

Scope, a blog published by the Stanford School of Medicine, shares the case of the Merced County, California physician who, exasperated with the requirement that he identify the ethnicity of each patient, chooses “Albanian” for all of them. Why? Simply because “Albanian” is the first item of the rather long list in the pulldown menu.

As a result of this interface issue, any attempt to mine this veteran doctor’s data for population health analysis is weakened, writes Anna Lembke, MD, assistant professor of psychiatry and behavioral sciences at Stanford. And this physician’s choices should give the “big data” users pause, she suggests:

Misinformation in electronic medical records, whether accidental or otherwise, has far-reaching consequences for patients and health care policy, because electronic medical records are being actively ‘data-mined’ by large health care conglomerates and the government as a basis for improving care. This is an important downside to consider as we move forward.

Dr. Lembke’s observations are important ones. If government entities and health organizations would like to mine the increasingly large pools of data EMRs are collecting, it’s important to look at whether the data collected actually reflects the care being given and the patients being seen.

I’m not suggesting that we audit clinicians’ efforts wholesale — they’d rightfully find it offensively intrusive — but I am suggesting that we audit the interfaces themselves from time to time.  Even a quarterly review of the interfaces and workflow an EMR demands, and results it produces, might help make sure that the data actually reflects reality.
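One thing such a review could check is whether a pulldown’s first option shows up suspiciously often. The Python sketch below is only an illustration: the record layout, the `flag_first_option_overuse` name, and the thresholds are all assumptions, not features of any real EMR. The point is to spot a field whose interface is producing bad data, not to police individual clinicians.

```python
from collections import Counter, defaultdict


def flag_first_option_overuse(records, first_option, min_records=20, threshold=0.95):
    """Flag entry sources where the menu's first option dominates a field.

    records:      iterable of (source_id, selected_value) pairs (hypothetical layout)
    first_option: the value listed first in the pulldown menu
    min_records:  ignore sources with too few entries to judge
    threshold:    share of entries at or above which a source is flagged
    Returns a list of (source_id, share_of_first_option).
    """
    by_source = defaultdict(Counter)
    for source_id, value in records:
        by_source[source_id][value] += 1

    flagged = []
    for source_id, counts in by_source.items():
        total = sum(counts.values())
        if total < min_records:
            continue
        share = counts[first_option] / total
        if share >= threshold:
            flagged.append((source_id, share))
    return flagged


# HYPOTHETICAL data: one source always picks the first menu item,
# another records a mix of values.
records = [("source_a", "Albanian")] * 50
records += [("source_b", v) for v in ["Hispanic", "Hmong", "White", "Albanian"] * 10]

print(flag_first_option_overuse(records, first_option="Albanian"))
```

On this made-up data only `source_a` is flagged, with a share of 1.0. A flag like that is a prompt to look at the menu design, for example its ordering or whether the field should be required at all, rather than proof of anything about the person entering the data.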

Swimming in Too Much EMR Data

Posted on May 31, 2012 | Written By

As Social Marketing Director at Billian, Jennifer Dennard is responsible for the continuing development and implementation of the company's social media strategies for Billian's HealthDATA and Porter Research. She is a regular contributor to a number of healthcare blogs and currently manages social marketing channels for the Health IT Leadership Summit and Technology Association of Georgia’s Health Society. You can find her on Twitter @JennDennard.

I don’t know about you, but the long holiday weekend was far too short for me. The majority of my family’s time was spent kicking off summer at various pools (with the appropriate sunblock, of course). Pools and swimming are somewhat second nature to me. The smell of chlorine takes me back to my high school and early college days of year round swim team, coaching summer swim league and sitting in a lifeguard chair in the brutal heat, whistle dangling around my neck.

As we gear up for my oldest daughter’s first summer swim meet this week (picking the appropriate swim cap, finding those goggles that fit just right and painting our toes the appropriate team color), I’m hoping that she’ll come to love the sights, sounds and smells of the pool as well. She certainly seemed to enjoy herself at one of the Memorial Day weekend pool parties we attended.

One family affair in particular found me wading into a conversation about Salesforce.com. Turns out a soon-to-be new member of the family works for the company, and I told him that, as part of my day job, I had been dabbling in using it. He quickly asked me about my likes and dislikes, at which point his fiancée chimed in with the lament that yes, Salesforce is an awesome tool, but more often than not, sales teams do not have the time (and in some cases the inclination or training) to fully make use of all its bells and whistles.

I pondered her statement a bit further as I watched my daughter practice swimming with her new flippers, and realized that those of us that use SaaS (software as a service) technologies – like electronic medical records – tend to have the same complaint. Bells and whistles are great, but if I never have the time to learn to use them effectively to accomplish goals specific to my tasks, then I’m not going to use them at all. And I’m never going to pay much attention to the constant updates and add-ons these sorts of technologies usually come with.

I wonder if some EMR end-users feel the same way. They love the idea behind the technology, and certainly the government incentives that typically come along with using it, but after implementation find themselves with only enough time to utilize the EMR’s basic functions. I’d assume this might be a bigger problem for private practice physicians than for those working within a hospital.

I’m certainly not the first to ponder the relationship between Salesforce and EMRs. Our fearless leader John Lynn wrote about Practice Fusion building a personal health record on top of Salesforce way back in 2009, seemingly not long after Salesforce invested in the HIT company.

What I’m talking about, however, is the amount of time and energy required to truly take advantage of the vast oceans of meaningful data that can be culled from an EMR. Big data is great. Lord knows we’ve all been convinced of the value of that and the business intelligence tools that help us decipher it. I’d be interested to hear from doctors that have pondered the same thing. Are providers swimming in too much EMR information? Are they faced with more than they could ever possibly utilize? Does it come down to user experience and user-centric design?

Let me know what you think in the comments below. In the meantime, I’ll be helping my daughter perfect her backstroke.