Is Claims Data Really So Bad For Health Care Analytics?

Posted on June 12, 2015 | Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

Two commonplaces heard in the health IT field are that the data in EHRs is aimed at billing, and that billing data is unreliable input to clinical decision support or other clinically related analytics. These statements form two premises to a syllogism for which you can fill in the conclusion. But at two conferences last week–the Health Datapalooza and the Health Privacy Summit–speakers indicated that smart analysis can derive a lot of value from claims data.

The Healthcare Cost and Utilization Project (HCUP), run by the government’s Agency for Healthcare Research and Quality (AHRQ), is based on hospital release data. Major elements include the payer, diagnoses, procedures, charges, and length of stay, along with potentially richer information such as patients’ ages, genders, and income levels. A separate Clinical Content Enhancement Toolkit does allow states to add clinical data, while American Hospital Association Linkage Files let hospitals upload data about their facilities.

But basically, HCUP data revolves around the claims from all-payer databases. It is currently collected from 47 states, and varies from state to state depending on what data each state allows to be released. HCUP goes back to 2006 and powers a lot of research, notably to improve outreach to underserved racial and ethnic groups.

During an interview at the Health Privacy Summit, Lucia Savage, Chief Privacy Officer at ONC, mentioned that one can use claims data to determine what treatments doctors offer for various conditions (such as mammograms, which tend to be underused, and antibiotics, which tend to be overused). Thus, analysts can target providers who fail to adhere to standards of care and theoretically improve outcomes.
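To make that kind of analysis concrete, here is a minimal sketch in Python of how one might compute, per provider, the share of patients who received a given service and then flag providers below a chosen rate. Everything in it is an assumption for illustration: the record layout, the placeholder codes "VISIT" and "MAMMO" (not real billing codes), the sample data, and the 0.5 threshold. A real analysis would also have to restrict the denominator to patients for whom the service is actually indicated.

```python
from collections import defaultdict

# Hypothetical claim records: (provider_id, patient_id, procedure_code).
# "VISIT" and "MAMMO" are placeholder codes, not real billing codes.
claims = [
    ("dr_a", "p1", "VISIT"),
    ("dr_a", "p1", "MAMMO"),
    ("dr_a", "p2", "VISIT"),
    ("dr_b", "p3", "VISIT"),
    ("dr_b", "p4", "VISIT"),
    ("dr_b", "p4", "MAMMO"),
    ("dr_b", "p5", "VISIT"),
]


def screening_rates(claims, target="MAMMO"):
    """Return, per provider, the fraction of distinct patients with a claim for `target`."""
    patients = defaultdict(set)   # provider -> all patients seen
    screened = defaultdict(set)   # provider -> patients with the target service
    for provider, patient, code in claims:
        patients[provider].add(patient)
        if code == target:
            screened[provider].add(patient)
    return {
        provider: len(screened[provider]) / len(seen)
        for provider, seen in patients.items()
    }


def below_threshold(rates, threshold):
    """List providers whose rate falls strictly below `threshold`."""
    return sorted(p for p, rate in rates.items() if rate < threshold)


rates = screening_rates(claims)
flagged = below_threshold(rates, 0.5)  # 0.5 is an arbitrary illustrative cutoff
```

The same counting works in the other direction for overused treatments such as antibiotics: flag the providers whose rate sits above a threshold instead of below it.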

M1, a large data analytics company serving a number of industries, bases a number of products in the health care space on claims data. For instance, medical device companies contract with M1 to find out which devices doctors are ordering. Insurance companies use it to sniff out fraud.

M1’s business model, incidentally, is a bit different from that pursued by most analytics organizations in the health care arena. Most firms contract with some institution–an insurer, for instance–to analyze its data and provide it with unique findings. But M1 goes around buying up data from multiple institutions and combining it for deeper insights. It then sells results back to these institutions, often paying out to and taking in payment from the same company.

In short, smart organizations are shelling out money for data about billing and claims. It looks like, if you have a lot of this data, you can reliably lower costs, improve marketing, and–most important of all–improve care. But we mustn’t lose sight of the serious limitations and weaknesses of this data.

  • A scandalous amount of it is just clinically wrong. Doctors “upcode” to extract the largest possible reimbursement for what they treat. A number of them go further and assign codes that have no justification whatsoever. And that doesn’t even count outright fraud, which reaches into the billions of dollars each year and therefore must leave a lot of bad data in the system.

  • Data is atomized, each claim standing on its own. A researcher will find it difficult to impossible (if patient identifiers are totally stripped out) to trace a sequence of visits that tells you about the progress of treatment.

  • Data is relatively impoverished. Clinical records flesh out the diagnosis with related conditions, demographic information, and other things that make the difference between correct and incorrect treatments.
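The atomization problem is easy to see in a small sketch. Assuming a hypothetical claim layout with a `patient_id`, a `service_date`, and a placeholder `diagnosis` field (none of which reflect a real claims schema), the Python below rebuilds each patient's sequence of visits when an identifier is present and simply counts the claims it cannot place when the identifier has been stripped.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claims; a patient_id of None stands for a stripped identifier.
claims = [
    {"patient_id": "p1", "service_date": date(2015, 3, 2), "diagnosis": "DX_B"},
    {"patient_id": "p1", "service_date": date(2015, 1, 15), "diagnosis": "DX_A"},
    {"patient_id": None, "service_date": date(2015, 2, 1), "diagnosis": "DX_A"},
    {"patient_id": "p2", "service_date": date(2015, 4, 9), "diagnosis": "DX_C"},
    {"patient_id": None, "service_date": date(2015, 2, 20), "diagnosis": "DX_B"},
]


def visit_sequences(claims):
    """Group claims by patient and order them by date.

    Returns (sequences, unlinked), where `unlinked` counts claims that
    carry no patient identifier and so cannot be tied to any sequence.
    """
    by_patient = defaultdict(list)
    unlinked = 0
    for claim in claims:
        pid = claim.get("patient_id")
        if pid is None:
            unlinked += 1
            continue
        by_patient[pid].append(claim)
    sequences = {
        pid: [c["diagnosis"] for c in sorted(visits, key=lambda c: c["service_date"])]
        for pid, visits in by_patient.items()
    }
    return sequences, unlinked


sequences, unlinked = visit_sequences(claims)
```

With identifiers, the two claims for the first patient line up into an ordered history. Without them, the remaining claims are isolated data points, which is exactly the limitation described above.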

But on the other hand, to go beyond billing data and reach the data utopia that reformers dream about, we’d have to slurp up a lot of complex and sensitive patient data. This has pitfalls of its own. Little clinical data is structured, and the doctors who do take the effort to enter it into structured fields do so inconsistently. Privacy concerns also raise their threatening heads when you get deep into patient conditions and demographics. So perhaps we should see how far we can get with claims data.