Understanding Personal Health Data: Not All Bits Are the Same (Part 4 of 4, Personal Health Data)

Posted on October 1, 2015 | Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

Previous segments of this article explained what makes data sharing difficult in four major areas of Internet data: money, personal data, media content, and government information. Now it’s time to draw some lessons for the health care field.

Personal Health Data

So let’s look now at our health data. It’s clearly sensitive, because disclosure can lead to discrimination by employers and insurers, as well as ostracism by the general public. For instance, an interesting article highlights the prejudice faced by recovering opiate addicts among friends and co-workers if they dare to reveal that they are successfully dealing with their addiction.

The value of personal health data is caught up with our very lives. We cannot change our diagnoses or genetic predispositions to disease the way we can change our bank accounts or credit cards. At the same time, whatever information we can provide about ourselves is of immense value to researchers who are trying to solve the health conditions we suffer from.

So we can assume that health data has an enhanced value and requires more protection than other types of personal data.

Currently, we rarely control our data. Anything we tell doctors is owned by them. HIPAA strictly controls the sharing of such data (especially as it was clarified a couple years ago in the handling of third parties known as “business associates”). But doctors have many ways to deny us access to our own data. One of my family members goes to a doctor who committed the sin of changing practices. We had to pay the old practice to transfer records to the new practice. (I have written about problems with interoperability and data exchange in many other contexts, including blog posts about the Health Datapalooza and the HIT Standards Committee. Data exchange problems hinder research, big data inquiries, and clinical interventions.)

A doctor might well claim, “Why shouldn’t I own that data? Didn’t I do the exam? Didn’t I order the test whose data is now in the record?” Using that logic, the doctor should grant the lab ownership of the test. Now that patients can order their own medical tests (at least in Arizona), how does this dynamic around ownership change? And as more and more patients collect data on themselves using things such as the Apple Watch, network-connected scales, and fitness devices — data that may contain inaccuracies but is still useful for understanding people’s behavior and health status — how does this affect the power balance between a patient and the healthcare provider, or a researcher pursuing a clinical trial?

It’s also interesting to note that although HIPAA covers data collected by people who treat us and insurers who pay for the treatment, it has no impact on data in other settings. In particular, anything we choose to share online joins the enormous stream of public data without restrictions on use.

And it’s disturbing how freely data can be shared with marketers. For instance, when Vermont tried to restrain pharmacies from selling data about prescriptions to marketers, it was overruled by the U.S. Supreme Court. The court took it for granted that pharmacies would adequately de-identify patients, but this is by no means assured.

What are the competing priorities, then, about protection of health data? On the research side — where data can really help patients by finding cures or palliative measures — pressures are increasing to loosen our personal control over data. Laws and regulations are being amended to override the usual restrictions placed on researchers for the reuse of patient data.

The argument for reform is that researchers often find new uses for old data, and that the effort of contacting patients and getting permission to reuse the data imposes prohibitive expenses on researchers.

Certainly, I would get annoyed to be asked every week to approve the particular reuse of my personal data. But I’d rather be asked than have my preferences overridden. In the Internet age, I find it ridiculous to argue that researchers would be overly burdened to request access to data for new uses.

A number of efforts have been launched to give researchers a general, transferable consent to patient data. Supposedly, the patient would grant a general release of data for some class of research at the beginning of data collection. But these efforts have all come to naught. Remember that a patient is often asked for consent to release data at a very tense moment: just after being diagnosed with a serious disease or while on the verge of starting a difficult treatment regimen. Furthermore, the task of defining a general class of research is a major semantic issue. It would require formalizing in software what the patient does and does not allow, and no one has solved that problem.

How, then, do I suggest resolving the question of how we should handle patient data? First, patients need to control all data about themselves. All clinicians, pharmacies, labs, and other institutions exist to serve patients and support their health. They can certainly validate data — for instance, by providing digital signatures indicating the diagnoses, test results, and other information are accurate — but they do not own the data.

A look at how we’re protecting money on the Internet may help us understand the urgency of protecting health data: storing it securely, encrypting it, and making outside organizations jump through hoops to access it.

Ownership of patient data is currently as murky as ownership of personal data of other types, HIPAA notwithstanding. We can use many of the same arguments and concepts for health data that we’ve seen for other personal data. As with government data, we can hold interesting discussions about how much difference anonymization makes to ownership: do you have no right to restrict the use of your health or government data once it is supposedly anonymized?

Dr. Adrian Gropper, CTO of Patient Privacy Rights, says that the concept of “ownership” is not helpful for patient data. It is better in terms of both law and computer science to speak of authorization: who can look at the data and who grants the right to look at it. Gropper works on the open source HEART WG project, which is creating an OAuth-based system to support patient control, and which he and I have written about on the Radar site.

The corollary of this principle is that patients need repositories for their data that are easy to manage. HEART WG can tie together data in different repositories — the patient’s, the clinicians’, and others — and control the flow from one repository to another.
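To make the shift from “ownership” to authorization concrete, here is a minimal sketch in Python. It is not the HEART WG design and not the OAuth protocol; the names Grant, PatientPolicy, and the resource labels are hypothetical, invented only to show the two questions Gropper raises: who can look at the data, and who grants the right to look at it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    """One permission issued by a patient (hypothetical structure)."""
    grantee: str        # who may look at the data
    resource: str       # which category of data, e.g. "lab-results"
    expires: datetime   # grants lapse rather than lasting forever


@dataclass
class PatientPolicy:
    """All grants issued by one patient. Only the patient changes them."""
    patient: str
    grants: list = field(default_factory=list)

    def grant(self, grantee, resource, expires):
        """The patient gives one party access to one category of data."""
        self.grants.append(Grant(grantee, resource, expires))

    def revoke(self, grantee, resource):
        """The patient withdraws a permission given earlier."""
        self.grants = [g for g in self.grants
                       if (g.grantee, g.resource) != (grantee, resource)]

    def is_authorized(self, requester, resource, now):
        """A repository asks this before releasing data to a requester."""
        return any(g.grantee == requester
                   and g.resource == resource
                   and now < g.expires
                   for g in self.grants)


# Usage: a research study gets one year of access to lab results only.
policy = PatientPolicy("patient-1")
today = datetime(2015, 10, 1, tzinfo=timezone.utc)
next_year = datetime(2016, 10, 1, tzinfo=timezone.utc)
policy.grant("research-study-a", "lab-results", next_year)
print(policy.is_authorized("research-study-a", "lab-results", today))
print(policy.is_authorized("research-study-a", "genome", today))
```

Notice that nothing in the sketch says who owns the lab results. The repository holding them only has to ask the patient’s policy whether a given requester may see a given category of data right now. A real system would also have to authenticate the requester and protect the policy itself, which this sketch leaves out.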

Finally, researchers must contact patients to explain how their data will be used and to request permission. With Internet tools, this should not be onerous for the researcher or the patient. Hey, everybody in medicine nowadays touts “patient engagement.” One is likely to get better data if one engages. So, let’s do it. And that way we can avoid the uncertain protection of anonymization or de-identification, which degrades patient data in order to render it harder to track back to an individual.

Researchers worry about request fatigue if individuals have to respond to every request manually, although I see this as a great opportunity for research projects to explain their goals and drum up public support. A number of organizations are trying to design systems to let individuals approve use of their data in advance, and I wish them the best, but all such attempts have shipwrecked on two unforgiving shoals. First is the impossibility of anticipating new research and the radically different directions it can take. Second is the trap of ontologies: who can define a useful concept such as “non-profit research” in terms strict enough to be written into computer programs? And how will the health care world agree on representations of the ontologies and produce perfectly interoperable computer programs to automate consent?

Value, ownership, and protection are difficult questions on an Internet that was designed in the 1960s and 1970s as a loose, open platform. We can fill the gaps through policy measures and technical protections based on well-grounded principles. Patients care about their data and its privacy. We can give them the control they crave and deserve.

Understanding Personal Health Data: Not All Bits Are the Same (Part 3 of 4, Government Information)

Posted on September 30, 2015 | Written By Andy Oram

Previous segments of this article (parts 1 and 2) have explored the special characteristics of various types of data shared on the Internet. This one will look at one more type of data before we turn to the health care field.

Government Information

Governments generate data during their routine activities, often in wild and unstructured ways. They have exploited this data for a long time, as some friends of mine found out a good 35 years ago when they started receiving promotions for wedding registries from local companies. They concluded that the only way those companies could have known they were getting married was through the town where they obtained their marriage license.

Government data offers many less exploitative uses, however; it forms a whole discipline of its own explored by such groups as the Governance Lab and the Personal Democracy Forum. Governments open data on transportation, bills and regulations, and crime and enforcement, among other things, to promote civic engagement and new businesses.

The value of such data comes from its reliability. Therefore, data that is inconsistently collected, poorly coded, outdated, or inordinately redacted reduces public confidence. Such lapses are all too common, even on the U.S. government’s celebrated data.gov site.

Joel Gurin, president and founder of the Center for Open Data Enterprise, told me that some of the most advanced federal agencies in the open data area — the Departments of Health and Human Services, Energy, Transportation, and Commerce — provide better access to their records on their own sites than on data.gov. The latter is not set up as well for finding data or getting information about its provenance, meaning, and use.

Some government data requires protection because it contains sensitive personal information. Legal battles often arise regarding whether data should be released on elected officials and employees — for instance, on police officers who were arrested for drunk driving — because the privacy rights of the official clash with the public’s right to know. De-identification is not always done properly, or succumbs to later re-identification efforts. And data can be misleading in the cases where analysts and journalists don’t understand the constraints around data collection. In addition, protection is currently decided on a rather arbitrary basis, and varies wildly from jurisdiction to jurisdiction.

For a long-range perspective on government data quality, I talked to Stefaan G. Verhulst, co-founder and Chief Research and Development Officer of the Governance Laboratory at NYU. He said, “The question is whether a government should only share data that is of high value and high quality, or whether we can benefit from a hybrid approach where the market addresses some of the current weaknesses of data. A site such as data.gov represents a long tail: some data may be of value only to a tiny set of people, but they may be willing to invest money in extracting the data from formats and repositories that are less than optimal. And hopefully, weaknesses will be rectified at the source by governments over time.”

Gurin, in his book Open Data Now (which I reviewed), calls for government outreach and partnerships with stakeholders, such as businesses that can capitalize on open data. Such partnerships would help decide what data to release and where to put resources to improve data.

One gets interesting results when asking who owns government data. The obvious answer is that it belongs to the taxpayers who paid for its collection, and by extension (because restricting it to taxpayers is unfeasible) to the public as a whole.

Nonetheless, many foreign countries and local U.S. governments copyright data. Access to such data is prohibitively expensive. Even when information is supposedly in the public domain, obscure data formats make it hard to retrieve online, and government agencies throughout the U.S. often charge exorbitant fees to people who obtain data, even when requests are granted under the Freedom of Information Act. Recent low points include resistance in Massachusetts to reforming the worst public record policies in the country, and the bizarre persecution of open government advocates by Georgia and Oregon. Unfortunately, the idea that government data should be open to all is intuitive, but far from universally accepted.

Now we have looked at four types of data in a series of articles; the next one will bring the focus back to health care.

Understanding Personal Health Data: Not All Bits Are the Same (Part 2 of 4, Personal Data and Media Content)

Posted on September 29, 2015 | Written By Andy Oram

The previous segment of this article introduced the notion that many types of data on the Internet, including personal health data, come entangled with constraints on how we can store, share, and use them. I’ll examine two more types of data, personal data and media content, in this article, and government information in the next.

Personal Data

The photos, status updates, hotel reviews, and other personal postings we upload daily constitute a huge repository of data, along with a huge market. This section talks about the melange of information that determined seekers can find about us online: usually things we voluntarily offer through Facebook, Instagram, etc., but also things that others say about us and “data exhaust” generated by our purchases and other activity that companies and governments track. When we go online, we tend to present the sides of ourselves we would like others to know about–but we don’t always understand what we’re revealing about our predilections, prejudices, and drives.

A 2012 McKinsey report suggests that social technologies offer anywhere from $900 billion to $1.3 trillion in annual value — and that’s just counting four industries (page 9 of the report).

So our personal data clearly has value. However, there are qualifications to this value. The problem is that no one is tasked with making sure the information is correct. People enter lies and distorted versions of their life events to social networks all the time. Marketers and other data-slurping companies hope that the inaccuracies work themselves out during big-data processing. But that assumes that the truth lies in there somewhere (a dubious proposition) and that sophisticated data mining techniques can eliminate inaccurate outliers.

Ownership is a curious and fascinating question for personal data. Do you “own” the data item indicating that you just purchased a shirt from Everlane? Proponents of vendor relationship management would say yes. These Internet reformers would like consumers to be in charge of the data related to their transactions, and would like companies that want to use such data for marketing or planning to pay customers. Others would argue that Everlane has just as much a right to the data as you do — you are both parties to a transaction.

As I have indicated elsewhere, ownership is a slippery concept, even for data you generate yourself. When I take photos of friends, they often ask me not to post the pictures to Facebook. I respect this, treating them as owners of their digital images. It’s interesting, incidentally, that this question of intrusive photo-taking underlies the seminal work on privacy: the 1890 Harvard Law Review article by Warren and Brandeis.

Currently, ownership is something of a Wild West where anyone who gets your personal data can use it, unless you have explicitly put it under license. So protection — the third trait of Internet data I address throughout this article — is weak and oft trampled on in personal data. I think we all want to protect personal health data from this situation, a theme I’ll return to when we get to that section of the article.

Media Content

Because I work for a publisher — and one particularly prescient in its adaptation to the wired world — I have participated in many discussions of media content. I’m talking here of things that aren’t just thrown on the open Internet, like articles on this Radar blog, but are hidden behind walls that you can enter only after paying, or at least by entering an email address and some personal information such as the size and industry of your company. Your email address is tremendously useful to the company providing the content, whether they use it to shove ads at you, sell information to vendors, or determine what future content to produce.

Is media content valuable? Certainly it is, thanks to the years of expertise and hours of effort invested by those who created and curated it. Note that in the previous section, I cited a McKinsey report. I didn’t spend hours vetting the report or checking McKinsey’s credentials. I relied on their reputation as a key source of information in the tech industry — an example of the value created by trusted content sites.

This confirms the dictum that information on the Internet wants to be expensive, as famously said by Stewart Brand. That’s why many people spend good money to access news sites and online books, and other people go to great efforts to get it for free.

The question of ownership is resolved by copyright law, but in ways that are not entirely compatible with the Internet. For instance, many researchers would often love to share their papers with all who want them, but the publishers usually own the content and place restrictions on such sharing. Luckily, many academic publishers now allow authors to place early pre-publication drafts online for free download. I can locate a free copy of most research articles by entering the title and author names into a search engine.

Indeed, when we talk about “owning” data, we fall into a trap prepared by large corporate interests who depend upon notions of Intellectual Property to maintain their income flows. I am not opposed to the exercise of copyrights, patents, and trademarks, but I worry about the extension of these carefully defined concepts to a larger context where casual references to property and (as a consequence) ownership are at best unhelpful and at worst meaningless.

Protection is also a controversial topic here. Many publishers (but definitely not my company, O’Reilly Media) take extraordinary efforts to protect data, notably digital rights management, which I cover in other articles. It’s notable that no laws restrict you from downloading software from the Internet to make a gun, but severe laws punish not just downloading copyrighted content, but offering tools that let people break the digital rights management on that content.

Further segments of this article will continue to explore Internet information and its meaning for the health care field.

Understanding Personal Health Data: Not All Bits Are the Same (Part 1 of 4)

Posted on September 28, 2015 | Written By Andy Oram

When people run out of new things to say in the field of health IT, they utter the canard, “Why can’t exchanging patient data be as easy as downloading a file on the Internet?” For a long time, I was equally smitten by the notion of seamless exchange, which underlies the goals of accountable care, patient-centered medical homes, big data research, and the Precision Medicine Initiative so dear to the White House. Then I began to notice that patient information differs in deep ways from arbitrary data on the Internet.

Personal health data isn’t alone in having special characteristics that make handling it fraught with dangers and complications. In this article, I’ll look at several other types of online data laden with complexity — money, personal data, media content, and government information — and draw some conclusions for how we might handle health data.

Money

I am not an early adopter by habit, even though I work in high tech. When someone announces, “Now you can pay your bills using your phone!” it sounds to me like “Now you can mow your lawn using your violin!” Certain things just don’t go together naturally. Money is not like other bits; you can’t copy it the way people casually share their photos or email messages.

Of course I endorse the idea of online payment systems. They have transformed the economies of rural communities in underdeveloped parts of the world like sub-Saharan Africa. They can be useful in the U.S. for people who can’t get credit cards or even checking accounts.

Perhaps that’s why there are at least 235 (as of the time of publication) online payment systems. But money isn’t a casual commodity. It requires coordination and control. Even the ballyhooed Bitcoin system needs checks and balances. Famously described as decentralized because many uncoordinated systems create the coins and individuals store their own, Bitcoin-like systems are actually heavily centralized around the blockchain they hold in common.

Furthermore, most people don’t feel safe storing large quantities of bitcoins on personal servers, so they end up using centralized exchanges, which in turn suffer serious security breaches, as happened to Mt. Gox and Bitstamp.

So let’s look at some special aspects of money as data.

First, money has value. Ultimately — as we have seen in the crisis of the Euro and the narrowly averted default by Greece — money’s value comes from guarantees by banks, including countries’ central banks. Money’s value is increased by the importance placed on it by the people that want to steal it from us or cheat us out of it.

Second, money has an owner. In fact, I can’t imagine money without an owner. It would be like gold bullion buried on a desert island, contributing nothing to the world economy. So, the Internet culture of sharing has no meaning for money.

Third, money must be protected. Most of us — who can — use credit cards, because they are backed by complex systems for detecting theft and fraud run by multinational corporations who can indemnify us and handle our mishaps. If we store our money outside the banking system, we lack these protections.

These three traits (value, ownership, and protection) will turn up again in each of the types of Internet content I’ll look at in upcoming installments of this article. Does a review of money on the Internet help us assess health data? Comparisons are shaky, because money and health data are very different. But because health data is so sensitive, we might learn a lot about its protection by paying attention to how money is handled.