Understanding Personal Health Data: Not All Bits Are the Same (Part 3 of 4, Government Information)

Andy Oram is an editor at O'Reilly Media.

Previous segments of this article (parts 1 and 2) have explored the special characteristics of various types of data shared on the Internet. This one will look at one more type of data before we turn to the health care field.

Government Information

Governments generate data during their routine activities, often in wild and unstructured ways. They have exploited this data for a long time, as some friends of mine found out a good 35 years ago when they started receiving promotions for wedding registries from local companies. They decided that the only way those companies could have known they were getting married was through the town where they obtained their marriage license.

Government data offers many less exploitative uses, however; it forms a whole discipline of its own explored by such groups as the Governance Lab and the Personal Democracy Forum. Governments open data on transportation, bills and regulations, and crime and enforcement, among other things, to promote civic engagement and new businesses.

The value of such data comes from its reliability. Therefore, data that is inconsistently collected, poorly coded, outdated, or inordinately redacted reduces public confidence. Such lapses are all too common, even on the U.S. government’s celebrated data.gov site.

Joel Gurin, president and founder of the Center for Open Data Enterprise, told me that some of the most advanced federal agencies in the open data area — the Departments of Health and Human Services, Energy, Transportation, and Commerce — provide better access to their records on their own sites than on data.gov. The latter is not set up as well for finding data or getting information about its provenance, meaning, and use.

Some government data requires protection because it contains sensitive personal information. Legal battles often arise over whether to release data on elected officials and employees (for instance, on police officers who were arrested for drunk driving) because the privacy rights of the official clash with the public’s right to know. De-identification is not always done properly, or succumbs to later re-identification efforts. And data can be misleading when analysts and journalists don’t understand the constraints around data collection. In addition, the level of protection is currently decided on a rather arbitrary basis and varies wildly from jurisdiction to jurisdiction.

For a long-range perspective on government data quality, I talked to Stefaan G. Verhulst, co-founder and Chief Research and Development Officer of the Governance Laboratory at NYU. He said, “The question is whether a government should only share data that is of high value and high quality, or whether we can benefit from a hybrid approach where the market addresses some of the current weaknesses of data. A site such as data.gov represents a long tail: some data may be of value only to a tiny set of people, but they may be willing to invest money in extracting the data from formats and repositories that are less than optimal. And hopefully, weaknesses will be rectified at the source by governments over time.”

Gurin, in his book Open Data Now (which I reviewed), calls for government outreach and partnerships with stakeholders, such as businesses that can capitalize on open data. Such partnerships would help decide what data to release and where to put resources to improve data.

One gets interesting results when asking who owns government data. The obvious answer is that it belongs to the taxpayers who paid for its collection, and by extension (because restricting it to taxpayers is unfeasible) to the public as a whole.

Nonetheless, many foreign countries and local U.S. governments copyright data, and access to such data is prohibitively expensive. Even when information is supposedly in the public domain, obscure data formats make it hard to retrieve online, and government agencies throughout the U.S. often charge exorbitant fees to people who request data, even when the requests are granted under the Freedom of Information Act. Recent low points include resistance in Massachusetts to reforming the worst public record policies in the country, and the bizarre persecution of open government advocates by Georgia and Oregon. The idea that government data should be open to all is intuitive but, unfortunately, far from universally accepted.

We have now looked at four types of data in this series of articles; the next part will bring the focus back to health care.