Randomized Clinical Trial Validates BaseHealth’s Predictive Analytics

Posted on March 11, 2016 | Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://radar.oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

One of the pressing concerns in health care is the validity of medical and health apps. Because health is a 24-hour-a-day, 365-day-a-year concern, people can theoretically overcome many of their health problems by employing apps that track, measure, report, and encourage them in good behavior. But which ones work? Doctors are understandably reluctant to recommend apps–and insurers to cover them–without validation.

So I’ve been looking at the scattered app developers who have managed to find the time and money for randomized clinical studies. One recent article covered two studies showing the value of a platform that provided the basis for Twine Health. Today I’ll look at BaseHealth, whose service and API I covered last year.

BaseHealth’s risk assessment platform is used by doctors and health coaches to create customized patient health plans. According to CEO Prakash Menon, “Five to seven people out of 1,000, for instance, will develop Type II diabetes each year. Our service allows a provider to focus on those five to seven.” The study that forms the basis for my article describes BaseHealth’s service as “based on an individual’s comprehensive information, including lifestyle, personal information, and family history; genetic information (genotyping or full genome sequencing data), if provided, is included for cumulative assessment.” (p. 1) BaseHealth has trouble integrating EHR data, because transport protocols have been standardized but semantics (what field is used to record each bit of information) have not.

BaseHealth analytics are based on clinical studies whose validity seems secure: the company checks, for instance, whether the studies are reproducible, whether their sample sizes are adequate, and whether the proper statistical techniques were used. To determine each patient’s risk, BaseHealth takes into account factors that the patient can’t control (such as family history) as well as factors that the patient can. These are all familiar: cholesterol, BMI, smoking, physical activity, etc.

Let’s turn to the study that I read for this article. The basic question the study tries to answer is, “How well does BaseHealth predict that a particular patient might develop a particular health condition?” Answering this directly is not really feasible for a study, however, because the risk factors leading to diabetes or lung cancer can take decades to develop. So instead, the study’s authors took a shortcut: they asked interviewers to take family histories and other data that the authors called “life information” without telling the interviewers what conditions the patients had. Then they ran the BaseHealth analytics and compared the results to the patients’ actual, current conditions based on their medical histories. They examined the success of risk assignment for three conditions: coronary artery disease (CAD), Type 2 diabetes (T2D), and hypertension (HTN).

The patients chosen for the study had high degrees of illness: “43% of the patients had an established diagnosis of CAD, 22% with a diagnosis of T2D and 70% with a diagnosis of HTN.” BaseHealth identified even more patients as being at risk: 74.6% for CAD, 66.7% for T2D, and 77% for HTN. It makes sense that the BaseHealth predictions were greater than actual incidence of the diseases, because BaseHealth is warning of potential future disease as well.
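To make the gap between diagnosed and flagged patients concrete, here is a minimal Python sketch that computes the difference in percentage points for each condition. The percentages are the ones quoted above; the variable and function names are my own, chosen only for illustration.

```python
# Percent of study patients with an established diagnosis (from the study).
diagnosed_pct = {"CAD": 43.0, "T2D": 22.0, "HTN": 70.0}

# Percent of study patients BaseHealth identified as being at risk (from the study).
flagged_pct = {"CAD": 74.6, "T2D": 66.7, "HTN": 77.0}


def flagged_minus_diagnosed(flagged, diagnosed):
    """Return the gap, in percentage points, between flagged and diagnosed patients."""
    return {
        condition: round(flagged[condition] - diagnosed[condition], 1)
        for condition in diagnosed
    }


gaps = flagged_minus_diagnosed(flagged_pct, diagnosed_pct)
for condition, gap in gaps.items():
    print(f"{condition}: {gap} percentage points flagged beyond current diagnoses")
```

The gap is widest for T2D (44.7 points) and narrowest for HTN (7.0 points), which fits the reading that the platform is flagging possible future disease rather than only confirming existing diagnoses.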

BaseHealth assigned each patient to a risk range for each disease. For instance, some patients were considered 50-75% likely to develop CAD.

The study used 99 patients, 12 of whom had to be dropped from the study. Although a larger sample would be better, results were still impressive.

The study found a “robust correlation” between BaseHealth’s predictions and the patients’ medical histories. The higher the risk, the more BaseHealth was likely to match the actual medical history. Most important, BaseHealth had no false negatives. If it said a patient’s risk of developing a disease was less than 5%, the patient didn’t have the disease. This is important because you don’t want a filter to leave out any at-risk patients.
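The false-negative check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Patient` record, the function names, and the sample values are hypothetical and do not come from the study or from BaseHealth's API. Only the 5% cutoff is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    predicted_risk: float  # hypothetical model output, as a fraction from 0.0 to 1.0
    has_disease: bool      # current diagnosis taken from the medical history


LOW_RISK_CUTOFF = 0.05  # the "less than 5%" threshold mentioned in the article


def false_negatives(patients, cutoff=LOW_RISK_CUTOFF):
    """Patients rated below the cutoff who nevertheless have the disease."""
    return [p for p in patients if p.predicted_risk < cutoff and p.has_disease]


def sensitivity(patients, cutoff=LOW_RISK_CUTOFF):
    """Share of diseased patients rated at or above the cutoff (None if nobody is diseased)."""
    diseased = [p for p in patients if p.has_disease]
    if not diseased:
        return None
    caught = sum(1 for p in diseased if p.predicted_risk >= cutoff)
    return caught / len(diseased)


# Made-up records for illustration only; these are not data from the study.
sample = [
    Patient(0.02, False),
    Patient(0.60, True),
    Patient(0.30, False),
    Patient(0.80, True),
    Patient(0.04, False),
]

missed = false_negatives(sample)
print(f"false negatives: {len(missed)}, sensitivity: {sensitivity(sample)}")
```

A filter with no false negatives returns an empty list from `false_negatives`, which is the property the study reports for BaseHealth. It says nothing about false positives, which is why a patient flagged at 30% without a current diagnosis does not count against the result here.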

I have a number of questions about the study: how patients break down by age, race, and other demographics, for instance. There was also an intervention phase in the study: some patients took successful measures to reduce their risk factors. The relationship of this intervention to BaseHealth, however, was not explored.

Although not as good as a longitudinal study with a large patient base, the BaseHealth study should be useful to doctors and insurers. It shows that clinical research on apps is feasible. Menon says that a second study is underway with a larger group of subjects, looking at risk of stroke, breast cancer, colorectal cancer, and gout, in addition to the three diseases from the first study. A comparison of the two studies will be interesting.