Separating Hyperplanes

Statistics in the age of HIPAA

Matthew Martin 11/13/2014 01:05:00 PM

Just a brief thought. Part of what limits how much healthcare improvement and research we can do is HIPAA, a law regulating how people's health information can be handled in order to protect medical privacy. To do statistics, researchers need patient data, so there is a sometimes arduous process of getting the appropriate qualifications, obtaining permissions, and maintaining data security.

On the other hand, do we really need data to do statistics? The standard model of statistical investigation in healthcare dates to a time when server resources were precious, the internet was slow, and scripting languages were primitive. This is no longer the case. We could in fact do statistics through interfaces: all the patient data remains isolated on a secured server equipped with a statistical scripting language, and investigators on a client computer somewhere tell the server to run t-tests, regressions, or what have you, and the server returns only the estimates to the client.
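To make the idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `AnalysisServer` class, the `welch_t` method, and the field names `treated` and `sbp` are made up for illustration, and the rows are invented numbers. A real system would sit behind a network interface with authentication; this only shows the shape of "data stays put, estimates come back."

```python
import math
import statistics


class AnalysisServer:
    """Hypothetical server-side object: holds patient rows, returns only estimates."""

    def __init__(self, records):
        # The raw rows live only here and are never returned to a caller.
        self._records = list(records)

    def welch_t(self, outcome, group, a, b):
        """Compare `outcome` between groups `a` and `b`; return summary estimates only."""
        xa = [r[outcome] for r in self._records if r[group] == a]
        xb = [r[outcome] for r in self._records if r[group] == b]
        mean_a, mean_b = statistics.mean(xa), statistics.mean(xb)
        # Welch standard error: sample variances are not pooled.
        se = math.sqrt(statistics.variance(xa) / len(xa)
                       + statistics.variance(xb) / len(xb))
        return {
            "mean_a": mean_a,
            "mean_b": mean_b,
            "diff": mean_a - mean_b,
            "t": (mean_a - mean_b) / se,
        }


# Invented rows, purely for illustration.
server = AnalysisServer([
    {"treated": 1, "sbp": 120.0}, {"treated": 1, "sbp": 124.0},
    {"treated": 1, "sbp": 118.0}, {"treated": 1, "sbp": 122.0},
    {"treated": 0, "sbp": 130.0}, {"treated": 0, "sbp": 134.0},
    {"treated": 0, "sbp": 128.0}, {"treated": 0, "sbp": 132.0},
])

# The "client" sees only this dictionary of estimates, never the rows.
result = server.welch_t("sbp", "treated", 1, 0)
print(result)
```

The investigator's side of the exchange is just the last two lines: a request naming a test and some variables, and a handful of numbers in return.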

I will leave you to speculate on the details here. The point is that in this system, at no point is any health information, identifiable or otherwise, given to the researcher, so there is no HIPAA concern.

A separate but related question is one of ethics. Human subjects research requires IRB approval, and usually also informed consent, regardless of whether health information is involved. This is generally taken to mean that it is unethical (and illegal) for researchers to probe databases without first getting approval, even if the subjects previously consented to having that data collected for research purposes. How would this apply to our situation?

Again, the old standard presumed that researchers had to be in possession of data to analyze it, which introduces new risks to the subjects because there are more opportunities for their data security to be breached. But in our new system, there is no new risk: the researchers are at no point given any data, so there is no potential for them to abuse the data, nor is there any potential for an accidental security breach. Allowing additional studies to be performed using this data has no potential to harm subjects beyond the risks they initially consented to when the data was collected under the original research protocol. Hence, there should be no need for further IRB approvals once the data is collected.

So how about it? Let's make remote analytics a thing. Is there a downside?

Anonymous 11/26/2014 10:04:00 AM

This has been thought of before, usually under the rubric "Statistical Databases," and the consensus is that such an interface is still subject to breaches of personal information by a determined attacker.

The Wikipedia page [http://en.m.wikipedia.org/wiki/Statistical_database] has a good summary with references.
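
For a sense of how such a breach can work, here is a minimal sketch of one well-known approach, a differencing attack, in Python. The `AggregateOnlyServer` class, its `total` method, and the field names are all hypothetical, and the rows are invented. The assumption is that the attacker already knows something identifying about the target (here, that they are the only 61-year-old in a given zip code) and that the interface accepts arbitrary filters.

```python
class AggregateOnlyServer:
    """Hypothetical interface that never returns rows, only sums."""

    def __init__(self, records):
        self._records = list(records)

    def total(self, field, predicate):
        """Return only the sum of `field` over rows matching `predicate`."""
        return sum(r[field] for r in self._records if predicate(r))


# Invented rows, purely for illustration.
server = AggregateOnlyServer([
    {"zip": "48104", "age": 34, "diabetic": 0},
    {"zip": "48104", "age": 61, "diabetic": 1},
    {"zip": "48104", "age": 47, "diabetic": 0},
])

# Two queries that each look like harmless aggregates...
everyone = server.total("diabetic", lambda r: r["zip"] == "48104")
all_but_target = server.total(
    "diabetic", lambda r: r["zip"] == "48104" and r["age"] != 61
)

# ...but whose difference is the target's own value.
inferred = everyone - all_but_target
print(inferred)
```

Neither query returns an individual record, yet subtracting one answer from the other recovers a single person's value.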