Reflecting back on the data breach the other day where information on 191 million American voters was accessible to anyone on the Internet, the real problem wasn't so much that modern databases are so easy to misconfigure and have default access-all settings. It wasn't the fact that a network scan over the mongo port will yield countless other goodies. It wasn't that whoever leaked this would likely remain undiscovered.
The high bit was a little factoid on just how cheap our private information is: the cost of the leaked database is estimated at just $300,000.
Dividing that number by the 191M data records, the price of your name, address, personal information, and political affiliation turns out to be a measly $0.00157. As usual, I'm not using Verizon math; that's less than one fifth of a cent.
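For anyone who wants to check the division, here is a minimal sketch in Python. The two inputs are the figures quoted above; the variable names are just illustrative.

```python
# Figures quoted in the post: estimated price of the database and record count.
estimated_price_usd = 300_000
records = 191_000_000

# Price of a single voter record, in dollars and in cents.
price_per_record_usd = estimated_price_usd / records
price_per_record_cents = price_per_record_usd * 100

print(f"${price_per_record_usd:.5f} per record")
print(f"{price_per_record_cents:.3f} cents per record")

# One fifth of a cent is 0.2 cents; the per-record price comes in under that.
print(price_per_record_cents < 0.2)
```

Keeping dollars and cents in separately named variables is the whole point: it makes the unit mix-up much harder to commit.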
It usually takes me weeks, sometimes months, to figure out someone's political affiliation, and here it is, being sold for a fraction of a penny, and in fact, also freely available on the web, thanks to a piece of software that is "web-scale."
And it's incredibly easy to compile comprehensive databases of this kind. Our laws were written for a time and place where compiling and cross-referencing giant data collections was difficult, so we've erred on the side of forcing the government to release whatever it knows. Voter records, driving records, court records and many more can be collected relatively easily. True, it is not quite trivial: government agencies either are mired in bureaucracy and genuinely incur high costs, or want to protect the incumbent middlemen by charging high fees.
But it turns out that any country can bootstrap its intelligence operation by shelling out a measly sum of $0.002 per person, and acquire critical data on every single registered American voter.
And businesses can quite easily collate similar databases just by storing and selling your information. Checking in to a hotel? Using a phone number to sign up for a loyalty card? Browsing while logged in to Facebook? Posting a comment online? Every interaction is a leak -- if you were a computer program, and we were running a strict information flow control system, there is no way you could function in modern society.
I have no hope that we will tighten privacy laws any time soon. Both government and business have strong incentives in place to erode privacy, while there are few representatives of citizen interests (EFF is here).
But perhaps there is some good that can come out of these data breaches: they reduce the value of the leaked data. After all, the data changes far too slowly, and the breaches are occurring constantly. People move every 7 years, they change voter affiliation approximately never, their hobbies shift every five years, if that, and their habits and opinions don't seem to change much. Sure, I can suddenly become interested in buying a car today as my old one fails to pass inspection, and Google will be the first to know, but frankly, even that was a predictable event given that I was driving a we-totally-weren't-cheating-on-emissions-nor-would-we-ever-install-piston-seals-upside-down VW, and someone should have been flashing brand ads at me over the last few years just based on my demographic.
So, the rate of change in human-related data is multiple orders of magnitude lower than the rate at which it leaks. People are slow, and while mongo is also slow, it can leak data far faster than that data can go out of date, if the data is about humans.
This has a clear implication: in the limit, everyone will have access to all the data related to everyone who is alive during their lifetime on earth. There will be value in timely data, and companies will vie to control the latest news on a person, but the basic, slow-changing facts about everyone will be available to all.
At some level, that's a depressing thought -- that you could know pretty much everything there is to know about everyone.
But this process will drive the price of personal data towards a big fat $0. We will reach a point where this kind of data can be had for free by anyone. At the moment, only intelligence agencies and top tier marketers have access to such data. But at some point, the kind of people who run late night ads on TV will have access to the same information. They won't have to funnel through the well-defined search engine ad interface to reach out to their demographic, though the lazy ones will probably just stick to ads. But the enterprising ones will know everything there is to know about the people they want to convert. Imagine a world where every kooky cult has access to the type of information currently available to people with $300K to burn. Instead of a clueless but well-dressed person knocking at the door, we'll have social manipulators who try to convert grandma to their cause, who already know who she is, what she likes, her stance on major issues, as well as the opinions she shared online. And imagine the damage that targeted, niche hate groups can do with access to this kind of information.
Perhaps that will help create some legislation about data privacy, but by then it will be too late.
Jan 4, 2016: The data leak has been linked to Bill Dallas's pro-conservative, Christian group called United In Purpose.