Metadata is in the eye of the beholder

The intelligence community has been harping on the word "metadata" to try to underscore that the information they collect is not quite "data," is not subject to the same limits, and is not quite as bad. I want to put an end to this charade, by way of an analogy.

An Analogy

Suppose I want to understand everything there is to know about the lightning bugs in my back yard. They are pesky, ugly little insects, and their numbers fluctuate wildly. I want to understand what they are doing and what makes them tick so I can eradicate them.

If I were a molecular biologist, my starting point would be that the data I want is a bug's DNA. Surely, the DNA describes the behavior of an individual completely, so it makes sense that I would want to acquire it and figure out everything a single sequenced individual will do. If the bug DNA is "data," then all other information about how many bugs there are in my backyard, their gender, their swarm sizes, their flashing frequencies and so forth would simply be "metadata." It's nice to know that stuff, a molecular biologist would say, but clearly it's not as critical as the DNA sequence, and therefore shouldn't be subject to the same care and attention as DNA samples.

But if I were an old-school entomologist, like my real-life neighbor who is a retired professor, I would claim that the DNA data is really of no consequence. Humankind was eradicating bugs for thousands of years before DNA was discovered, let alone sequenced, and can do so again. All an entomologist would need is time-series data on population counts, gender, swarm sizes, flashing frequencies and so forth -- what was metadata for the molecular biologist is data for an entomologist. He would use this data to figure out and manipulate their behavioral patterns so as to drive down their numbers. To an entomologist, the "metadata" would be the circumstances surrounding his real data of interest: the location of my backyard, the other plant and insect species on the premises, the weather patterns and so forth.

And if I were a dyed-in-the-wool ecologist who had to suffer through grad school along with a lot of traditional bio-geeks who've never encountered an animal too disgusting to dissect, I would say that all this attention is being wasted on individual organisms. Give me a species interaction graph and a census of all the species in the yard so we can figure out how the bug population will behave over time. We'd introduce competing species and drive down the numbers. The entomologist's metadata would be my data.

Meta Depends on Your Point of View

One could go on. Clearly, what constitutes data versus metadata is determined not by any intrinsic property of the data itself, but by the questions the data is meant to answer.

Let's examine what it is that the intelligence community wants to do with phone call records and online activity logs to see if it fits any kind of meta designation.

The contents of phone conversations are clearly important. If our goal is to stop an immediate attack, a voice that says "attack at dawn" is what we want to catch. And this is the imaginary scenario that the intelligence community will play up. But if our goal is to investigate a network, to find out who is related to whom by what degree, and what their usual communication activities are, then the call log "metadata" is very much the actual data we seek. It is not one step removed; it is the very thing and the only thing we want. If we're doing anomaly detection or community discovery or determining some kind of simplistic color-coded terror alert level, we'd be able to do our analyses solely with metadata.
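
To make that concrete, here is a minimal sketch of the "who is related to whom by what degree" question, answered from call records alone. Everything in it is made up for illustration: the names, the record layout (caller, callee, start time, duration) and the function names are my assumptions, not anything taken from an actual intelligence system. The point is only that no conversation content is needed anywhere.

```python
from collections import defaultdict, deque

# Hypothetical call records: (caller, callee, start_time, duration_seconds).
# Note that no conversation content appears anywhere in this data.
CALL_LOG = [
    ("alice", "bob", "2013-06-01T09:00", 120),
    ("bob", "carol", "2013-06-01T09:05", 45),
    ("carol", "dave", "2013-06-02T18:30", 300),
    ("alice", "bob", "2013-06-03T09:00", 60),
    ("erin", "frank", "2013-06-03T12:00", 15),
]


def build_contact_graph(records):
    """Build an undirected who-called-whom graph from call records."""
    graph = defaultdict(set)
    for caller, callee, _start, _duration in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def degrees_of_separation(graph, source):
    """Breadth-first search: number of hops from source to each reachable person."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for contact in graph[person]:
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    return hops


if __name__ == "__main__":
    graph = build_contact_graph(CALL_LOG)
    for person, distance in sorted(degrees_of_separation(graph, "alice").items()):
        print(person, distance)
```

A plain breadth-first search over a who-called-whom graph already says how many hops separate any two people, and anyone who never shows up in the result (erin and frank here) sits in a separate community. Real analyses would presumably be far more elaborate, but they would start from the same kind of input.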

The "meta" designation is really an attempt to denigrate the value of the data at stake, to insinuate that this data is one step removed from that which we want, and to subtly insist that it should therefore be subject to less scrutiny.

Meta is Often More Valuable

Yet metadata is often far more valuable than so-called data itself.

Take, for instance, the NSA's current predicament following Snowden's leaks. What Snowden leaked was information about the information that the NSA collected. Since the NSA calls the latter "metadata," this makes Snowden's leaks meta-metadata. I don't need to belabor how damaging the leak was for the NSA, even though it's supposedly twice removed from "data."

And going further, here's the NSA's response to a FOIA request, explaining why revealing the presence or absence of some metadata (which would be meta-metadata) would cause grave harm to the United States, because it would reveal information about the capabilities of the NSA. We're veering off into cubic-meta territory here.

There have been narrow legalistic arguments between legal scholars about the privacy guarantees over call records. While it's futile to try to keep lawyers from discussing arcane legalistic definitions, these discussions all miss the point. Simply put, the public finds it creepy for the government to track their lives, their interactions and their overall behavior at that scale and in that fashion. Jane Average can turn a blind eye towards evil, unwarranted or even illegal activities on occasion, especially if they take place overseas, but a domestic creeper is a hard sell to families.

So the intelligence community, which never met-a-data that it didn't want to collect, should drop the whole metadata charade. The discussion should not be about legalistic definitions. It should be whether or not collecting this particular information, for the particular purpose of massively cross-linking and analyzing it, at this massive scale, is at odds with our values.
