So, Yahoo is in the news for buying Summly. Summly is a company that extracts summaries of natural language text, a TL;DR of sorts.
Summly licensed its core technology from SRI, which previously spun out Siri and sold it to Apple. Summly had 5 engineers, only 2 of whom will be moving to Yahoo. Summly is reported to have 1M downloads of its app in mobile app stores.
I want to take a few minutes to process this, because it points to some trends that should cause any technologist to raise an eyebrow.
Let's leave aside three unrelated factoids, namely, the Summly founder is only 17 years old, the company was in existence for only 18 months and Yahoo reportedly paid $30M for it. Why should we put an artificial barrier around these topics?
First, it really doesn't matter if the founder is a high-school student. If he had a bright idea, he deserves the spoils. Big Media, which has perfected the art of trolling the masses, is making sure to include a picture of the youthful founder in every story already, and we need to rise above that.
Second, dividing someone else's spoils by the amount of time they spent earning them leads us to question all kinds of things. We know where that leads: to more progressive taxation, class warfare, and the end of the world as the uber-rich know it. Our current zeitgeist cannot broach these topics, so they're off the table.
Finally, it doesn't matter how much money Summly got or whether Yahoo wasted its money. By definition, Yahoo's directors know how to spend Yahoo's cash reserves best. In any case, company valuations are far outside the expertise of most people. So let's leave this boring topic alone. The directors have clearly said that this is the best thing Yahoo can do with $30M at this moment in time, which we have to take at face value.
And if we don't take these factoids off the table, others may think that our analysis is tainted. Do keep in mind that every academic of my generation stayed in graduate school through at least one dotcom boom. I interviewed at DE Shaw when Bezos was there and turned them down for a job that paid 1/10th of their salary offer. The field is lucrative enough that we've all done very well anyway.
I want to first focus on this from a technologist's perspective, and there is only one germane fact: the company developed little Natural Language Processing technology of its own.
They licensed the core engine from another company. They are the quintessential bolt-on engineers, taking a Japanese bike engine, slapping together a badly constructed frame aligned solely by eyeballs, and laying down a marketing blitz. That's why the story sells. "You, too, can do it." But do you want to?
In some sense, everyone is a bolt-on engineer. Nobody rolls out their own fab and builds up from raw silicon; we all reuse some component or another, even if it's a language runtime, a web framework like Django or Rails, a protocol like Paxos, a fast database, or a library, say for numerical analysis or even natural language processing. Even CS theoreticians are, in a sense, reusing techniques from math. Everybody stands on the shoulders of the giants that came before them, and all that.
But it's critical to keep tabs on the ratio known as "glue versus thought." Sure, both imply progress and both are necessary. But the former is eminently mundane, replaceable, and outsource-able. The latter is typically what gives a company its edge, what is generally regarded as a competitive advantage.
So, what is Yahoo signaling to the world? "We value glue more than thought."
If Summly is an innovative company worth purchasing, I have some news for Yahoo: my AI colleagues have tricks up their sleeves that will blow your minds!
Let's get some perspective here: Summly wasn't reading Ulysses by James Joyce and extracting the fact that the three-masted ship Stephen Dedalus sees on the horizon is a metaphor for the Holy Trinity and therefore represents the Catholic Church. It wasn't reading a 12-page article in Harper's and extracting the cleverest puns and pop culture send-ups lovingly embedded by a writer who is good at his craft and earning below his potential. And it wasn't taking my blog posts and somehow conveying the nuanced ennui I harbor for bolt-on engineering.
It was summarizing news. Articles that are already written with a TL;DR in the first paragraph.
For 95% of the news I read, that can be done with a regexp that slices out the first sentence. Very rarely, the first paragraph contains what journalists call a "hook," and the infamous five Ws are embedded in the second paragraph. So even if it worked perfectly, Summly would eliminate one extra sentence 5% of the time.
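To make the point concrete, here is a minimal sketch of that kind of regexp in Python. It is my own illustration, not anything Summly or SRI is known to use; the name `first_sentence` and the pattern are assumptions, and the pattern is deliberately naive (it will cut at the first period followed by whitespace, so abbreviations such as "Inc." will trip it up).

```python
import re

# Naive assumption: a sentence ends at the first '.', '!' or '?' that is
# followed by whitespace or by the end of the text.
FIRST_SENTENCE = re.compile(r"^\s*(.+?[.!?])(?=\s|$)", re.DOTALL)


def first_sentence(article: str) -> str:
    """Return the leading sentence of an article, or the whole text
    (stripped) if no sentence terminator is found."""
    match = FIRST_SENTENCE.match(article)
    return match.group(1) if match else article.strip()


if __name__ == "__main__":
    story = "Yahoo is buying Summly. Terms were not officially disclosed."
    print(first_sentence(story))
```

Anything smarter than this has to earn its keep on the remaining 5% of articles.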
And if Yahoo were to look at the work of anyone who is active in NLP (e.g., Claire Cardie, Lillian Lee), it'd immediately discover that this is a deep field full of exciting developments at its core. Gluing an NLP engine up to news surely adds some value, but it pales in comparison to what cutting-edge NLP algorithms can accomplish.
So if Yahoo is to be a technology company, it needs to do core technology acquisitions that give it a competitive advantage. Glue is not that kind of advantage.
From a societal perspective, we seem to be in the midst of a TL;DR wave. I wrote about this before when my post on MongoDB's fault tolerance was getting really dumb responses from people who seemed to have difficulty reading:
Look, I realize that we live in a TL;DR culture. I lived through 8 years of a non-reading president along with everyone else. I know that the brogrammers out there are constantly getting texts from their buddies to plan the weekend's broactivities, trying to decide in whose mancave they'll be setting up their lan party, and are thoroughly distracted in between futzing with their smart phones and writing a few lines of code per day by cutting and pasting it from stackoverflow. But it's really not ok to act functionally illiterate when you're not actually illiterate, when an advanced society that once put a man on the moon worked so hard to educate you.
Summly's entire business model seems to revolve around catering to this demographic. Frankly, it pains me.
Our time is valuable, and we definitely need tools that help. And because digital information sources have become repetitive echo chambers, I would welcome tools that can extract the latent signals. "This article is really a fluff piece paid for by tobacco interests." "This picture of attractive happy people of different races mixed together in the same proportion as society at large, sipping lattes at Starbucks, is probably an image ad by Starbucks." "Yahoo makes outrageous purchase to get people to talk about its dying brand, and perhaps to indicate that it has cash to waste, the same way a wildebeest chased by lions paradoxically jumps into the air to telegraph 'I am so healthy and have so much energy that I can afford to waste some of it; you're after the wrong prey.'"
That is, analysis would be useful. And analysis requires content that is not syntactically in the message itself. What's wrong with the world is not that we do not have time to read, but that reading is so frustrating. On many topics outside our expertise area, we lack the extra information to extract informed opinions -- we lack the capacity and context to judge. Anyone can read the topic sentences of paragraphs to extract a summary (a "tl;dr") from any piece of writing of any length. The action in this space is to get at the hidden message that lies behind the words.
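For the skeptics, the topic-sentence trick fits in a handful of lines. The sketch below is my own illustration under loud assumptions: paragraphs are separated by blank lines, the topic sentence is simply the first sentence of each paragraph, and the helper name `topic_sentences` is hypothetical. It is emphatically not analysis; it only rearranges what is already syntactically in the message.

```python
import re
from typing import List

# Assumption: a sentence ends at the first '.', '!' or '?' followed by
# whitespace or the end of the paragraph.
SENTENCE = re.compile(r".+?[.!?](?=\s|$)", re.DOTALL)


def topic_sentences(text: str) -> List[str]:
    """Return the first sentence of every blank-line-separated paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    summary = []
    for paragraph in paragraphs:
        match = SENTENCE.match(paragraph)
        # Fall back to the whole paragraph when no terminator is found.
        summary.append(match.group(0) if match else paragraph)
    return summary


if __name__ == "__main__":
    post = "Glue is cheap. Anyone can write it.\n\nThought is rare! It takes years."
    print(" ".join(topic_sentences(post)))
```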
That's why TL;DRs are often just as worthless as the text they summarize. They mark pieces of badly structured thoughts that look like a wall of text and read like one. The action in AI is happening elsewhere, tackling the deeper problems such as sentiment analysis.
Finally, the entire Summly discussion seems to be missing a critical element: there seem to be no happy users of Summly. A search for "I use Summly" reveals 6 hits, only two of which seem unique. For 1M downloads, that's a strange outcome.
But it's not unexpected. For only a robot would fall in love with a product that robotically extracts TL;DRs.
I have this hunch that every news announcement can be expressed as a combination of basic universal sentiments (yes, there is some irony in discussing this here), the way an FFT breaks down a complex signal into sine waves.
So, some might disagree on the coefficients, perhaps, but it's clear that the following sentiments have non-zero terms:
"bolt-on engineering is considered innovative in our company,"
"our demographic is of the TL;DR variety,"
"we're not caught up to the cutting edge in NLP, and we're buying our way into it not from the top down, starting with intellectual leaders, but from the bottom up, starting with iPhone applications."