Monday, February 25, 2008

I Fought the Law...

Historically, many of the major advances in science have involved the coining of a "Law". A good law is pithy, describes the world in a way that enables applied use, and tells people who are strong in mathematics something about the nature of the world that makes the law fit the equation. For example, the inverse square law governing the apparent brightness of a star follows neatly from the fact that the light streaming outward from it spreads over successively larger spheres. Einstein's famous E=mc² tells you that mass and energy are interchangeable, with the speed of light setting the exchange rate.

Stepping back from the methods and techniques that have been used in NLP, we can hope for a law that describes the progress made as researchers in industry and academia pursue better solutions to the hard problems. If you listen to Ray Kurzweil, you'd conclude that progress is going to approach a vertical asymptote. In other words, things are going to improve so much that the future will be off the charts -- sooner or later, things are going to become infinitely good. Maybe quite soon. If you think that's true, you should invest heavily in AI-style technologies now!

The history of NLP, however, like that of some other fields of AI, has produced repeated evidence for a different law governing progress. A lot of the cycle of "hype, then winter" comes from people failing to recognize this law. The law that the history of NLP describes is quite different from Kurzweil's rose-colored vision. In fact, it's at right angles to it. Rather than the state of the art approaching a vertical asymptote (infinite goodness soon!), it has been approaching a horizontal asymptote (progress is all done!).

Obviously, these two worldviews could not be more opposed. But the facts clearly point to which one is valid. Let's take some examples.

An important task in NLP is called Part of Speech Tagging. It consists of marking every word in a piece of text with its grammatical category. Not many people want to do this for its own sake, but it's a highly useful first step in analyzing text more deeply. The first effort to do this automatically was the work on the Brown Corpus by Greene and Rubin in 1971. With a very simple approach, they achieved accuracy of about 70%. Not too bad for a first try. By the early 1980s, researchers had pushed this number way up, with a system called CLAWS achieving about 94%, meaning that roughly four fifths of the errors made by Greene and Rubin's approach had been eliminated. Big progress! But in the 25 years since then, progress has been slight, perhaps to 96%. In fact, there is good reason to believe that tagging better than 97% is impossible, because even human annotators tagging the same text do not agree more often than that. Moreover, some very different approaches to tagging all yield similar accuracy rates, just shy of that theoretical maximum. When much more progress happens in the first decade than in the succeeding quarter century, you have found yourself a horizontal asymptote.
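To make the task concrete, here is a minimal sketch of tagging a sentence with NLTK's stock tagger. This is an illustration only, not one of the systems discussed above, and the resource names and sample output are assumptions of the sketch.

```python
# A minimal sketch of part-of-speech tagging using NLTK's stock tagger
# (illustration only -- not one of the systems discussed above).
import nltk

# Resource names assume a recent NLTK release.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("I fought the law and the law won.")
print(nltk.pos_tag(tokens))
# Something like: [('I', 'PRP'), ('fought', 'VBD'), ('the', 'DT'), ('law', 'NN'), ...]

# The "four fifths of the errors" arithmetic from the paragraph above:
error_1971, error_1980s = 1 - 0.70, 1 - 0.94
print(f"errors eliminated: {(error_1971 - error_1980s) / error_1971:.0%}")  # 80%
```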

As it happens, 96% is pretty good, and a tagger that performs that well is highly useful, errors be damned. And of course, no one would dispute that a horizontal asymptote for accuracy has to exist somewhere, because you can't beat 100%. So in principle, this isn't a problem.

But in practice, it is! Because while Part of Speech Tagging is ready for application, some other very important tasks in NLP are not. Suppose the asymptote were not at 96%, but much lower. That's what you have with another highly interesting problem, Word Sense Disambiguation -- determining which meaning of a word a writer had in mind. (For example, "bank" as in the side of a river, or "bank" as in a financial institution.) Here, though the numbers vary considerably according to how you measure accuracy, it is clear that there are no excellent methods available. WSD has not advanced dramatically in the half century since it became an area of interest, and the approaches that exist are tailor-made to disappoint the people who understandably would like a good solution to this problem.
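To see what the task looks like in practice, here is a minimal sketch using the classic Lesk dictionary-overlap algorithm as packaged in NLTK. It is an illustration, not one of the approaches surveyed above; the context sentence is my own, and the algorithm's frequent mistakes are a fair picture of why WSD accuracy has stayed so low.

```python
# A minimal sketch of word sense disambiguation using the classic Lesk
# algorithm as packaged in NLTK (illustration only; often wrong).
import nltk
from nltk.wsd import lesk

nltk.download("wordnet", quiet=True)  # Lesk compares WordNet dictionary glosses

context = "I sat on the bank of the river and watched the current".split()
sense = lesk(context, "bank", pos="n")
if sense is not None:
    print(sense.name(), "-", sense.definition())
# The chosen synset is frequently not the riverside sense, despite the context.
```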

A very big problem for NLP as an enterprise is the cycle of hype, disappointment, and mistrust that I identified a few weeks ago. And a very big part of the hype end of that problem is that the people on the receiving end of the hype (customers, venture capitalists, managers, etc.) don't know which law has been in effect. Innovation does exist. Progress does happen. But before an idea becomes a software project, anyone reviewing it should ask whether the NLP needed to make the approach successful has already hit its horizontal asymptote.
