Monday, April 13, 2009

Spell Check for POS Tagging

Language is inherently hierarchical. Letters make words; words make phrases; phrases make sentences; and so on. NL processing tends to take place in stages corresponding to the levels of that hierarchy. It's a good paradigm: you can get good results by plugging state-of-the-art modules together and squeezing semantics out of natural language input.

It's also the case that NL processing is inherently error-ridden. Every step in the process can stumble or trip on the rampant ambiguities in language. Notice that I did not restrict that comment to machine NL processing... people make mistakes, too. We say "um", introduce sounds we didn't mean to, misspell words, and mangle sentences and ideas midway. And we mishear, misread, and get caught walking down (usually unintended) garden paths. Many mechanical text-processing systems are feed-forward: each step makes its own errors and passes them, along with the errors it inherited, on to the next, until by the end perhaps 25% to 75% of the sentences are misunderstood.
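To get a feel for how quickly errors pile up in a feed-forward pipeline, here is a back-of-the-envelope sketch. It assumes, unrealistically, that every stage errs independently on every token with a fixed per-token accuracy, and the accuracies used are illustrative numbers, not measurements of any particular tagger or chunker.

```python
def sentence_accuracy(stage_accuracies, n_tokens):
    """Probability that a sentence passes through every stage with no errors.

    Assumes (unrealistically) that each stage errs independently on each token
    with a fixed per-token accuracy.
    """
    p = 1.0
    for acc in stage_accuracies:
        p *= acc ** n_tokens
    return p

# Illustrative accuracies only, not measurements of any real system.
pipelines = {"tagger only": [0.97], "tagger + second stage": [0.97, 0.98]}
for name, stages in pipelines.items():
    for n in (10, 20, 30):
        err = 1 - sentence_accuracy(stages, n)
        print(f"{name}, {n} tokens: {err:.0%} of sentences have an error")
```

Under these assumptions, a 97%-accurate tagger alone leaves at least one error in about 46% of 20-token sentences, and adding a 98%-accurate second stage raises that to about 64%.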

People, however, often recover from misunderstandings by re-reading the text and rearranging their thoughts until a correct parse is found. The trick is to use the realization of an error at one level of processing to revise the work of the previous level, where the flaw originated. Once an error is noticed, the key is to determine where it came from.

This weekend, I ran a POS tagger on the output of an unrelated phrase chunker. I noticed patterns such as these (only a few of the bad ones):

NP: DT VB
VP: JJ IN

Obviously, what had happened was that the second tagger made POS-tagging errors on phrases that the other system's chunker (and thus its POS tagger) had categorized correctly. "JJ IN" was a case where the real underlying tags were "VB RP": the first tagger got it right (allowing the chunker to get it right), but the second tagger got it wrong.

The key takeaway here is not that the first tagger is better! That may or may not be true. The key observation, rather, is that when a POS tagger makes an error (and they all do), any prospect of chunking that sentence correctly is doomed. GIGO: "Garbage in, garbage out."

But consider the opportunity to recover. "JJ IN" is a red flag that the tagger may have made a mistake. Before committing to a chunking of those tokens, the system may want to reconsider the probabilities of those particular tags and see whether a different tagging yields a more agreeable chunking.
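A minimal sketch of that recovery step follows. It assumes the tagger can expose a probability distribution over tags for each token (the `{tag: probability}` format is hypothetical), treats those per-token distributions as independent (a simplification), and uses a tiny hand-written table of tag patterns considered plausible for each chunk type; a real system would learn such patterns from annotated data. When the chunk's most probable tags form a red-flag pattern such as VP: JJ IN, the sketch searches each token's top few alternatives for the most probable tagging the chunk table accepts.

```python
from itertools import product
from math import log

# Toy chunk "grammar" (hypothetical): tag sequences accepted per chunk type.
PLAUSIBLE = {
    "NP": {("DT", "NN"), ("DT", "JJ", "NN"), ("NN",)},
    "VP": {("VB",), ("VB", "RP"), ("MD", "VB")},
}

def best_tagging(candidates):
    """Pick the most probable tag for each token independently."""
    return tuple(max(dist, key=dist.get) for dist in candidates)

def is_red_flag(chunk_type, tags):
    """A tagging is suspicious if the chunk table does not accept it."""
    return tags not in PLAUSIBLE.get(chunk_type, set())

def retag_chunk(chunk_type, candidates, top_k=3):
    """Return the most probable tagging that the chunk table accepts.

    candidates: one {tag: probability} dict per token in the chunk.
    Falls back to the tagger's original choice if nothing plausible is found.
    """
    original = best_tagging(candidates)
    if not is_red_flag(chunk_type, original):
        return original
    # Keep only each token's top_k tags with nonzero probability.
    shortlists = [
        [t for t in sorted(d, key=d.get, reverse=True) if d[t] > 0][:top_k]
        for d in candidates
    ]
    best, best_score = original, float("-inf")
    for tags in product(*shortlists):
        if is_red_flag(chunk_type, tags):
            continue
        score = sum(log(d[t]) for d, t in zip(candidates, tags))
        if score > best_score:
            best, best_score = tags, score
    return best

# Hypothetical two-token VP chunk: JJ IN wins narrowly, VB RP is close behind.
chunk = [{"JJ": 0.5, "VB": 0.4, "NN": 0.1}, {"IN": 0.55, "RP": 0.45}]
print(retag_chunk("VP", chunk))  # ('VB', 'RP')
```

Limiting the search to each token's top few tags keeps the number of combinations small while still letting a close runner-up win when the chunk context demands it.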

This is the heart of combined top-down/bottom-up processing, which is known to be powerful in pattern recognition. Interactive Activation, in particular, demonstrates just how effective this kind of processing can be.

This amounts to incorporating context into the tagging of a particular word, which any competent tagger already does. (A hidden Markov model tagger does it by looking at the context to the left of a token; a transformation-based tagger, by rules whose conditions look at neighboring tokens.) But what people do, and what NL systems need to do, is revisit the processing done by one module when the next module detects anomalies. A simple corollary to GIGO: when you find garbage coming out, you know that garbage came in. Find it and fix it, and you've got a better, more robust system.
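At the level of whole modules, that feedback loop might look something like the sketch below. It assumes a tagger that can return an n-best list of whole-sentence taggings, best first (an HMM tagger can in principle produce one with an n-best variant of Viterbi decoding, but the interface here is hypothetical), plus a chunker and an anomaly checker passed in as functions. Instead of passing the top tagging forward no matter what, the loop lets the downstream module's complaints send it back for the next-best hypothesis. The toy chunk spans and plausible patterns are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Tagging = Tuple[str, ...]
Chunk = Tuple[str, Tagging]  # (chunk type, tags inside the chunk)

def process_with_feedback(
    nbest_taggings: Sequence[Tagging],
    chunk: Callable[[Tagging], List[Chunk]],
    anomalies: Callable[[List[Chunk]], List[Chunk]],
) -> Tuple[Tagging, List[Chunk]]:
    """Tag, chunk, and go back to the tagger when the chunker objects.

    Tries each candidate tagging in order and returns the first whose
    chunking has no anomalies; otherwise returns the candidate with the
    fewest anomalies.
    """
    best = None
    for tags in nbest_taggings:
        chunks = chunk(tags)
        n_bad = len(anomalies(chunks))
        if n_bad == 0:
            return tags, chunks
        if best is None or n_bad < best[0]:
            best = (n_bad, tags, chunks)
    if best is None:
        raise ValueError("no candidate taggings supplied")
    return best[1], best[2]

# Toy components (hypothetical): fixed chunk spans, as if supplied by
# another system's chunker, and a small table of plausible tag patterns.
SPANS = [("NP", 0, 2), ("VP", 2, 4)]
PLAUSIBLE = {"NP": {("DT", "NN")}, "VP": {("VB", "RP")}}

def toy_chunk(tags):
    return [(label, tuple(tags[s:e])) for label, s, e in SPANS]

def toy_anomalies(chunks):
    return [c for c in chunks if c[1] not in PLAUSIBLE.get(c[0], set())]

nbest = [("DT", "VB", "JJ", "IN"), ("DT", "NN", "JJ", "IN"), ("DT", "NN", "VB", "RP")]
tags, chunks = process_with_feedback(nbest, toy_chunk, toy_anomalies)
print(tags)  # ('DT', 'NN', 'VB', 'RP')
```

The fallback to the least-anomalous candidate means the loop never does worse than a plain feed-forward pipeline would on the hypotheses it examines; it simply stops trusting the first answer when a later stage finds it implausible.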