A blogoland retreat for undercover linguists

Wednesday, December 06, 2006

Chomsky said it was highly improbable...

While reading a paper by MacDonald et al. 1994 called Lexical Nature of Syntactic Ambiguity Resolution on whether human syntactic ambiguity resolution has a lexical component, I was distracted by a quote from Chomsky that still echoes in the halls of many a linguistics department. There seems to be an epistemological divide in linguistics falling nearly on empiricist/rationalist lines. So first I think it's interesting, at least historically, that this Chomsky quote (from Syntactic Structures) seems to sum up the divide between the two camps:
I think that we are forced to conclude that grammar is autonomous and independent of meaning, and that probabilistic models give no particular insight into some of the basic problems of syntactic structure. (p. 17)
On one side of the hall are rationalists, or formal syntacticians, trying to describe the design of our brain's syntax module, driven by axioms, checked only against issue-specific data, and only a few issues at a time (never the whole grammar all at once, checked against huge amounts of real language data). Then on the other side of the hall, we have computational linguists, who frequently put actual words and phrases to use to help determine what is a possible (or likely) syntactic structure.

What echoes most resonantly is the notion that the probability of certain words and phrases is not used to inform grammaticality judgements in formal syntax. To back this up, Chomsky provides perhaps the most famous linguistics example in the universe, often used to demonstrate the infinite productivity of language: the sentence "Colorless green ideas sleep furiously." In its original context, it was used to show that while statistically unlikely, a sentence can still be perfectly grammatical. So we shouldn't be tempted to base our notion of grammaticality on probability alone.

But the other reason why I'm intrigued by this Chomsky quote is that probability does seem to play an important role in determining syntactic structure, at least the one we pick when we have more than one to choose from. And that takes us back to MacDonald, who presents variants of the following examples, which first appeared in work by Ferreira and Clifton, 1986:

(1) The expert examined by the lawyer.
(2) The evidence examined by the lawyer.

The temptation in (1) is to make a bad parse at first, taking 'examined' as the main verb. Then you reach the preposition 'by' and you have to go back. Or maybe you reach the end of the sentence and then go back. In any case, you go back, and reparse the sentence as a slightly more complicated one, with 'examined' as part of a relative clause. So, if you were tempted to make a bad parse (and statistics on the subject show you probably were), your misstep can be modeled by a simple heuristic: "When you come upon the next word, add the minimal syntactic structure necessary to accommodate it."
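To make that heuristic concrete, here is a toy sketch in Python. It assumes each candidate analysis of the incoming word comes with a count of the new syntactic nodes it would add; the `Attachment` class and the node counts are made up for illustration and are not taken from MacDonald et al. or any real parser.

```python
# Toy illustration of the "add minimal structure" heuristic.
# The node counts are invented assumptions, not measurements.
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    label: str      # which analysis of the incoming word this is
    new_nodes: int  # hypothetical number of syntactic nodes it adds


def minimal_attachment(candidates):
    """Return the candidate analysis that adds the fewest new nodes."""
    return min(candidates, key=lambda c: c.new_nodes)


# Candidate analyses of 'examined' right after 'The expert'
candidates = [
    Attachment("main verb", new_nodes=1),
    Attachment("reduced relative clause", new_nodes=3),
]

first_guess = minimal_attachment(candidates)
print(first_guess.label)
```

Under these assumed counts the heuristic commits to the main-verb reading, which is exactly the analysis that has to be abandoned once 'by the lawyer' arrives.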

This heuristic is used to explain garden path sentences, of which (1) is an example. But not all garden path sentences are created equal. If we look at (2), it's likely that we won't be as tempted to walk down the path, because 'evidence' is inanimate and so not a plausible agent, and verbs like 'examined' prefer agents as their subjects. Studies like Trueswell et al. 1994 confirm that we don't first make a bad parse in sentences like (2), where we have a helpful context, but we do in (1), where the context doesn't help.
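One rough way to picture the contrast between (1) and (2) is to let the subject noun shift the odds of each analysis of 'examined'. The sketch below is only that: the `PARSE_PRIORS` table and every probability in it are invented for illustration, not figures from Trueswell et al. or Ferreira and Clifton.

```python
# Toy sketch: all probabilities below are invented for illustration only.
# Hypothetical P(analysis of 'examined' | subject noun).
PARSE_PRIORS = {
    "expert": {"main verb": 0.8, "reduced relative": 0.2},
    "evidence": {"main verb": 0.1, "reduced relative": 0.9},
}


def preferred_parse(subject):
    """Pick the analysis with the highest (made-up) probability."""
    scores = PARSE_PRIORS[subject]
    return max(scores, key=scores.get)


def garden_path_expected(subject):
    """We expect a garden path when the first choice is the main-verb
    reading, since that is the reading 'by the lawyer' forces us to drop."""
    return preferred_parse(subject) == "main verb"


for noun in ("expert", "evidence"):
    print(noun, preferred_parse(noun), garden_path_expected(noun))
```

With these assumed numbers, 'expert' leads to the main-verb reading first and 'evidence' goes straight to the relative clause. The point is only that a lexically conditioned preference can reproduce the asymmetry; nothing here says how the real numbers should be estimated.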

So we see that lexical context does exert observable influence on how we parse sentences, and how we choose the right parse. And computational linguists have acknowledged the effect of lexical context as well, which has improved their parses. I'm pretty sure formal syntacticians dismiss this sort of thing as being outside their modeling domain, but why do they get to do that? I mean, shouldn't getting to the right parse, the parse that humans arrive at naturally and most frequently, be elemental to the problem of modeling syntax? I don't know. It's just a thought.


pat said...

Interesting post.

There's a great paper that addresses Chomsky head-on on probability:

Statistical Methods and Linguistics - Abney (ResearchIndex)

It's also available in this book edited by Philip Resnik and Judith Klavans: The Balancing Act - The MIT Press. Good stuff...

Person Shaped said...

Thanks for the link.

I think Abney really does call attention to just how narrow a space formal syntacticians have carved out for themselves with all of the "Oh, that's actually performance" evasion.

And as a side note, it's great when you read a paper that directly addresses a nagging concern, especially if it's not one you're actively pursuing in hopes of making an original contribution.

Jonathan said...

Jackendoffian brands of syntax have always tried to account for things like that, and they haven't always been so un-Chomskian-like.

Anonymous said...

I don't think it's fair to say that formal syntacticians dismiss garden path phenomena as being outside their modeling domain. Most studies of garden path sentences have used formal principles (e.g. minimal attachment) to explain parsing preferences. However, no-one has really found a way of making the garden path data have strong implications for possible syntactic analyses, which is why formal syntacticians don't pay much attention to these data when they're theorizing.

I don't see anything in the Abney paper that formal syntacticians would really disagree with, apart from a rather half-hearted attack on the performance/competence distinction, which is undermined by the paper's own observation that grammaticality is only one of many factors entering into acceptability judgments. Most syntacticians are fairly neutral on the question of how the parser resolves ambiguities.


