Jurafsky Martin Chapter 01: Introduction
Metadata
Title: Introduction
Number: 1
The Introduction to the third edition has not yet been published; these very brief notes are from the second edition. Full notes will be added when the third-edition introduction is published.
Core Ideas
The chapter starts by reviewing the usual set of NLP tasks: machine translation, question answering, dialogue systems, and so on.
It argues that what distinguishes the applications studied in the book is the knowledge of language required for complex language behaviour, including:
- Phonetics and Phonology — knowledge about linguistic sounds
- Morphology — knowledge of the meaningful components of words
- Syntax — knowledge of the structural relationships between words
- Semantics — knowledge of meaning
- Pragmatics — knowledge of the relationship of meaning to the goals and intentions of the speaker
- Discourse — knowledge about linguistic units larger than a single utterance
Most tasks in language processing can be viewed as resolving ambiguity at one of these levels.
We say that some input is ambiguous if there are multiple alternative linguistic structures that can be built for it.
A good example shows all the varieties of ambiguity: “I made her duck.”
Here are five different meanings this sentence could have (see if you can think of some more), each of which exemplifies an ambiguity at some level:
1. I cooked waterfowl for her.
2. I cooked waterfowl belonging to her.
3. I created the (plaster?) duck she owns.
4. I caused her to quickly lower her head or body.
5. I waved my magic wand and turned her into undifferentiated waterfowl.
These different meanings arise from several distinct ambiguities. First, the words duck and her are morphologically or syntactically ambiguous in their part of speech: duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. Second, the word make is semantically ambiguous: it can mean create or cook. Third, the verb make is syntactically ambiguous in a different way: it can be transitive, taking a single direct object (2), or ditransitive, taking two objects (5), meaning that the first object (her) got made into the second object (duck). Make can also take a direct object and a verb (4), meaning that the object (her) was caused to perform the verbal action (duck). Finally, in a spoken sentence there is an even deeper kind of ambiguity: the first word could have been eye, or the second word maid.
The book frames the models and algorithms presented as ways to resolve, or disambiguate, such ambiguities. For example, deciding whether duck is a verb or a noun can be solved by part-of-speech tagging, and deciding whether make means “create” or “cook” can be solved by word sense disambiguation. Resolving part-of-speech and word-sense ambiguities are two important kinds of lexical disambiguation.
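The space of readings a tagger must choose among can be made concrete by enumerating tag combinations. A minimal sketch, assuming a toy hand-built lexicon (the tag names and word-to-tag mapping here are illustrative, not from any real tagger):

```python
from itertools import product

# Toy lexicon: each word maps to its possible parts of speech.
# Tags and entries are invented for illustration.
lexicon = {
    "I":    ["PRONOUN"],
    "made": ["VERB"],
    "her":  ["POSSESSIVE", "DATIVE_PRONOUN"],
    "duck": ["NOUN", "VERB"],
}

sentence = ["I", "made", "her", "duck"]

# Enumerate every combination of tags: the space of hypotheses
# a part-of-speech tagger must choose among.
readings = list(product(*(lexicon[w] for w in sentence)))

for tags in readings:
    print(list(zip(sentence, tags)))

print(f"{len(readings)} candidate tag sequences")
```

Even this four-word sentence yields four tag sequences; real sentences, with richer lexicons, yield combinatorially many, which is why disambiguation is framed as search.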
Models and Algorithms
The introduction then surveys the main models and algorithms used to tackle these disambiguation problems. It argues that one of the key insights in language processing in recent decades is that the kinds of knowledge required can be captured through the use of a small number of formal models that are drawn from the standard toolkits in computer science, maths, and linguistics.
Among the most important models are state machines, rule systems, logic, probabilistic models, and vector-space models.
These models lend themselves to a small number of key algorithms, such as state-space search (e.g. dynamic programming), classifiers, and expectation-maximization (EM) algorithms.
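The notes only name dynamic programming; a classic instance used later in the book is minimum edit distance. A minimal sketch, assuming unit costs for insertion, deletion, and substitution (the book's own version uses a substitution cost of 2):

```python
def min_edit_distance(source: str, target: str) -> int:
    """Levenshtein distance via dynamic programming (unit costs)."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                              # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                              # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # delete
                             dist[i][j - 1] + 1,        # insert
                             dist[i - 1][j - 1] + sub)  # substitute/copy
    return dist[n][m]

print(min_edit_distance("intention", "execution"))
```

The table `dist` caches solutions to subproblems so each prefix pair is computed once; this reuse of subproblem solutions is exactly what makes it a dynamic programming algorithm.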
| Model | Examples | Uses and Knowledge Domain |
|---|---|---|
| State machines | DFAs, NFAs, finite-state transducers | Phonology, morphology, syntax |
| Declarative rule systems | Regular grammars, regular relations, CFGs, feature-augmented grammars | Phonology, morphology, syntax |
| Logic-based | First-order logic (predicate calculus), lambda calculus, semantic primitives | Semantics and pragmatics |
| Probabilistic | Weighted automata (FSMs with probabilistic weights), (hidden) Markov models (HMMs) | POS tagging, speech recognition, machine translation, text-to-speech |
| Vector space | Linear algebra models | Information retrieval, semantics |
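The first row of the table can be made concrete with the book's classic "sheeptalk" automaton, a DFA accepting strings of the form b a a … a ! — a minimal sketch, with state numbering of my own choosing:

```python
# DFA for "sheeptalk": strings of the form b a a ... a !
# (at least two a's). State 0 is the start; state 4 accepts.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # self-loop: any number of extra a's
    (3, "!"): 4,
}
ACCEPT = {4}

def accepts(string: str) -> bool:
    state = 0
    for ch in string:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False   # no transition defined: reject
    return state in ACCEPT

print(accepts("baa!"), accepts("baaaa!"), accepts("ba!"))
```

The same transition-table idea, with output symbols added, gives finite-state transducers, and with probabilities on the arcs, the weighted automata of the Probabilistic row.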
Processing language using any of these models typically involves a search algorithm: we search a state space representing hypotheses about an input. E.g. in parsing, we search through a space of trees for the syntactic parse of an input sentence. For non-probabilistic tasks, e.g. state machines, we use well-known graph algorithms such as depth-first search. For probabilistic tasks we use heuristic variants such as best-first search and A* search.
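The probabilistic case can be sketched with a uniform-cost search, the simplest member of the best-first family: always expand the cheapest frontier hypothesis first. The graph, node names, and costs below are invented for illustration:

```python
import heapq

def best_first_search(graph, start, goal):
    """Expand the cheapest frontier node first (uniform-cost search).
    Returns (total cost, path) to the goal, or None if unreachable."""
    frontier = [(0.0, start, [start])]   # (cost so far, node, path)
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, step_cost in graph.get(node, []):
            heapq.heappush(frontier, (cost + step_cost, nxt, path + [nxt]))
    return None

# Toy hypothesis space: edges carry costs (e.g. negative log probabilities).
graph = {
    "start": [("A", 2.0), ("B", 1.0)],
    "A": [("goal", 1.0)],
    "B": [("goal", 5.0)],
}
print(best_first_search(graph, "start", "goal"))
```

Note that the cheaper first step through B does not win: best-first search correctly prefers the path through A once total cost is compared. A* adds a heuristic estimate of remaining cost to the priority.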
Machine learning tools are heavily used: classifiers, sequence models, decision trees, support vector machines, Gaussian mixture models, logistic regression, HMMs, and conditional random fields. Language processing uses many of the same methodological tools as machine learning, e.g. distinct training and test sets, and cross-validation.
The chapter then briefly reviews Eliza, the Turing Test, and some state-of-the-art applications before presenting a brief history of the field.
The history is the basis for the lecture in cm3060 Topic 01: Introduction, so it is not repeated here. The book gives more references to papers for tracing the history.
The main point of difference between the history presented in the book and the lecture is the book's inclusion of a 1970–1983 phase, under the heading Four Paradigms, which traces the emergence of the stochastic, logic-based, natural language understanding, and discourse modelling paradigms.
The second edition was also published before the ascent of deep learning, so it misses that final phase, which will be covered in the new edition.
The intro concludes by noting that insights from human psychology and human language understanding can be helpful in this field; it is not a straight engineering discipline.