Alex's Notes

cm3060 Topic 06: Syntax and Parsing

Main Info

Description

In this topic we examine some of the linguistic principles that shape human language and learn how to apply different syntactic approaches and formalisms.

Assigned Reading

Supplementary Lectures

Lecture Summaries

Introduction to Syntax and Parsing

Syntax is the way in which words are arranged. At the heart of the idea of providing structure to language at the syntactic level is the notion of constituency, which refers to the idea that groups of words behave as a single unit. A grammar is an inventory of all the constituents of a language.

Walks through the discussion of noun phrases in Jurafsky & Martin, Chapter 12 (Constituency Grammars): noun phrases form a constituent because they can be substituted for one another in similar syntactic environments.

Follows from there to phrase-structure grammars, or Context-Free Grammars (CFGs). A CFG consists of a set of production rules and a lexicon of words and symbols. Gives an example of a CFG.

We can use CFGs to generate or assign structure. We’d commonly represent the structure as a tree, a parse tree.

Every grammar has a start symbol (in natural languages, typically a sentence). A language is the set of strings derivable from the start symbol.

Strings that can be derived from the grammar are grammatical, others are ungrammatical.
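This can be made concrete with nltk's generate utility, which enumerates the strings derivable from a grammar's start symbol. A minimal sketch, with a toy grammar invented for illustration:

```python
import nltk
from nltk.parse.generate import generate

# Toy grammar (invented for illustration); S is the start symbol.
toy = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'Alice' | 'Bob'
VP -> V NP
V -> 'sees' | 'likes'
""")

# The language of this grammar is the set of strings derivable from S:
# 2 subjects x 2 verbs x 2 objects = 8 grammatical sentences.
sents = [' '.join(s) for s in generate(toy)]
for sentence in sents:
    print(sentence)
```

Any string not in this set (e.g. "Alice Bob sees") is ungrammatical with respect to this grammar.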

Mapping from a string of words to a parse tree is syntactic parsing.

There are different sentence-level constructions

  • declarative, “the book is on the table”

  • imperative, “put the book on the table”

  • yes-no, “is the book on the table?”

  • wh-structures, “why is that book on the table?”

There are other grammatical models. CFGs focus on how words combine to form constituents. Dependency grammar focuses on how words relate to each other.

Dependencies are binary, asymmetric relationships.

The structure is a labelled directed graph between a head and its dependents.

The head of the sentence is usually a tensed verb.

Arcs are dependency relations (head -> dependent).

The graph is called projective if there are no crossing arcs. Crossings are rarer in English than in many other languages: where word order is more fluid, semantics can depend more on e.g. word endings than on position.

Grammars in Practice

First lecture is on constituency grammars, defining them and using them to parse in nltk.

Create the grammar with e.g. nltk.CFG.fromstring. Then instantiate a parser with nltk.ChartParser(grammar) or nltk.RecursiveDescentParser(grammar) and call parser.parse(sent), where sent is a list of tokens.

Shows recursion in CFGs through production rules that form cycles.
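A minimal sketch of that workflow, with a toy grammar invented for illustration. The NP -> NP PP rule is recursive, which is what lets the classic “elephant in my pyjamas” sentence receive two parses:

```python
import nltk

# Toy grammar (invented for illustration). NP -> NP PP forms a cycle,
# so a PP can attach at different levels and create ambiguity.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | NP PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pyjamas'
V -> 'shot'
P -> 'in'
""")

parser = nltk.ChartParser(grammar)
sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pyjamas']
trees = list(parser.parse(sent))
for tree in trees:
    print(tree)
# Two parses: the PP attaches to the NP ("an elephant in my pyjamas")
# or to the VP ("shot ... in my pyjamas").
```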

Second lecture shows instantiating a dependency grammar in nltk and spacy.

Works through creation with nltk.DependencyGrammar.fromstring, which notably just seems to be dependency productions from terminal to terminal. We can parse with nltk.ProjectiveDependencyParser.
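A hedged sketch of that, with a toy dependency grammar (invented here, head -> dependents) for the same elephant sentence:

```python
import nltk

# Toy dependency grammar (invented for illustration): each production
# lists a head word and the words that may depend on it.
dep_grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an'
'in' -> 'pyjamas'
'pyjamas' -> 'my'
""")

parser = nltk.ProjectiveDependencyParser(dep_grammar)
sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pyjamas']
trees = list(parser.parse(sent))
for tree in trees:
    print(tree)
```

The resulting tree is rooted at the tensed verb 'shot', matching the point above that the head of a sentence is usually a tensed verb.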

Runs through the same sentence with spaCy:

import spacy
from spacy import displacy
sent = "I shot an elephant in my pyjamas"
nlp = spacy.load('en_core_web_sm')
doc = nlp(sent)

for token in doc:
    print(token.text, "\t", token.dep_, "\t", spacy.explain(token.dep_))

displacy.serve(doc, style="dep")

The lecture doesn’t show visualizing the tree, but you can do it easily in spaCy with displacy, as in the displacy.serve call above.

Probabilistic Parsing

Talks about fish.

How do we express probability in parsing? Each rule in a CFG can be assigned a probability. For each non-terminal, the probabilities of all production rules with that non-terminal on the left-hand side sum to 1.

Note that, as elsewhere in NLP, the probabilities end up very small, but they still let you compare the likelihood of different parses.
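A minimal sketch with nltk's PCFG support (the grammar and probabilities are invented for illustration; note that for each left-hand non-terminal the rule probabilities sum to 1):

```python
import nltk

# Toy probabilistic CFG (probabilities invented for illustration).
pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.7] | 'I' [0.3]
VP -> V NP [1.0]
Det -> 'an' [1.0]
N -> 'elephant' [1.0]
V -> 'shot' [1.0]
""")

# ViterbiParser returns the most probable parse along with its probability:
# here 1.0 * 0.3 * 1.0 * 1.0 * 0.7 * 1.0 * 1.0 = 0.21.
parser = nltk.ViterbiParser(pcfg)
trees = list(parser.parse(['I', 'shot', 'an', 'elephant']))
for tree in trees:
    print(tree.prob(), tree)
```

The parse probability is just the product of the probabilities of the rules used, which is why real-world values get tiny so quickly.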

Lab Summaries

A lab in week 11 has you extend a probabilistic grammar.