NLTK provides many classes for working with natural language. Here we look at how to define, inspect, and parse with context-free grammars (CFGs) in NLTK.
import nltk
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")
groucho_grammar
type(groucho_grammar)
groucho_grammar.start()
groucho_grammar.productions()
from nltk.grammar import Nonterminal
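# Productions whose left-hand side is NP
groucho_grammar.productions(lhs=Nonterminal("NP"))
# Productions whose right-hand side begins with Det
groucho_grammar.productions(rhs=Nonterminal("Det"))
pp = groucho_grammar.productions(rhs=Nonterminal("Det"))
pp[0]
pp[0].lhs()  # left-hand side: a single Nonterminal
pp[0].rhs()  # right-hand side: a tuple of symbols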
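The production objects above can also be processed in a loop. The following is a minimal sketch, not part of the original walkthrough, that collects the terminal words introduced by each nonterminal. The names `grammar`, `lexicon`, and `np_rules` are chosen here for illustration, and the grammar is repeated so the block runs on its own.

```python
import nltk
from nltk.grammar import Nonterminal

# Same grammar as above, repeated so this sketch is self-contained.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

# Map each left-hand side to the terminal words it can rewrite to.
# In NLTK grammars, terminals are plain strings and nonterminals are
# Nonterminal objects, so isinstance(..., str) picks out the words.
lexicon = {}
for prod in grammar.productions():
    words = [sym for sym in prod.rhs() if isinstance(sym, str)]
    if words:
        lexicon.setdefault(str(prod.lhs()), []).extend(words)

# All productions that expand NP.
np_rules = grammar.productions(lhs=Nonterminal("NP"))

for lhs in sorted(lexicon):
    print(lhs, "->", sorted(lexicon[lhs]))
```

Building the lexicon from `productions()` keeps it in sync with the grammar string, so a word added to the grammar shows up here without further changes.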
NLTK ships with ready-made parsers for CFGs. A parser's parse() method returns an iterator over all parse trees licensed by the grammar for the sentence, so we wrap the result in list() to index into it. We can either look at the trees' string representations or have them drawn graphically.
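sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
parser = nltk.ChartParser(groucho_grammar)
# parse() yields trees lazily; list() collects every parse
trees = list(parser.parse(sent))
print(trees[0])  # bracketed string representation
type(trees[0])
# Evaluating a tree as the last expression of a notebook cell displays it
trees[0]
trees[1]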
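The sentence has more than one parse because the prepositional phrase "in my pajamas" can attach either to the noun phrase or to the verb phrase. As a rough sketch of how to check this programmatically, the block below walks each tree and reports the label of the node that directly dominates the PP. The helper `pp_attachment` is a hypothetical name introduced here, not an NLTK function, and the grammar is repeated so the block runs on its own.

```python
import nltk
from nltk.tree import Tree

# Same grammar and sentence as above, repeated so this sketch is self-contained.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")
sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']

parser = nltk.ChartParser(grammar)
trees = list(parser.parse(sent))


def pp_attachment(tree):
    """Return the label of the first node that has a PP as a direct child.

    Hypothetical helper for this example; returns None if there is no PP.
    """
    for sub in tree.subtrees():
        for child in sub:
            if isinstance(child, Tree) and str(child.label()) == "PP":
                return str(sub.label())
    return None


attachments = [pp_attachment(t) for t in trees]

print(len(trees), "parses")
for tree, parent in zip(trees, attachments):
    # leaves() recovers the original tokens from a tree
    print("PP attached under", parent, ":", " ".join(tree.leaves()))
```

The order in which the parser yields the two trees is an implementation detail, so it is safer to inspect the structure, as here, than to assume that `trees[0]` is a particular reading.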