Schedule

* = optional reading

Date	Topic	Reading	Assignments
15 Oct	(no class - introductory week)
18 Oct	(no class - introductory week)
22 Oct	Introduction
25 Oct	Crash course in Python and NLTK	NLTK; install Python 3 and NLTK 3 and play around with them	A1
29 Oct	Generative statistical models; n-grams	Jurafsky & Martin, Chapters 4.1-4.2 ("N-grams"); * Kevin Murphy, Binomial and multinomial distributions; * Jurafsky & Martin (Smoothing), Chapters 4.5-4.7; * Manning & Schütze (Smoothing), Chapters 6.2-6.3
01 Nov	(holiday - All Saints)
05 Nov	Hidden Markov Models	Jurafsky & Martin, Chapters 5.1-5.5 (POS tagging), 6.1-6.4 (HMMs)	A1 due; A2
08 Nov	Training HMMs	Jurafsky & Martin, Chapters 5.7-5.8 (POS tagging evaluation), 6.5 (Forward-Backward); * Jason Eisner, HMM-in-a-spreadsheet
12 Nov	Discussion of Assignment 1
15 Nov	Context-free grammars	Jurafsky & Martin, Chapters 12.1-12.3, 12.5, 13.1-13.3; * Stuart Shieber, Yves Schabes, and Fernando Pereira (1993), Principles and implementation of deductive parsing; * Shieber (1985), Evidence against the context-freeness of natural language
19 Nov	Complexity of algorithms and the CKY parser	Jurafsky & Martin, Chapter 12.4; Khan Academy, Asymptotic notation	A2 due; A3
22 Nov	Probabilistic CFGs	Jurafsky & Martin, Chapters 14.1-14.5, 14.7
26 Nov	Discussion of Assignment 2
29 Nov	Training PCFGs	Jurafsky & Martin, Chapter 14.3; Manning & Schütze, Chapters 11.3-11.4; Michael Collins, Notes on the inside-outside algorithm; * Glenn Carroll & Eugene Charniak (1992), Two experiments on learning probabilistic dependency grammars from corpora; * Fernando Pereira & Yves Schabes (1992), Inside-outside reestimation from partially bracketed corpora
03 Dec	(Kathrin Passig talk)
06 Dec	More accurate PCFG parsing	Mark Johnson (1998), PCFG Models of Linguistic Tree Representations (esp. on parent annotations); Michael Collins, Lexicalized PCFGs; Dan Klein & Chris Manning (2003), Accurate unlexicalized parsing; * Michael Collins (1997), Three Generative, Lexicalized Models for Statistical Parsing; * David Magerman (1995), Description of head percolation tables; * Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein (2006), Learning accurate, compact, and interpretable tree annotation; William Morgan, Statistical Hypothesis Tests for NLP (explains approximate randomization); Berg-Kirkpatrick et al. (2012), An empirical investigation of statistical significance in NLP (explains bootstrap testing)	A3 due; A4
10 Dec	Advanced PCFG parsing algorithms	Joshua Goodman (1999), Semiring parsing (skip Sections 4 and 6, and ignore the Earley parser if you like); Charniak et al. (2006), Multilevel coarse-to-fine PCFG parsing; Dan Klein & Chris Manning (2003), A* Parsing: Fast exact Viterbi parse selection; * Stuart Shieber, Yves Schabes, and Fernando Pereira (1993), Principles and implementation of deductive parsing; * Eugene Charniak, Sharon Goldwater, and Mark Johnson (1998), Edge-based best-first chart parsing; * Liang Huang and David Chiang (2005), Better k-best parsing; * Michael Collins and Terry Koo (2005), Discriminative reranking for natural language parsing; * Eugene Charniak and Mark Johnson (2005), Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
13 Dec	Discussion of Assignment 3
17 Dec	Dependency parsing	Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic (2005), Non-projective Dependency Parsing using Spanning Tree Algorithms; Joakim Nivre (2008), Algorithms for Deterministic Incremental Dependency Parsing; * Joakim Nivre, Ryan McDonald (2007), Characterizing the Errors of Data-Driven Dependency Parsing Models; * Jason Eisner (1996), Three New Probabilistic Models for Dependency Parsing: An Exploration; * Eliyahu Kiperwasser and Yoav Goldberg (2016), Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations; * Google, SyntaxNet
20 Dec	Statistical machine translation - Alignments	Jurafsky & Martin, Chapters 25.1-25.6; Lopez, Post, Callison-Burch, JHU 600.468 Machine Translation - Homework 1; Adam Lopez, Word Alignment and the Expectation-Maximization Algorithm; * Michael Collins, IBM Models 1 and 2; * Dyer et al., fast_align; * Liang et al., Berkeley aligner; * Gao, MGIZA	A4 due; A5
24 Dec	(Christmas break)
27 Dec	(Christmas break)
31 Dec	(Christmas break)
03 Jan	(Christmas break)
07 Jan	Statistical machine translation - Translation	Jurafsky & Martin, rest of Chapter 25; David Chiang (2007), Hierarchical phrase-based translation
10 Jan	Discussion of Assignment 4
14 Jan	More expressive grammar formalisms	Joshi & Schabes (1997), Tree-adjoining grammars; Steedman & Baldridge (2011), Combinatory categorial grammar; * Shieber (1985), Evidence against the context-freeness of natural language; * Vijay-Shanker and Weir (1994), The equivalence of four extensions of context-free grammar; * Kuhlmann et al. (2015), Lexicalization and generative power in CCG; * Lewis et al. (2016), LSTM CCG Parsing
17 Jan	Bayesian methods - LDA	David Blei (2011), Introduction to probabilistic topic models; Thomas Griffiths & Mark Steyvers (2004), Finding scientific topics; Mark Steyvers & Thomas Griffiths (2007), Probabilistic topic models (Sections 1-4); * William M. Darling (2011), A theoretical and practical implementation tutorial on topic modeling and Gibbs sampling; * Bob Carpenter (2010), Integrating out multinomial parameters in LDA and Naive Bayes for Collapsed Gibbs Sampling; * Allison Chaney, Visualizing topic models on Wikipedia	A5 due; A6
21 Jan	Bayesian methods - Grammar induction	Kevin Knight (2009), Bayesian Inference with Tears; Trevor Cohn, Phil Blunsom, and Sharon Goldwater (2010), Inducing tree-substitution grammars (up to Section 4.1, with a quick look at Sections 5-6); * Hiroyuki Shindo, Yusuke Miyao, Akinori Fujino, and Masaaki Nagata (2012), Bayesian symbol-refined tree substitution grammars for syntactic parsing
24 Jan	Discussion of Assignment 5
28 Jan	Semantic parsing	Luke Zettlemoyer and Michael Collins (2005), Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars; Yuk Wah Wong and Raymond J. Mooney (2006), Learning for Semantic Parsing with Statistical Machine Translation; * Flanigan et al. (2014), A Discriminative Graph-Based Parser for the Abstract Meaning Representation; * Groschwitz et al. (2018), AMR Dependency Parsing with a Typed Semantic Algebra
31 Jan	Lexical semantics	Jurafsky & Martin, Chapters 19.1-19.3, 20.6-20.8; * Mikolov et al. (2013), Efficient estimation of word representations in vector space; * Jeff Mitchell & Mirella Lapata (2008), Vector-based models of semantic composition; * Marco Baroni & Roberto Zamparelli (2010), Nouns are vectors, adjectives are matrices; * Yehoshua Bar-Hillel (1960), A demonstration of the nonfeasibility of fully automatic high quality translation; * Fellbaum et al., WordNet website; * Peters et al. (2018), Deep contextualized word representations; * Devlin et al. (2019), BERT: Pre-training of deep bidirectional transformers for language understanding	A6 due
04 Feb	Discussion of Assignment 6
07 Feb	Discussion of final projects