CSE 562/662 - Natural Language Processing

Instructor: Brian Roark

Class time: Tu/Th   4:00 - 5:30 PM    Sep. 30 - Dec. 9, 2008

Class location: Wilson Clark Center - Room 403,
videoconf'd to OHSU's Marquam Hill Campus, Biomedical Research Building (BRB), Room 403

Office hours: Th 10-12, Central 115, or by appointment

Required textbook: None

Optional (suggested) textbook: Roark and Sproat, Computational Approaches to Morphology and Syntax

Also useful supplementary textbook: Jurafsky and Martin, Speech and Language Processing

Goals

The goal of this course is to give a broad but detailed introduction to the key algorithms and modeling techniques used for Natural Language Processing (NLP) today. With a few exceptions, NLP involves taking a sequence of words as input (e.g. a sentence) and returning some annotation(s) for that string. Well-known examples of this include part-of-speech tagging and syntactic parsing. Many other common tasks, e.g. shallow parsing or named-entity recognition, can be easily recast as tagging tasks; hence certain basic techniques can be widely applied within NLP. Applications such as automatic speech recognition, machine translation, information extraction, and question answering all make use of NLP techniques. By the end of this course, you should understand how to approach common natural language problems arising in these and other applications.
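
As a small illustration of recasting a task as tagging, the sketch below converts named-entity spans into one tag per word using the common B/I/O convention (B = begins an entity, I = inside an entity, O = outside). The function name spans_to_bio, the example sentence, and the labels PER and LOC are assumptions made for illustration only; they are not taken from the course materials.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn labeled spans (start, end_exclusive, label) into one BIO tag per token.

    Hypothetical helper for illustration; assumes spans do not overlap.
    """
    tags = ["O"] * len(tokens)           # default: token is outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining tokens of the entity
    return tags


# Illustrative input: a made-up sentence with two entity spans.
tokens = ["Alice", "visited", "New", "York", "yesterday"]
spans = [(0, 1, "PER"), (2, 4, "LOC")]

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

Once the entities are encoded this way, any sequence tagger that assigns one label per word (for example, a part-of-speech tagger trained on these tags instead) can in principle be applied to the task.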

Prerequisites

There is no official programming language for this course, but completing the assignments will require a fair amount of programming; hence facility with some programming language is assumed.

Grading

10% of your grade will depend on in-class discussion, 50% on the homework assignments, and 20% each on the midterm and the final.

What we'll cover and an approximate schedule

Roughly speaking, half of the course will be devoted to finite-state methods, and half to context-free methods (or beyond). Algorithms for annotating linguistic structure will always be presented with statistical variants, which provide the basis for disambiguation.

Date   Topic   Reading   Assignment   FAQs
Sep.30 Introduction to NLP; Applications using NLP; Chomsky hierarchy      
Oct.2 Weighted finite-state automata and transducers; n-grams and smoothing   HW1 FAQ1
Oct.7 Regular expression processing and data structures for parsing and tagging      
Oct.9 N-grams and smoothing (cont.); grammar and lexicon composition      
Oct.14 Finite-state Morphology; Phonology and Pronunciation      
Oct.16 POS-tagging and Class-based language modeling; midterm cheat sheet   HW2  
Oct.21 Implementing dynamic programs for a Markov chain tagging model      
Oct.23 Midterm Exam      
Oct.28 Machine learning for sequence processing: supervised and unsupervised methods   extra credit  
Oct.30 Context-free grammars (CFGs); treebanks; probabilistic CFGs      
Nov.4 Context-free parsing (introduction); CYK   HW3  
Nov.6 Context-free parsing (continued); Earley, Shift-reduce, Top-down, Left-corner; grammar transformations      
Nov.11 Context-sensitive grammars; unification; Tree-adjoining grammar; Categorial grammar      
Nov.13 Advanced statistical approaches to context-free parsing   HW4  
Nov.18 Dependency parsing and finite-state parsing      
Nov.20 Word sense disambiguation; Anaphora resolution; NP coreference; Semantic role labeling      
Nov.25 Lexical semantics and word statistics   HW5  
Dec.2 Applications of structured processing: machine translation      
Dec.4 Discourse structure processing and applications      
Dec.9 Final Exam