Linguistics 581
Introduction to Computational Linguistics
Course Description |
This course will serve as an introduction to the field of computational linguistics, which includes aspects of speech recognition, natural language processing, information retrieval, and information extraction.
The course begins with an introduction to finite-state automata and some basic natural language applications; this is extended to finite-state transducers with applications in morphology (word structure). Other topics covered: n-gram language models, classifiers (Naive Bayes and Logistic Regression), sentiment analysis, part-of-speech tagging, context-free grammars and context-free parsing (with statistical extensions), and distributional semantics.
Goals |
The primary goal of the course is to acquaint students with a basic set of computational techniques that have proved useful in a variety of natural language applications. The principles and mathematics behind these techniques often overlap with those used in other fields in which machine learning has been successfully applied, such as computer vision. However, the problems, in particular the relevant statistical properties, are quite different. Thus this class should provide a nice complement to other classes you may be taking which use machine learning. Students should acquire enough facility with the concepts and tools so that they can use them to construct well-specified solutions to simple computational linguistic problems. A well-specified solution is one that a programmer can use to write a program. |
---|---|
Practice |
The course will use the textbook:
There will be exercises for most of the chapters covered. |
Programming |
The programming required in this class is very light. We will gloss over many of the difficult details involved in implementing the ideas covered here. However, some use of computation is essential. The programming language used will be Python. Computational assignments will be guided, with data, tools, and partial solutions provided. The tools and partial solutions provided will all be in Python.
|
Pre-requisites | At least two linguistics courses or at least two programming or CS courses. Students with no programming background will find this course challenging. |
Grading |
Grading will be based on exercises and take-home midterms and finals.
|
Late Assignments |
The general structure of the course is not well-suited to late assignments. Assignment solutions will be discussed in detail on the day they are turned in, and thus students who turn assignments in late will be at an advantage. However, to allow for some flexibility, late assignments will receive partial credit. Here is the lateness policy:
|
Group Work |
Group work is encouraged on the assignments. The midterm and final should be completed without any help. To be clear, collaboration on either the midterm or final will be considered cheating. When turning in collaborative assignments, your collaborators should be identified on your paper. |
Attendance |
Attendance is not a formal part of your grade. However, be aware that hints on how to solve problems on the assignments, the midterms, and the final are handed out liberally in class. These hints will not be posted on the web page. |
Classroom Practice |
Assignments will generally be due on a Tuesday and will be discussed upon return. Model solutions will often be posted. Most of the readings are from the 2nd Edition of your textbook, but some are available ONLY in the 3rd Edition, which has not yet been published. |
Course outline |
here. |
Place and Time |
Tu Th 2:00-3:15, LSS 246
Contact |
Mailing address:
Department of Linguistics and Oriental Languages
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-7727
Telephone: (619) 594-0252
Office location: SHW, room 238
Office hours: Tu 3:30-4:30, Th 9:30-10:45, TuTh 12:30-1:45