San Diego State University

Computational Linguistics Program

Linguistics 581

Introduction to Computational Linguistics

    Course Description

    This course will serve as an introduction to the field of computational linguistics, which includes aspects of speech recognition, natural language processing, information retrieval, and information extraction.

    The course begins with an introduction to finite-state automata and some basic natural language applications; this is extended to finite-state transducers with applications in morphology (word structure). Other topics covered: n-gram language models, classifiers (Naive Bayes and Logistic Regression), sentiment analysis, part-of-speech tagging, context-free grammars and context-free parsing (with statistical extensions), and distributional semantics.

    Goals

    The primary goal of the course is to acquaint students with a basic set of computational techniques that have proved useful in a variety of natural language applications. The principles and mathematics behind these techniques often overlap with those used in other fields in which machine learning has been successfully applied, such as computer vision. However, the problems, and in particular their relevant statistical properties, are quite different. This class should therefore complement other classes you may be taking that use machine learning.

    Students should acquire enough facility with the concepts and tools so that they can use them to construct well-specified solutions to simple computational linguistic problems. A well-specified solution is one that a programmer can use to write a program.

    Practice

    The course will use the textbook:

    * Jurafsky, Daniel and Martin, James H. 2009. Speech and Language Processing. Prentice-Hall. (2nd Edition only!)

    There will be exercises for most of the chapters covered.

    Programming

    The programming required in this class is very light. We will gloss over many of the difficult details involved in implementing the ideas covered here. However, some use of computation is essential.

    The programming language used will be Python. Computational assignments will be guided, with data, tools, and partial solutions provided. The tools and partial solutions provided will all be in Python.

    Prerequisites

    At least two linguistics courses or at least two programming or CS courses. Students with no programming background will find this course challenging.

    Grading

    Grading will be based on exercises, a take-home midterm, and a take-home final.
    • Take-home Midterm: 20%
    • Take-home Final: 30%
    • Exercises: 50%

    Late Assignments

    The general structure of the course is not well-suited to late assignments. Assignment solutions will be discussed in detail on the day they are turned in, and thus students who turn assignments in late will be at an advantage. However, to allow for some flexibility, late assignments will receive partial credit. Here is the lateness policy:
    • Up to one week late: 50% credit for assignment
    • More than one week late: not accepted
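
    As a rough illustration of how the grading weights and the lateness policy combine, here is a minimal Python sketch. The function names, the 0-100 score scale, the equal weighting of exercises, and the idea that late credit scales the earned score are all assumptions made for the example; they are not part of the official policy.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"midterm": 0.20, "final": 0.30, "exercises": 0.50}


def late_credit(days_late):
    """Fraction of credit an assignment keeps under the lateness policy."""
    if days_late <= 0:
        return 1.0  # on time: full credit
    if days_late <= 7:
        return 0.5  # up to one week late: 50% credit
    return 0.0      # more than one week late: not accepted


def exercise_average(submissions):
    """Average exercise score (0-100) after applying late credit.

    `submissions` is a list of (score, days_late) pairs. Weighting every
    exercise equally is an assumption, not something the policy states.
    """
    adjusted = [score * late_credit(days) for score, days in submissions]
    return sum(adjusted) / len(adjusted)


def course_score(midterm, final, submissions):
    """Weighted course score on a 0-100 scale (hypothetical helper)."""
    return (WEIGHTS["midterm"] * midterm
            + WEIGHTS["final"] * final
            + WEIGHTS["exercises"] * exercise_average(submissions))


# Made-up scores: one on-time exercise, one 3 days late, one 10 days late.
submissions = [(90, 0), (80, 3), (100, 10)]
print(round(course_score(midterm=85, final=90, submissions=submissions), 2))
```

    Because exercises carry half of the grade, a single assignment that is more than a week late pulls the exercise average down noticeably in this sketch.
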
    Group Work

    Group work is encouraged on the assignments. The midterm and final should be completed without any help. To be clear, collaboration on either the midterm or final will be considered cheating.

    When turning in collaborative assignments, your collaborators should be identified on your paper.

    Attendance

    Attendance is not a formal part of your grade.

    However, be aware that hints on how to solve problems on the assignments, the midterm, and the final are handed out liberally in class. These hints will not be posted on the web page.

    Classroom Practice

    Assignments will generally be due on a Tuesday and will be discussed upon return. Model solutions will often be posted.

    Most of the readings are from the 2nd Edition of your textbook, but some are available ONLY in the 3rd Edition, which has not yet been published.

    Course outline here.

    Place and Time

    Tu Th 2:00-3:15, LSS 246

    Contact

    Mailing address:
    Department of Linguistics and Oriental Languages
    San Diego State University
    5500 Campanile Drive
    San Diego, CA 92182-7727
    Telephone: (619) 594-0252
    Office location: SHW, room 238
    Office hours: Tu 3:30-4:30, Th 9:30-10:45, TuTh 12:30-1:45