Fall 2005

Advances in technology have revolutionized the way linguists approach their data. With computers, extremely large bodies of text ("corpora") can be collected and analyzed at a level of detail that would have been unthinkable only a generation ago. For linguists and computer scientists alike, the accelerating growth of the World Wide Web and other natural language resources has made techniques for dealing with very large texts more important than ever.

Through a combination of lectures, demonstrations, and hands-on exercises, this course will introduce students to the skills necessary for computer-aided text manipulation. Students will learn to construct and search text databases using Unix tools, to write Python programs to manipulate large natural language corpora, and to use statistical software to perform quantitative analysis of linguistic data.

Instructor
Rob Malouf

Requirements
The final grade will be based on homework assignments (30%), a midterm project (30%), and a final project (40%).

Over the term, there will be five hands-on homework assignments in which students apply the techniques learned in class to actual corpus materials. Since it's important not to fall behind, late assignments will be accepted for partial credit only during the week after the due date, unless prior arrangements are made. Working in groups is encouraged, but please include the names of all coworkers on the assignment.

The final project should be a program (with documentation) that performs some substantial corpus processing task. Alternatively, the final project can be the collection and annotation of a new corpus. More details about both projects will be given later in the term.

Readings
There are two required textbooks for this course:

Alan Gauld. 2001. Learn to Program Using Python. Addison Wesley.
Jon Lasser. 2000. Think Unix. Pearson Education.

Both of these books are available in the campus bookstore.
In addition, you might find it useful to have a comprehensive Python reference manual, such as:

David M. Beazley. 2001. Python Essential Reference. Second Edition. New Riders.

This should be easy to find at local or on-line bookstores. Additional readings will be made available in class or via the "Resources" section of the course web page.

Schedule
Week 1–2    Introduction
Week 3–5    Text manipulation with Unix
Week 6–11   Python
Week 12–14  Quantitative linguistics
Week 15     Future prospects

Lectures
Links
rmalouf@mail.sdsu.edu