Multilingual Linguistic Annotation

Instructor(s): Nianwen Xue

Linguistic annotation has been the fuel that drives the field of Natural Language Processing (NLP), which increasingly relies on large volumes of natural language data annotated with rich linguistic information. Annotation has also been gaining ground as a tool for expanding the empirical basis of theoretical linguistic research. This course is designed to help students gain a solid understanding of the general principles and methodologies for developing high-quality linguistically annotated data that are of practical value to NLP and/or theoretical linguistic research.

The course will consist of case studies of the most influential linguistic annotation projects in NLP and linguistics. The types of annotation covered include syntactic structure, semantic roles, word senses, discourse relations, and temporal relations, as well as sentiments and opinions. Annotated corpora from multiple languages (primarily English and Chinese) will be presented, but no prior knowledge of any language other than English is assumed; the relevant language-specific elements will be introduced as each topic is covered. The course will also include demonstrations of annotation tools, and students will have a chance to perform linguistic annotation first-hand.

Syntactic theory, Lexical semantics

