Quantitative Corpus Linguistics with R

Instructor(s): Stefan Th. Gries

Description:
Corpus data, while already well established, are used more and more widely, not only in theoretical linguistics but also in historical, socio-, and psycholinguistics, a development supported by the recognition of the role that frequency information plays in, say, language change, acquisition, and processing. However, ready-made corpus tools are often not flexible enough to handle differently formatted corpora and/or to perform the diverse operations that a given study requires. This course is an introduction to data retrieval, organization, and statistical evaluation in corpus linguistics using the programming language R. The course will introduce the basic data structures for text processing in R, regular expressions (from simple wildcards via character classes up to backreferences and lookaround), and elementary programming constructs (loops and conditional expressions), and it will teach participants to develop scripts for their own research projects; examples from different subdisciplines of linguistics will be discussed.

Prerequisites:
Courses: None
Skills: Basic knowledge of corpus linguistic notions and ideas; good familiarity with the Windows operating system and spreadsheet software

Course ID:
LING7800-060

Days/Times:
Mon & Thu 8:30-10:15

Classroom: ECCH 107

Areas of Linguistics:
Computational Linguistics