This course provides an introduction to corpus linguistics, a methodology for studying language based on collections of authentic texts. While the instruction is in English, the concepts and tools introduced apply to the study of many languages, and students interested in linguistics from outside the Department of English and American Studies are also welcome to attend. Participants will learn the fundamentals of corpus linguistics, from basic principles to slightly more advanced techniques, with a focus on practical application and real-world research. The instructor will illustrate the course with real research examples based on corpus data to make complex concepts accessible. Additionally, students will be guided in creating their own corpora and supported in using tools such as RStudio for analysis. The course aims to equip students with both theoretical knowledge and practical skills in corpus linguistics, fostering critical thinking and research abilities applicable across linguistic disciplines.
Course objectives:
- Understand the principles and methodologies of corpus linguistics.
- Become familiar with prominent corpora and resources in the field.
- Develop skills in corpus compilation, usage and analysis.
- Explore real-world research applications of corpus linguistics.
- Learn to use relevant tools for corpus analysis, including basic statistical methods and software like RStudio.
- Gain hands-on experience in corpus creation and analysis.
Course topics and teaching hours per topic:
- Introduction to corpus linguistics: Definition, history, and key concepts (2 hours).
- Well-known corpus resources: CQPweb, Mark Davies’ corpora, and parallel corpora such as InterCorp and Paralela (3 hours).
- Corpus compilation and annotation: Methods for building and structuring corpora (3 hours).
- Tools for corpus analysis: Introduction to software tools like AntConc and R (2.5 hours).
- Basic analysis techniques: Concordancing, collocation, and frequency analysis (3 hours).
- More advanced analysis techniques: Basic statistical tests and correlation testing in R, for example to show diachronic developments of phenomena (2.5 hours).
- Real-world applications: Corpus-based research in linguistics, language teaching, and beyond (3 hours).
- Practical exercises: Hands-on activities to apply learned concepts, techniques and tools (8 hours, spread across the course so that students can apply new tools and techniques during each teaching unit; the lecturer will guide and assist students, offer ideas, and help with individual questions and difficulties).
- Research projects: Students will present their small-scale research projects using corpus data (3 hours).
Reading:
Baker, P. 2023. A year to remember? Introducing the BE21 corpus and exploring recent part of speech tag change in British English. International Journal of Corpus Linguistics 28(3): 407-429. DOI: https://doi.org/10.1075/ijcl.22007.bak
Gonzales, W. D. W., Hiramoto, M., Leimgruber, J. R. E. and Lim, J. J. 2023. The Corpus of Singapore English Messages (CoSEM). World Englishes 42(2): 371-388.
Gries, S. Th. 2013. Statistics for Linguistics with R. Berlin/Boston: De Gruyter.
Hundt, M. and Leech, G. 2012. Small is beautiful: On the value of observing recent grammatical change. In The Oxford handbook of the history of English, Terttu Nevalainen and Elizabeth Closs Traugott (eds.), 175-88. Oxford: Oxford University Press.
Lindquist, H. 2009. Corpus Linguistics and the Description of English. Edinburgh Textbooks on the English Language: Advanced. Edinburgh: Edinburgh University Press.
Rudnicka, K. 2019. The Statistics of Obsolescence: Purpose Subordinators in Late Modern English. NIHIN Studies. Freiburg: Rombach.
Rudnicka, K. 2024. Non-verbal plural number agreement – a pilot study comparing English and German using Oslo Multilingual Corpus data. Slovo a slovesnost 85(1): 27–54.