Corpus Linguistics

What is corpus linguistics?

Corpus linguistics is a methodology for the computer-based empirical analysis (both quantitative and qualitative) of language use. It draws on large, electronically available collections of naturally occurring spoken and written texts, so-called corpora.

Corpus-based studies and other empirical research have shown that speakers' intuitions often provide only limited access to the open-ended nature of language. This can cause problems when examining infrequent linguistic structures, for example lexical co-occurrence patterns, patterns of variation between grammatical constructions, word meaning, or idioms and metaphorical language.
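As a rough illustration of what a quantitative look at lexical co-occurrence patterns can involve, the following Python sketch counts the words that appear within a small window around a chosen node word. The sample sentences, the function names `tokenize` and `collocates`, and the window size are all invented for illustration. Real studies rely on large corpora and dedicated corpus tools, so this is only a sketch of the idea.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens to the left or right of `node`.

    Note: this simple version ignores sentence boundaries.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts

# Toy sample, invented for illustration only; not taken from any real corpus.
sample = (
    "She made a strong argument. He drinks strong tea every morning. "
    "They served strong tea and powerful engines roared outside."
)

tokens = tokenize(sample)
strong_collocates = collocates(tokens, "strong")
print(strong_collocates.most_common(3))
```

With a real corpus, the raw counts would normally be complemented by an association measure, since very frequent words turn up near almost any node word.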

Corpus linguistics and language variation

One topic that features prominently in our research, as well as in students' projects at Mainz University, is the question of which factors condition the choice between competing grammatical variants.

  • While grammar books would have us believe that yet, for example, triggers the present perfect, we see the phrase "Did you vote yet?" used in U.S. election campaigns.
  • While standard reference works used in schools advise students to use the synthetic comparative in -er with monosyllabic adjectives, we observe native speakers using more apt or more proud rather than apter or prouder in the majority of cases.
  • While the 's-genitive is described as being used with persons and the of-genitive with things, linguists studying actual language use find a marked discrepancy between what is taught and what is done. Thus, a phrase such as "the topic's relevance" cannot be stigmatized as an exception or even be marked as incorrect.
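The comparison of analytic and synthetic comparatives mentioned in the list above can be sketched in a few lines of Python: count how often an adjective occurs in each form and report the share of the analytic variant. The sample text, the function name `comparative_counts`, and the hand-supplied synthetic forms are assumptions made for illustration. The resulting numbers say nothing about real usage, which would have to be measured in an actual corpus.

```python
import re

def comparative_counts(text, adjective, synthetic_form):
    """Return (analytic, synthetic) counts for one adjective in `text`.

    The analytic variant is matched as "more <adjective>"; the synthetic
    form (e.g. "prouder") is supplied by hand rather than derived by rule.
    """
    lowered = text.lower()
    analytic = len(re.findall(rf"\bmore\s+{re.escape(adjective)}\b", lowered))
    synthetic = len(re.findall(rf"\b{re.escape(synthetic_form)}\b", lowered))
    return analytic, synthetic

# Toy sample, invented for illustration only; not taken from any real corpus.
sample = (
    "She was more proud of her team than ever. He felt prouder each day. "
    "Nobody could be more proud. A more apt comparison is hard to find."
)

for adjective, synthetic_form in [("proud", "prouder"), ("apt", "apter")]:
    analytic, synthetic = comparative_counts(sample, adjective, synthetic_form)
    total = analytic + synthetic
    share = analytic / total if total else 0.0
    print(f"{adjective}: analytic={analytic}, synthetic={synthetic}, "
          f"analytic share={share:.2f}")
```

A study of conditioning factors would go one step further and record, for every hit, properties of its context (for instance what follows the adjective), so that the choice between the variants can be related to those properties.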

Corpus linguistics at Mainz University

The issue of variation poses an intriguing challenge for English teachers and researchers. While to some the task of bringing schoolbook knowledge into line with actual language use seems insurmountable, English Linguistics at Mainz University tries to offer ways out of the dilemma.

  • In most English linguistics classes in Mainz, students practice the collection, processing, and analysis of empirical data, often by making use of corpora.
  • In advanced classes in particular, students will be asked to carry out corpus-based projects, sometimes involving replications and extensions of earlier case studies.
  • The Department of English and Linguistics offers its students a wide range of computerized corpora comprising British and American English. The Mainz Corpus Collection (MACOCO) is a continuously growing source for student research on the grammaticality, use, and historical development of language structures.

What are possible applications of corpus-based research?

  • Foreign language teaching: Materials and syllabus design, exams testing language competence, and teaching methods
  • Lexicography: Corpus information is used extensively, and almost all monolingual learner dictionaries are now corpus-based, e.g. the Longman Dictionary of Contemporary English
  • Corpus-based reference and student grammars of English:
    • Biber, Douglas et al. (1999) Longman Grammar of Spoken and Written English. London: Longman.
    • Biber, Douglas et al. (2002) Longman Student Grammar of Spoken and Written English. London: Longman.
    • Huddleston, Rodney and Geoffrey K. Pullum (2005) A Student's Introduction to English Grammar. Cambridge: CUP.
    • Huddleston, Rodney and Geoffrey K. Pullum, eds. (2002) The Cambridge Grammar of the English Language. Cambridge: CUP.

Further reading

  • Biber, Douglas et al. (1998) Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP.
  • Hoffmann, Sebastian et al. (2008) Corpus Linguistics with BNCweb - a Practical Guide. Frankfurt/Main: Peter Lang.
  • Lemnitzer, Lothar & Zinsmeister, Heike (2006) Korpuslinguistik. Eine Einführung. Tübingen: Narr.
  • McEnery, Tony & Wilson, Andrew (2001) Corpus Linguistics. 2nd ed. Edinburgh: Edinburgh University Press.
  • McEnery, Tony, Xiao, Richard & Tono, Yukio (2006) Corpus-based Language Studies: An Advanced Resource Book. London: Routledge.
  • Mukherjee, Joybrato (2009) Anglistische Korpuslinguistik. Eine Einführung. Berlin: Erich Schmidt.
  • Partington, Alan (2001) "Corpora and their use in language research". In: Aston, Guy (ed.), Learning with Corpora. Bologna: CLUEB: 46-62.
  • Scherer, Carmen (2006) Korpuslinguistik. Eine Einführung. Heidelberg: Winter.