Corpus Linguistics

Corpus linguistics is a methodology in linguistics that involves computer-based empirical analyses (both quantitative and qualitative) of actual patterns of language use. It draws on large, electronically available collections of naturally occurring spoken and written texts, so-called corpora. Corpus-based and other types of empirical linguistic research have shown that speakers' intuitions often provide only limited access to the open-ended nature of language. This can cause problems when examining unexpected or infrequent linguistic structures, for example lexical co-occurrence patterns, patterns of variation between grammatical constructions, word meaning, or idioms and metaphorical language.

The factors that condition the choice between competing grammatical variants are one topic that features prominently in research and students' projects at Mainz University. While grammar books would have us believe that, for example, yet triggers the present perfect, U.S. election campaigns have featured the sentence "Did you vote yet?". While standard reference works used by school teachers advise pupils to use the synthetic comparative -er with monosyllabic adjectives, native speakers use more apt and more proud rather than apter and prouder in the majority of cases. And while the 's-genitive is described as being used with persons and the of-genitive with things, linguists who study actual language use find a marked discrepancy between what is taught and what is done. Such variation is therefore too relevant to be dismissed as an exception, let alone stigmatized as incorrect. The issue of variation poses an intriguing challenge for English teachers and researchers. While to some the task of bringing schoolbook knowledge into line with actual language use seems insurmountable, English Linguistics at Mainz University tries to offer ways out of the dilemma.

Most (advanced) English linguistics classes in Mainz involve at some point students' own collection, processing and analysis of empirical data, often by making use of electronic corpora. In advanced classes in particular, students will be asked to carry out corpus-based projects, sometimes involving replications and extensions of earlier case studies. The Department of English and Linguistics hence offers its students a wide range of computerized corpora comprising British and American English. MACOCO, the Mainz Corpus Collection, is a progressively enhanced source for student research on the correctness, use, historical development, etc. of certain language structures.

Examples of research projects using electronic corpora as research tools

  • Investigating near-synonymous words (sick vs. ill)
  • Word-forming elements and how their use can be related to changes in society and culture, historical events or fashion (-dom as in kingdom, -nik as in peacenik, -thon as in sleepathon, -gate as in nipplegate)
  • Changes (of preferences) in language use such as the rise and fall of words and phrases, or changes within grammatical constructions (help + to infinitive as in I helped him to carry the boxes vs. help + bare infinitive as in I helped him _ carry the boxes)
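Projects of this kind usually start by counting competing variants in a corpus. The following is a minimal sketch in Python of how such counts could be obtained, not a description of the tools or corpora used in Mainz. The sample sentences, the function names count_help_variants and count_words, and the search pattern are all assumptions made for illustration. The pattern is a rough heuristic without part-of-speech tagging, so hits in real data would need manual checking (it would, for instance, miscount "helped him yesterday" as a bare infinitive).

```python
import re
from collections import Counter

# Toy sentences invented for illustration; a real project would read corpus files instead.
SAMPLE_TEXT = """
I helped him to carry the boxes.
She helped me carry the shopping.
They help us to understand the problem.
He was feeling sick after the trip.
My aunt has been ill for a week.
The dog was sick twice.
"""

# A form of "help", an optional pronoun object, an optional "to", then the next word.
# Group 1 is only filled when "to" is present.
HELP_PATTERN = re.compile(
    r"\bhelp(?:s|ed|ing)?\s+(?:(?:him|her|me|us|them|you|it)\s+)?(to\s+)?\w+",
    re.IGNORECASE,
)


def count_help_variants(text):
    """Count help + to-infinitive vs. help + bare infinitive (heuristic)."""
    counts = Counter()
    for match in HELP_PATTERN.finditer(text):
        variant = "to-infinitive" if match.group(1) else "bare infinitive"
        counts[variant] += 1
    return counts


def count_words(text, words):
    """Return the raw frequency of each word in `words` (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    frequencies = Counter(tokens)
    return {word: frequencies[word] for word in words}


if __name__ == "__main__":
    print(count_help_variants(SAMPLE_TEXT))
    print(count_words(SAMPLE_TEXT, ["sick", "ill"]))
```

Raw counts like these are only a first step. To compare periods or varieties, they would normally be normalized by corpus size (e.g. per million words) and the hits inspected in context.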

Applications of corpus-based research

  • Foreign language teaching: Materials and syllabus design, language testing, and classroom methodology
  • Lexicography: Corpus information is used extensively, and almost all monolingual learner dictionaries are now corpus-based, e.g. the Longman Dictionary of Contemporary English
  • Corpus-based reference and student grammars of English:
    • Biber, Douglas et al. (1999) Longman Grammar of Spoken and Written English. London: Longman.
    • Biber, Douglas et al. (2002) Longman Student Grammar of Spoken and Written English. London: Longman.
    • Huddleston, Rodney and Geoffrey K. Pullum (2005) A Student's Introduction to English Grammar. Cambridge: CUP.
    • Huddleston, Rodney and Geoffrey K. Pullum, eds. (2002) The Cambridge Grammar of the English Language. Cambridge: CUP.

» Selected readings: Corpus Linguistics

  • Biber, Douglas et al. (1998) Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP.
  • Hoffmann, Sebastian et al. (2008) Corpus Linguistics with BNCweb: A Practical Guide. Frankfurt/Main: Peter Lang.
  • Lemnitzer, Lothar & Zinsmeister, Heike (2006) Korpuslinguistik. Eine Einführung. Tübingen: Narr.
  • McEnery, Tony & Wilson, Andrew (2001) Corpus Linguistics. 2nd ed. Edinburgh: Edinburgh University Press.
  • McEnery, Tony, Xiao, Richard & Tono, Yukio (2006) Corpus-based Language Studies: An Advanced Resource Book. London: Routledge.
  • Mukherjee, Joybrato (2009) Anglistische Korpuslinguistik. Eine Einführung. Berlin: Erich Schmidt.
  • Partington, Alan (2001) "Corpora and their use in language research". In: Aston, Guy (ed.), Learning with Corpora. Bologna: CLUEB: 46-62.
  • Scherer, Carmen (2006) Korpuslinguistik. Eine Einführung. Heidelberg: Winter.

» Selected links (external)