Corpora of Present-Day British English
- Lancaster-Oslo/Bergen Corpus of British English (LOB), 1 million words
- Freiburg LOB Corpus of British English (FLOB), 1 million words
- London-Lund Corpus of Spoken English (LLC), 1 million words
- British National Corpus (BNC), 100 million words
- The Daily Mail and The Mail on Sunday 1993-2000, 198 million words
- The Daily Telegraph and The Sunday Telegraph 1991-2000, 2002, 2004, 422 million words
- The Guardian and The Observer 1990-2005, 629 million words
- The Independent 1992-1994, 2002-2005
- The Times and The Sunday Times 1990-2004, 709 million words
Corpora of Present-Day American English
- Standard Corpus of Present-Day Edited American English (BROWN), 1 million words
- Freiburg BROWN Corpus of American English (FROWN), 1 million words
- American National Corpus (ANC), 12.5 million words
- Corpus of Contemporary American English (COCA), offline data, 440 million words
- Corpus of Spoken Professional American English (CSPAE), 2 million words
- Switchboard corpus of American telephone conversations
- The Buckeye Corpus of conversational speech, 300,000 words
- The Denver Post, 32 million words
- The Detroit Free Press 1992-1995, 91 million words
- The Los Angeles Times 1992-1995, 550 million words
- The New York Times 2001, 52 million words
- The Washington Times and Insight on the News 1990-1992, 84 million words
- Time Almanac 1920s-1990s, 3.4 million words; 1989-1994, 11 million words
Historical Corpora
- Early English Prose Fiction (EEPF)
- 18th Century Fiction (ECF)
- 19th Century Fiction (NCF)
- Changing Times 1785-1985, 11.5 million words
- Chaucer Corpus, 500,000 words
- Corpus of Historical American English (COHA), offline data, 377 million words
- Early American Fiction
- Lampeter Corpus
- Oxford English Dictionary, Version 1.13
- Penn-Helsinki Parsed Corpus of Middle English, 1.3 million words
- The Bible in English
- The Helsinki Corpus of English Texts
Other Varieties of English
- Australian Corpus of English (ACE)
- Corpus of Global Web-Based English (GloWbE), offline data, 1.8 billion words
- Kolhapur Corpus of written Indian English, 1 million words
- Wellington Corpus of Written New Zealand English (WC)
- Wellington Corpus of Spoken New Zealand English (WSC)
German Corpora
- Die Zeit, 25.5 million words
- TAZ 1986-1999, 170.5 million words
Corpora of Language in Politics
- Corpus of Political Speeches (CORPS I), 2.3 million words
- Corpus of Political Speeches (CORPS Release II), 8 million words
- Hansard Reports, House of Commons 1991, 1992
- Hansard Reports, House of Lords 1992
- Political Tweets of Trump and US-Senators (PoTTUS), 25 million words
Learner Corpora
Learner corpora are collections of texts produced by foreign or second language learners.
Apart from serving as a resource for second language acquisition research, they can be used to identify the typical difficulties of a particular group of learners, defined for example by proficiency level (e.g. intermediate learners) or by native language (e.g. German learners of English).
- International Corpus of Learner English, Version 2 (ICLEv2). Essays written by upper intermediate and advanced learners of English.
- Louvain International Database of Spoken English Interlanguage (LINDSEI). Oral data produced by advanced learners of English from several mother tongue backgrounds.
- LOCNESS. Student essays by native speakers of English.
Other Corpora
- Child Language Data Exchange System (CHILDES), 21 million words
- International Computer Archive of Modern and Medieval English (ICAME Collection of Corpora)
- ICAME Collection of Corpora II