Normalizing frequencies

Because corpora and corpus sections usually differ in size, raw frequencies cannot be compared directly. To compare your results, use frequencies that are normalized to a common base (e.g. per million words or per thousand words).

Example

You searched for the word awesome in the spoken section (1133 occurrences) and the fiction section (658 occurrences) of COCA.

To determine the number of occurrences of awesome per million words, divide the raw frequency by the total number of words in the corpus section and multiply the result by one million.

spoken section:

1133 ÷ 95,565,075 × 1,000,000 ≈ 11.86 occurrences per million words (pmw)

fiction section:

658 ÷ 90,429,400 × 1,000,000 ≈ 7.28 occurrences per million words (pmw)
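
If you have several words or sections to compare, the same calculation can be scripted. The following is a minimal Python sketch that uses the figures from the example above; the function name per_million and the sections dictionary are hypothetical names chosen for illustration and are not part of any corpus tool.

```python
def per_million(raw_count: int, section_size: int, base: int = 1_000_000) -> float:
    """Normalize a raw frequency to occurrences per `base` words."""
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    return raw_count / section_size * base


# Raw frequency of "awesome" and section size in words,
# taken from the COCA example above.
sections = {
    "spoken": (1133, 95_565_075),
    "fiction": (658, 90_429_400),
}

# Normalized frequency (pmw) for each section.
normalized = {
    name: per_million(count, size) for name, (count, size) in sections.items()
}

for name, pmw in normalized.items():
    print(f"{name}: {pmw:.2f} pmw")
```

To normalize to a different base, pass it as the third argument, e.g. per_million(1133, 95_565_075, base=1_000) for occurrences per thousand words.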

Video Tutorial

You can find more examples in Video 7 of our screencast series on YouTube, which guides you through all the steps required in a corpus-linguistic research project.