Friday, 2 August 2013

Corpora and the advanced level: problems and prospects

Michael McCarthy
This was the title of a recent Cambridge English Teacher webinar presented by Michael McCarthy. What follows is a summary of what he had to say.

Key issues at the advanced level

Beginner level English is easy.  Students need to know basic vocabulary and grammar.  Once you get up to upper-intermediate and advanced levels (B2 - C1 - C2), though, it gets difficult!  There really isn't much consensus about what we should teach at advanced level, but evidence from corpora can help us decide.

Once you get beyond the 2000 or so most common words, vocabulary becomes a vast catalogue of low-frequency items, so how do we know which words to teach?  Grammar loses its sense of progression and tends to be a rag-bag of difficult and arcane items.  How do we bring a sense of usefulness to the grammar at this level?

Assessment targets become more difficult to distinguish at higher levels.  For example, fluency:

  • B2 - fluent
  • C1 - very fluent
  • C2 - extremely fluent
What does this mean?  How do we judge it?  Lower level learners get lots of opportunities to show their level in exams.  Higher level students perhaps don't.


The English Profile programme uses corpora to answer questions about vocabulary and grammar.  It is available online.

The main problem is that, if we just teach and learn new words as they come up, these words give back less and less.  The first words we learn give great text coverage, but each further batch of words is less common, so the return on learning them shrinks.
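The shrinking return can be illustrated with a rough sketch. It assumes that word frequencies follow Zipf's law (frequency proportional to 1/rank), which is an idealisation that was not part of the webinar, and the vocabulary size of 50,000 is an arbitrary assumption too. The function name is hypothetical.

```python
def zipf_coverage(top_n, vocab_size=50_000):
    """Share of running text covered by the top_n most frequent words.

    Assumes frequency is proportional to 1/rank (Zipf's law); real
    corpora only follow this roughly.
    """
    if not 0 <= top_n <= vocab_size:
        raise ValueError("top_n must be between 0 and vocab_size")
    total = sum(1 / rank for rank in range(1, vocab_size + 1))
    covered = sum(1 / rank for rank in range(1, top_n + 1))
    return covered / total


# Each doubling of the word list adds a smaller slice of coverage
# relative to the number of new words learned.
for n in (1000, 2000, 4000, 8000):
    print(n, round(zipf_coverage(n), 3))
```

Under these assumptions, the second thousand words add far less coverage than the first thousand, which is the pattern described above.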

At more advanced levels, collocations and language chunks become much more prominent.  The same words appear in more and different combinations. Register, connotation and style become more important.  There is more specialised vocabulary, and subtle evaluative nuances (of adjectives, for example) need to be explained.  There is also a growth in domain-specificity - vocabulary particular to specific disciplines.

There is evidence of possible slowdown and attrition at higher levels, too. The pace of learning slows and students may even reach a plateau and stop developing completely.

The English Vocabulary Profile gives labels for words and phrases and can be browsed by CEFR level or by the vocabulary item itself.  This helps teachers to know what vocabulary is most useful for students to learn at a particular level. It also gives students a progression if they focus on the words they need to know at each level.

Grammatical issues

At higher levels we can focus on:
  • New or not typically taught functions for known forms.  For example, we can teach the uses of present perfect that we haven't had time to cover at lower levels.
  • Low-frequency patterns - structures that are still used by native or proficient speakers, but not often.
  • Patterns that underlie academic success - grammatical structures that help students to score well in exams.
Example - Future Perfect (Continuous)

The common usage which we teach at intermediate level is: At the end of this year, I will have been living in Vietnam for three years.  However, if we look at a corpus, we can find another common use for this tense:

You'll have heard about the terrible earthquake.
You'll have been given a handout.

Here, future perfect is used to make assumptions about the present - things that have already happened!

Look at these examples from the corpus of future perfect continuous being used in the same way:

At higher levels, we should find and teach examples like this.  We need to show our students new functions for grammar they already know. They will be able to use them in their speaking and writing, and recognise the meaning when they hear or read them.

Example - Subjunctive Patterns

Here, we are talking about instances where the verb is always in the base form.  For example,

They insist that he wear his uniform at all times.
... their insistence that he wear his uniform ...
... important that he wear his uniform ...

verb/noun/adjective + that + subject + base form of the verb

We can look at a corpus and see how the subjunctive is used.  Although it's not so common, it's useful and students think they're making progress when they learn about it.
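As a minimal sketch of how such a corpus search might work, the pattern verb/noun/adjective + that + subject + base form can be approximated with a regular expression. The trigger list and the function name are hypothetical, only he/she/it subjects are checked (where the missing -s makes the base form visible), and the results are candidates for manual checking rather than confirmed subjunctives.

```python
import re

# Hypothetical, deliberately short trigger list (verbs, a noun, adjectives).
TRIGGERS = (
    r"(?:insist(?:s|ed|ing)?|insistence|demand(?:s|ed|ing)?|"
    r"recommend(?:s|ed|ing)?|important|essential|vital)"
)
MODALS = {"should", "must", "will", "would", "can", "could", "may", "might"}

# trigger + "that" + third-person singular subject + next word
PATTERN = re.compile(
    rf"\b{TRIGGERS}\s+that\s+(?:he|she|it)\s+([a-z]+)\b",
    re.IGNORECASE,
)


def find_subjunctive_candidates(text):
    """Return matched spans where the word after the subject looks like a base form."""
    candidates = []
    for match in PATTERN.finditer(text):
        verb = match.group(1).lower()
        # Crude filter: drop modals, -s forms (indicative) and -ed forms (past).
        if verb in MODALS or verb.endswith(("s", "ed")):
            continue
        candidates.append(match.group(0))
    return candidates


print(find_subjunctive_candidates("They insist that he wear his uniform at all times."))
```

The filter is intentionally simple: it wrongly rejects base forms that end in -s or -ed (pass, need) and lets irregular past forms through, so a real search would need part-of-speech tagging.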

Look at these examples taken from English Profile:

We can teach these structures as a piece of grammar and link them with vocabulary by using English Profile.

The power of the corpus is that it can give coherence and purpose to syllabi at higher levels.

Example - Nominalisation

Here, we are talking about the process of turning a verb into a noun.  

We fly at seven.    >        Our flight is at seven. 
Mr X donated Y.    >        Mr X made a donation of Y.

Nominalisation is seen as a sign of good academic writing.  This can be confirmed by looking at the Cambridge Learner Corpus, which takes examples from students' work and shows which grammar structures and vocabulary attract the highest marks.

Example - Modality

Analysis of success at higher levels indicates that the use of adverbs after modal verbs is good!


The corpus is relevant and current - results from it can be put straight into teaching materials.


  • What is it that remains to be learned at higher levels?
  • How can the corpus help us to decide what must be taught and how to teach it?
Students can't learn every word in the English language, so tell them to concentrate on the words that interest them.  By reading texts that interest them, they will improve their general vocabulary and make progress - FACT!!

