World Teacher: The validity of automated scoring software and its application in ELT contexts

This was the title of the closing plenary at this year's VUS-TESOL conference, given by Professor Timothy L. Farnsworth. What follows is a summary of what he had to say.

What is automated scoring?

Computer software that automatically assigns scores to writing or speaking samples.
Essays can be assigned scores instantly by computer.
Test takers can call a testing centre and take an oral test without speaking to a human.
Scores can be reported instantly.
Some level of feedback is given to test takers.
There is a variety of software available.

How does a computer grade a test?

1. Natural Language Processing (NLP)

software identifies and counts linguistic features.
software does not attempt to gauge content in any way.
used for testing writing.

2. Speech recognition

software compares the speech sample to a large database of responses to the same test questions.
faster responses are scored as 'more fluent'.
used for testing speaking.

E-rater (ETS)

automated scoring of timed essays
uses NLP
currently used in a limited way to rate TOEFL and GRE essays
used for formative assessment (e.g. TOEFL practice online)
individual assessment
students submit essays, receive scores and re-write them as many times as they want in order to improve their score

E-rater takes an essay and counts:

the number of words
the number of sentences
the number of paragraphs
sentence length
the number of unique words used versus the total number of words (lexical diversity)
the number of low-frequency words (lexical depth)
the number of prompt-specific words (topic appropriateness)
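To make this counting concrete, here is a minimal Python sketch of how a few of these surface features might be computed. It is not ETS code and makes no claim about how E-rater actually works. The naive sentence and word splitting, the tiny COMMON_WORDS set standing in for a real word-frequency list, and the prompt_words parameter are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a real word-frequency list: any word NOT in this
# set is treated as "low-frequency". A real system would use a large corpus.
COMMON_WORDS = {
    "the", "a", "an", "and", "but", "or", "is", "are", "was", "be", "to",
    "of", "in", "on", "it", "this", "that", "i", "we", "you", "they",
    "think", "good", "very", "surely", "people", "because",
}


@dataclass
class EssayFeatures:
    words: int
    sentences: int
    paragraphs: int
    mean_sentence_length: float  # words per sentence
    lexical_diversity: float     # unique words / total words
    low_frequency_words: int     # words not in COMMON_WORDS (lexical depth)
    prompt_words: int            # words that also appear in the prompt


def extract_features(essay: str, prompt_words: set[str]) -> EssayFeatures:
    """Count simple surface features of an essay (a hypothetical sketch)."""
    # Paragraphs: blocks of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", essay) if p.strip()]
    # Sentences: naive split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", essay) if s.strip()]
    # Words: lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", essay.lower())
    n = len(tokens)
    return EssayFeatures(
        words=n,
        sentences=len(sentences),
        paragraphs=len(paragraphs),
        mean_sentence_length=n / len(sentences) if sentences else 0.0,
        lexical_diversity=len(set(tokens)) / n if n else 0.0,
        low_frequency_words=sum(1 for t in tokens if t not in COMMON_WORDS),
        prompt_words=sum(1 for t in tokens if t in prompt_words),
    )


essay = "Technology is good. People use it every day.\n\nIndubitably, technology helps students learn."
print(extract_features(essay, {"technology", "students"}))
```

Nothing in this function looks at meaning: padding an essay with rare, on-topic words raises low_frequency_words and prompt_words whatever the essay actually says.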

The computer doesn't try to understand the essay, but it does look at grammar:

dependent/independent clauses
passive voice
subject-verb agreement
plurals
sequencing words
logical relations
mechanics (punctuation, for example)

What is a good essay according to E-rater?

It's long - longer is always better!
It has a standard structure.
It has many longer sentences with a lot of dependent clauses.
It has many explicit organisational words.
It has a lot of obscure vocabulary - for example, indubitably would score much higher than surely!
It has a wide range of vocabulary.

This is not necessarily a good thing! Good English writing is often simple, clear and concise.

What does E-rater not notice?

Untruths
Many grammatical errors
Many lexical errors
Flawed arguments
Insanity!

Therefore, ETS doesn't use E-rater as the sole scorer for tests. Rather, it takes the place of the second human rater, which saves money. More than ten years of research hasn't solved these problems - it's incredibly hard to get a computer to understand language!

Criterion

This is an E-rater application designed for in-class use. Students' essays are instantly scored using E-rater software. Students are given individual scores and extra resources that explain their errors.

Versant

This is the first fully automated oral language test to be used commercially. It is a Pearson product. The test is taken in a computer lab or over the phone (speaking to a computer). The computer automatically rates the speech and produces scores. It is used widely in business and increasingly in schools. There are many versions for different purposes and languages - one for the aviation industry, for example.

The test is fifteen minutes long and includes:

repeating sentences
scrambled sentences
oral multiple choice

All responses are fully scripted, each with only one correct answer. There is an optional 'free response' item, but it is not scored. Answers are scored on:

fluency
pronunciation
sentence mastery
vocabulary
grammar

Speech is captured by microphone and compared to a large database of human-scored responses. The database includes responses from native speakers from different countries and from English learners from different countries and of all proficiency levels. Each response is scored according to the human-scored samples it most closely resembles.
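This comparison can be pictured as a nearest-neighbour lookup. The Python sketch below is a hypothetical simplification, not Pearson's method. It assumes each spoken response has already been reduced to a few numbers (the features and values here are invented), and it gives a new response the average human score of the k most similar responses in the database.

```python
import math
from typing import Sequence

# A database entry pairs a feature vector with the score a human rater gave it.
# The three features are invented for illustration:
# (words per second, total pause length in seconds, pronunciation measure 0-1).
Sample = tuple[Sequence[float], float]

# Hypothetical human-scored responses to one test item.
TOY_DATABASE: list[Sample] = [
    ((2.8, 0.5, 0.95), 80.0),
    ((2.6, 0.8, 0.90), 75.0),
    ((2.0, 2.0, 0.75), 60.0),
    ((1.2, 4.0, 0.60), 40.0),
    ((1.0, 5.0, 0.55), 35.0),
]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score_response(features: Sequence[float], database: list[Sample], k: int = 3) -> float:
    """Return the mean human score of the k stored responses most similar to this one."""
    if not database or k < 1:
        raise ValueError("need a non-empty database and k >= 1")
    nearest = sorted(database, key=lambda sample: distance(features, sample[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


# A quick, fluent-sounding response vs. a slow, hesitant one.
print(score_response((2.7, 0.6, 0.92), TOY_DATABASE, k=2))  # 77.5
print(score_response((1.1, 4.5, 0.58), TOY_DATABASE, k=2))  # 37.5
```

Because speech rate is one of the features in this toy database, a faster response sits closer to the high-scoring samples, which is one way a system could end up treating speed as fluency.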

What is a good Versant response?

It's fast (fluency score)
It's clear
It's accurate
It has native-like pronunciation

This last criterion is the most contentious. We talk about 'global English' now and, for most of us, comprehensibility matters much more than native-like speech.

What Versant doesn't measure:

the range of vocabulary used
extended speaking
pragmatics - cultural awareness, for example
the ability to interact with others

Advantages of these systems

Reliability

computers don't get tired
computers aren't biased for or against individuals
scores are more consistent than with human raters

Practicality

it's less expensive than using human raters
scores and feedback are obtained instantly

Research shows that when test takers are 'acting in good faith', scores are roughly equivalent to those of human raters. Even though the scores are very similar, however, they are arrived at in very different ways.

Problems

Automated tests can be 'gamed' or tricked. Versant scores, for example, can be quickly raised by coaching.

Positive effects on teaching

Students can get more and faster feedback.

Negative effects on teaching

The form of the test can influence what happens in the classroom.
Teachers tend to focus on what is tested at the expense of communicative teaching.
There can be a decreased focus on the quality of the content.
There can be an increased focus on grammatical accuracy and low-frequency vocabulary.
There is more oral repetition in order to increase the students' speed of response.
There is less time spent on developing critical thinking.
There is a decreased focus on the pragmatic.

To conclude

Despite the obvious drawbacks, computer-scored testing is in all our futures.

Tuesday, 8 October 2013
