This was the title of the closing plenary at this year's VUS-TESOL conference, given by Professor Timothy L. Farnsworth. What follows is a summary of what he had to say.
What is automated scoring?
- Computer software that automatically assigns scores to writing or speaking samples.
- Essays can be assigned scores instantly by computer.
- Test takers can call a testing centre and take an oral test without speaking to a human.
- Scores can be reported instantly.
- Some level of feedback is given to test takers.
- There is a variety of software available.
1. Natural Language Processing (NLP)
- software identifies and counts linguistic features.
- software does not attempt to gauge content in any way.
- used for testing writing.
2. Speech recognition
- software compares the speech sample to a large database of samples of the same test questions.
- faster responses are 'more fluent'.
- used for testing speaking.
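To make the idea that faster responses count as 'more fluent' concrete, here is a minimal sketch that measures speech rate in words per minute. It is only an illustration: the function name, the example sentence and the timings are invented for this purpose, and the talk did not describe how any commercial test actually calculates fluency.

```python
def speech_rate_wpm(transcript: str, duration_seconds: float) -> float:
    """Words per minute for one spoken response (a hypothetical fluency proxy)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Whitespace-separated tokens stand in for spoken words.
    return len(transcript.split()) * 60.0 / duration_seconds


# The same sentence repeated by two imaginary test takers.
sentence = "I would like to book a flight to Hanoi"
slow = speech_rate_wpm(sentence, 6.0)  # took six seconds
fast = speech_rate_wpm(sentence, 3.0)  # took three seconds
print(slow, fast)
```

On a measure like this, the second speaker looks twice as 'fluent' although both said exactly the same thing.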
E-rater
- automated scoring of timed essays
- uses NLP
- currently used in a limited way to rate TOEFL and GRE
- used for formative assessment (e.g. TOEFL Practice Online)
- individual assessment
- students submit essays, receive scores and re-write them as many times as they want in order to improve their score
What does E-rater notice?
- the number of words
- the number of sentences
- the number of paragraphs
- sentence length
- the number of unique words used versus the total number of words (lexical diversity)
- the number of low-frequency words (lexical depth)
- the number of prompt-specific words (topic appropriateness)
- dependent/independent clauses
- passive voice
- subject-verb agreement
- plurals
- sequencing words
- logical relations
- mechanics (punctuation, for example)
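Several of the features listed above are simple counts, so they are easy to approximate. The following is a rough sketch, not E-rater's actual method: the function name, the regular expressions and the sample text are all assumptions made for illustration. Lexical diversity is taken here to be the number of unique words divided by the total number of words.

```python
import re


def surface_features(essay: str) -> dict:
    """Count a few surface features of an essay (illustrative only)."""
    # Paragraphs: blocks of text separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", essay) if p.strip()]
    # Sentences: naive split on whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", essay.strip()) if s]
    # Words: lower-cased runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", essay.lower())
    n_words = len(words)
    return {
        "words": n_words,
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
    }


sample = "The cat sat on the mat. The dog sat too.\n\nIt was a quiet day."
features = surface_features(sample)
print(features)
```

Nothing in this sketch looks at what the essay actually says, which matches the point that the software does not attempt to gauge content.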
What is a good E-rater essay?
- It's long - longer is always better!
- It has a standard structure.
- It has many longer sentences with a lot of dependent clauses.
- It has many explicit organisational words.
- It has a lot of obscure vocabulary - for example, 'indubitably' would score much higher than 'surely'!
- It has a wide range of vocabulary.
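The preference for obscure vocabulary can be pictured as counting the words that do not appear on a list of common words. The sketch below is a toy under stated assumptions: `COMMON_WORDS` is a tiny made-up list, not a real frequency list, and the function is hypothetical rather than anything E-rater is known to use.

```python
import re

# Hypothetical stand-in for a high-frequency word list; a real system would
# presumably rely on frequency counts from a large corpus.
COMMON_WORDS = frozenset({"this", "is", "surely", "the", "best", "answer"})


def low_frequency_count(text: str, common_words: frozenset = COMMON_WORDS) -> int:
    """Count word tokens that are absent from the common-word list."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for word in words if word not in common_words)


plain = low_frequency_count("This is surely the best answer.")
obscure = low_frequency_count("This is indubitably the best answer.")
print(plain, obscure)
```

Swapping one common word for a rarer synonym raises the count although the meaning is unchanged, which is why a count like this is easy to inflate.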
What does E-rater not notice?
- Untruths
- Grammatical errors
- Lexical errors
- Flawed arguments
- Insanity!
Criterion
This is an E-rater application designed for in-class use. Students' essays are instantly scored using E-rater software. Students are given individual scores and extra resources to refer to about their errors.
Versant
This is the first fully automated oral language test used commercially. It is a Pearson product. The test is taken in a computer lab or over the phone (speaking to a computer). The computer automatically rates the speech and produces scores. It is used widely in business and increasingly in schools. There are many versions with multiple uses and languages - for the aviation industry, for example.
The test is fifteen minutes long and includes:
- repeating sentences
- scrambled sentences
- oral multiple choice
What Versant measures:
- fluency
- pronunciation
- sentence mastery
- vocabulary
- grammar
What is a good Versant response?
- It's fast (fluency score)
- It's clear
- It's accurate
- It has native-like pronunciation
What Versant doesn't measure:
- the range of vocabulary used
- extended speaking
- pragmatics - cultural awareness, for example
- the ability to interact with others
Reliability
- computers don't get tired
- computers aren't biased for or against individuals
- scores are more consistent than with human raters
- it's less expensive than using human raters
- scores and feedback are obtained instantly
Problems
Automated tests can be 'gamed' or tricked. Versant scores, for example, can be quickly raised by coaching.
Positive effects on teaching
- Students can get more and faster feedback.
Negative effects on teaching
- The form of the test can influence what happens in the classroom.
- Teachers tend to focus on what is tested at the expense of communicative teaching.
- There can be a decreased focus on the quality of the content.
- There can be an increased focus on grammatical accuracy and low-frequency vocabulary.
- There is more oral repetition in order to increase the students' speed of response.
- There is less time spent on developing critical thinking.
- There is a decreased focus on the pragmatic.
Despite the obvious drawbacks, computer-scored testing is in all our futures.