The movement for holistic scoring developed to counter the then-current practice of evaluating student writing with multiple-choice tests. ETS argued that valid measurement of writing ability should include having students actually write. In addition, believing that testing shapes curricular practice, many writing teachers argued that multiple-choice tests of writing sent students (and faculty) the wrong message about writing instruction.
However, direct assessment of student writing appeared far more expensive than multiple-choice testing if scoring was the only cost considered. Edward White, in Teaching and Assessing Writing, points out that once the costs of test creation are included, multiple-choice tests offer no real savings. In fact, if readers could be trained to read quickly, looking at a piece of writing as a whole unit, holistic writing assessment could compete quite favorably with multiple-choice tests. Some argued that when the benefits to instructors were counted as well, holistic assessment offered even greater advantages. The remaining challenge was demonstrating that the scoring was reliable.
Until the development of holistic scoring, no reliable way existed to evaluate the essays that students wrote. Individual readers could read and evaluate essays quickly, but no individual volunteered to read thousands of papers! And when Paul Diederich, in a landmark study, Measuring Growth in English, gave a group of readers papers to evaluate without providing any criteria, he discovered that every paper received every possible score from the group of readers. The problem, White explains, "was to develop a method of scoring papers that retained the economy of a single, general impression score, with its underlying view that writing could be evaluated as a whole, and added to it substantial reliability of scoring."
Several key elements have been developed to ensure reliability in holistic scoring: a clearly defined writing assignment or prompt, highly structured reading sessions, written scoring "rubrics" or guides, sample papers (or "anchors") matched with descriptions on the scoring guides, close monitoring of the scoring while it is going on, multiple readers (and thus scores), and careful record keeping to ensure the continuing reliability of participating readers. Good holistic writing assessment programs -- ones that are cost-effective and produce valid, reliable results -- should be designed with careful attention to these elements.
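To make the mechanics of "multiple readers (and thus scores)" and "careful record keeping" concrete, here is a minimal sketch, not drawn from the text, of the bookkeeping such a reading session implies: two independent scores per paper, a third reading when the two scores are widely discrepant, and a running agreement rate per reader that a session leader could monitor. The six-point scale, the "more than one point apart" discrepancy rule, the resolution rule, and all names below are illustrative assumptions, not the procedure any particular program prescribes.

```python
from dataclasses import dataclass
from collections import defaultdict


@dataclass
class Reading:
    """One independent reading of one paper (1-6 holistic rubric assumed)."""
    paper_id: str
    reader: str
    score: int


def final_score(first: Reading, second: Reading, third: Reading | None = None) -> float:
    """Combine two (or, for discrepant papers, three) readings into one reported score."""
    if abs(first.score - second.score) <= 1:
        # Identical or adjacent scores: report their average.
        return (first.score + second.score) / 2
    if third is None:
        raise ValueError(f"{first.paper_id}: discrepant scores need a third reading")
    # One possible resolution rule (an assumption here): average the third score
    # with whichever original score it sits closer to; ties go to the lower score.
    closer = min((first.score, second.score), key=lambda s: (abs(s - third.score), s))
    return (closer + third.score) / 2


def agreement_rates(pairs: list[tuple[Reading, Reading]]) -> dict[str, float]:
    """Fraction of each reader's papers scored within one point of the co-reader."""
    agreed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for a, b in pairs:
        hit = abs(a.score - b.score) <= 1
        for reader in (a.reader, b.reader):
            total[reader] += 1
            agreed[reader] += hit
    return {reader: agreed[reader] / total[reader] for reader in total}


if __name__ == "__main__":
    r1 = Reading("paper-001", "reader-A", 4)
    r2 = Reading("paper-001", "reader-B", 5)
    print(final_score(r1, r2))        # 4.5 -- adjacent scores, no third read needed
    r3 = Reading("paper-002", "reader-A", 2)
    r4 = Reading("paper-002", "reader-B", 5)
    r5 = Reading("paper-002", "reader-C", 4)
    print(final_score(r3, r4, r5))    # 4.5 -- third reader resolves the discrepancy
    print(agreement_rates([(r1, r2), (r3, r4)]))
```

The point of the sketch is only that each element in the list above has an operational counterpart: the rubric fixes the score range, the discrepancy rule triggers extra readings, and the per-reader agreement log is what "continuing reliability of participating readers" looks like in practice.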
Diederich, P. Measuring Growth in English. Urbana, Ill.: National Council of Teachers of English, 1974.
White, E. M. Teaching and Assessing Writing. San Francisco: Jossey-Bass, 1988.