rules and procedures. These rules and procedures construct a system of evaluation that permits ordered observations, inferences, generalizations, predictions, and analyses. Yet, as Messick and educational researcher M. T. Kane have indicated, the resultant scores that emerge from these evaluation systems supposedly connect to "relevant content and operative processes" that are "presumed to be reflected in scores that concatenate responses in domain-appropriate ways and are generalizable across a range of tasks, settings and occasions" (Nachmias & Nachmias, 1981, p. 145). However, what happens far more often is that the interpretations and actions derived from the scores are "typically extrapolated beyond the test context on the basis of documented or presumed relationships with nontest behaviors and anticipated outcomes or consequences" (Nachmias & Nachmias, 1981, p. 146).
As with most things, in practice, concepts such as validity and reliability are more complex than they first appear, particularly when they are applied to something with as many variables and issues as writing. Writing specialists need to understand that an assessment tool has to be evaluated against other characteristics to determine its worth as a measurement instrument. Assessment instruments are only useful when they are both reliable and valid, and far too often in something like the evaluation of real writing outside of highly constrained test conditions, attaining solid confidence levels for validity and reliability is nearly impossible.
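To make the tests-and-measurements vocabulary concrete, it may help to recall the standard classical test theory formulation, offered here as an illustrative sketch rather than the author's own notation: an observed score is modeled as a true score plus an error component, and reliability is the proportion of observed-score variance attributable to true scores.

$$X = T + E, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}$$

On this view, the variable errors discussed below are fluctuations in $E$ across tasks, raters, and occasions; the more of the observed variance such fluctuations absorb, the lower the reliability coefficient falls.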
Now, I'll concede that some assessment proponents might differ with my observations. These individuals will argue that holistic essay scoring and portfolio reading have depended on behaviorism's recognition of validity and reliability for decades to offer credibility. If real validity and reliability exist in these situations, though, it is usually because the students' test conditions provide a veneer on the process that leads teachers, departments, and institutions to believe that their exam is reliable and valid. Let me argue here that most writing assessment instruments are unreliable for several reasons, from students misunderstanding the prompt's wording or its expectations, to instability in students' responses (which could be a sign of growth or cheating rather than an error, though only in clear-cut cases is one ever quite sure), to a lack of internal consistency, to problems with intercoder reliability (commonly called "splits" in holistic readings). In the language of tests and measurements, these lapses are called variable errors because the "error varies