
rules and procedures. These rules and procedures construct a system of evaluation that permits ordered observations, inferences, generalizations, predictions, and analyses. Yet, as Messick and educational researcher M. T. Kane have indicated, the resultant scores that emerge from these evaluation systems supposedly connect to "relevant content and operative processes" that are "presumed to be reflected in scores that concatenate responses in domain-appropriate ways and are generalizable across a range of tasks, settings and occasions" (Nachmias & Nachmias, 1981, p. 145). However, what happens far more often is that the interpretations and actions derived from the scores are "typically extrapolated beyond the test context on the basis of documented or presumed relationships with nontest behaviors and anticipated outcomes or consequences" (Nachmias & Nachmias, 1981, p. 146).
   As with most things, in practice, ideas such as validity and reliability are more complex than they are simple, particularly when the concepts are applied to something with as many variables and issues as writing. Writing specialists need to understand that an assessment tool has to be evaluated against other characteristics to conclude its worth as a measurement instrument. Assessment instruments are only useful when they are both reliable and valid, and far too often in something like the evaluation of real writing outside of highly constrained test conditions, attaining solid confidence levels for validity and reliability is nearly impossible.
   Now, I'll concede that some assessment proponents might differ with my observations. These individuals will argue that holistic essay scoring and portfolio reading have depended on behaviorism's recognition of validity and reliability for decades to offer credibility. If real validity and reliability exist in these situations, though, it is usually because the students' test conditions provide a veneer on the process that tricks teachers, departments, and institutions into believing that their exam is reliable and valid. Let me argue here that most writing assessment instruments are unreliable for several reasons, from students misunderstanding the prompt's wording or its expectations to instability in students' responses (which could be a sign of growth or cheating instead of an error, but only in clear-cut cases is one ever quite sure) to a lack of internal consistency to a problem with intercoder reliability (commonly called "splits" in holistic readings). In the language of tests and measurements, these lapses are called variable errors because the "error varies