Page 87 - Foundations of Cognitive Psychology : Core Readings

86   Jay L. McClelland, David E. Rumelhart, and Geoffrey E. Hinton

                spontaneous generalization, extending behavior appropriate for one pattern to
                other similar patterns. This property is shared by other PDP models, such as
                the word perception model and the Jets and Sharks model described above; the
                main difference here is in the existence of simple, local, learning mechanisms
                that can allow the acquisition of the connection strengths needed to produce
                these generalizations through experience with members of the ensemble of
                patterns. Distributed models have another interesting property as well: If
                there are regularities in the correspondences between pairs of patterns, the
                model will naturally extract these regularities. This property allows distributed
                models to acquire patterns of interconnections that lead them to behave in ways
                we ordinarily take as evidence for the use of linguistic rules.
                  We describe one such model very briefly. The model is a mechanism that
                learns how to construct the past tenses of words from their root forms through
                repeated presentations of examples of root forms paired with the corresponding
                past-tense form. The model consists of two pools of units. In one pool, patterns
                of activation representing the phonological structure of the root form of
                the verb can be represented, and, in the other, patterns representing the
                phonological structure of the past tense can be represented. The goal of the model
                is simply to learn the right connection strengths between the root units and the
                past-tense units, so that whenever the root form of a verb is presented the
                model will construct the corresponding past-tense form. The model is trained
                by presenting the root form of the verb as a pattern of activation over the root
                units, and then using a simple, local, learning rule to adjust the connection
                strengths so that this root form will tend to produce the correct pattern of
                activation over the past-tense units. The model is tested by simply presenting the
                root form as a pattern of activation over the root units and examining the
                pattern of activation produced over the past-tense units.
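                  The training and testing procedure just described can be sketched in a
                few lines. This is a minimal illustration under stated assumptions, not the
                authors' implementation: the text does not name the learning rule, so the
                sketch assumes an error-correcting (perceptron-style) rule with threshold
                units, and the short binary vectors in ROOTS and PASTS are arbitrary
                stand-ins for phonological patterns. The function names are hypothetical.

```python
import numpy as np


def train_pattern_associator(roots, pasts, epochs=50, lr=0.5):
    """Learn connection strengths from root units to past-tense units.

    Assumption: an error-correcting rule with threshold units. Each weight
    changes using only the activity of the root unit and the error of the
    past-tense unit it connects, which is what makes the rule "local".
    """
    roots = np.asarray(roots, dtype=float)
    pasts = np.asarray(pasts, dtype=float)
    weights = np.zeros((pasts.shape[1], roots.shape[1]))
    bias = np.zeros(pasts.shape[1])
    for _ in range(epochs):
        for root, target in zip(roots, pasts):
            # Present the root form and see what the past-tense pool produces.
            output = (weights @ root + bias > 0).astype(float)
            error = target - output
            # Strengthen or weaken only the connections from active root units.
            weights += lr * np.outer(error, root)
            bias += lr * error
    return weights, bias


def produce_past(weights, bias, root):
    """Test the model: present a root pattern, read off the past-tense pool."""
    root = np.asarray(root, dtype=float)
    return (weights @ root + bias > 0).astype(int)


# Hypothetical toy patterns: 6 root units, 4 past-tense units.
ROOTS = [[1, 0, 1, 0, 0, 1], [0, 1, 1, 0, 1, 0], [1, 1, 0, 1, 0, 0]]
PASTS = [[1, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 1]]

weights, bias = train_pattern_associator(ROOTS, PASTS)
recalled = [produce_past(weights, bias, r).tolist() for r in ROOTS]
```

                After training, presenting each toy root pattern reproduces its paired
                past-tense pattern. All of the knowledge is held in the weights; no
                pairing is stored as a separate item.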
                  The model is trained initially with a small number of verbs children learn
                early in the acquisition process. At this point in learning, it can only produce
                appropriate outputs for inputs that it has explicitly been shown. But as it learns
                more and more verbs, it exhibits two interesting behaviors. First, it produces
                the standard ed past tense when tested with pseudo-verbs or verbs it has never
                seen. Second, it “overregularizes” the past tense of irregular words it
                previously completed correctly. Often, the model will blend the irregular past
                tense of the word with the regular ed ending, and produce errors like CAMED
                as the past of COME. These phenomena mirror those observed in the early
                phases of acquisition of control over past tenses in young children.
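                  The first behavior, extending the regular ending to unseen verbs, can be
                illustrated with a deliberately tiny sketch. Everything here is an
                assumption made for illustration: a single hypothetical output unit stands
                for the ed ending, the five-unit root patterns are arbitrary, and the
                error-correcting rule is a stand-in for whatever local rule the model
                uses. The sketch does not attempt to reproduce overregularization or the
                blends such as CAMED.

```python
import numpy as np

# Hypothetical toy root patterns over 5 root units.
regular_roots = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
], dtype=float)
irregular_root = np.array([0, 0, 1, 1, 1], dtype=float)

train_roots = np.vstack([regular_roots, irregular_root])
# Target for the "ed" unit: on for the regular verbs, off for the irregular one.
ed_targets = np.array([1.0, 1.0, 1.0, 0.0])

weights = np.zeros(train_roots.shape[1])
bias = 0.0
lr = 0.5

for _ in range(20):
    for root, target in zip(train_roots, ed_targets):
        output = float(weights @ root + bias > 0)
        error = target - output
        # Local update: only connections from active root units change.
        weights += lr * error * root
        bias += lr * error


def adds_ed(root):
    """Return True if the trained unit switches the ed ending on."""
    return bool(weights @ np.asarray(root, dtype=float) + bias > 0)


novel_root = [1, 1, 0, 0, 0]  # never presented during training
novel_gets_ed = adds_ed(novel_root)
```

                Because most training patterns call for the ending, the connections come
                to favor it, and a root the unit has never seen is treated like the
                regular verbs, while the one trained exception is still handled
                separately.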
                  The generativity of the child’s responses—the creation of regular past tenses
                of new verbs and the overregularization of the irregular verbs—has been taken
                as strong evidence that the child has induced the rule which states that the
                regular correspondence for the past tense in English is to add a final ed (Berko,
                1958). On the evidence of its performance, then, the model can be said to have
                acquired the rule. However, no special rule-induction mechanism is used, and
                no special language-acquisition device is required. The model learns to behave
                in accordance with the rule, not by explicitly noting that most words take ed
                in the past tense in English and storing this rule away explicitly, but simply
                by building up a set of connections in a pattern associator through a long series
                of simple learning experiences. The same mechanisms of parallel distributed