

[Figure 30.2, left panel: average utility (y-axis, 0-800) plotted against the number of proposals exchanged (x-axis, 0-1000) for four configurations: 1 local hillclimber, 2 local hillclimbers, 1 local simulated annealer, and 2 local simulated annealers.]
[Figure 30.2, right panel: final payoff matrix (row payoff / column payoff): Annealer vs. Annealer 550/550, Annealer vs. Hillclimber 180/700, Hillclimber vs. Annealer 700/180, Hillclimber vs. Hillclimber 400/400.]

Figure 30.2.  Game Dynamics (left) and Final Payoff Matrix of the Game (right)


dependently of the cost-benefit margins. Therefore, at high virtual temperatures the simulated annealer is more explorative and “far-sighted”, because it assumes that costs incurred now will be offset by gains later. This is in contrast to the myopic nature of the hillclimber, whose exploration is constrained by the monotonicity requirement.
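The contrast between the two acceptance rules can be made concrete with a small sketch (the function names and the Metropolis-style acceptance probability are illustrative assumptions, not the chapter's own implementation): the hillclimber enforces monotonicity, while the annealer tolerates utility-decreasing proposals with a probability governed by the virtual temperature.

    import math
    import random

    def hillclimber_accepts(delta_utility: float) -> bool:
        # Monotonic rule: accept only proposals that do not lower utility.
        return delta_utility >= 0

    def annealer_accepts(delta_utility: float, temperature: float) -> bool:
        # Metropolis-style rule: utility-decreasing proposals may still be
        # accepted, with a probability that shrinks as the virtual
        # temperature falls and as the utility loss grows.
        if delta_utility >= 0:
            return True
        return random.random() < math.exp(delta_utility / temperature)

    # At a high temperature the annealer tolerates substantial losses;
    # at a low temperature it behaves almost exactly like the hillclimber.
    for t in (100.0, 1.0):
        accepted = sum(annealer_accepts(-50.0, t) for _ in range(1000))
        print(f"T={t}: accepted {accepted}/1000 proposals costing 50 utility")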
In the asymmetric interaction the cooperation of the annealer permits more exploration of the contract space, and hence arrival at higher optima of the hillclimber's utility landscape. However, this cooperation is not reciprocated by the hillclimber, which acts selfishly. Therefore, the gains of the hillclimber are achieved at the cost of the annealer. The right-hand side of Figure 30.2 represents the underlying game as a matrix of the final observed utilities for all pairings of hillclimber and annealer strategies. The results confirm that this game is an instance of the prisoner's dilemma [1], in which hillclimbing is the dominant strategy for each agent. Therefore, the unique dominant-strategy equilibrium is for both agents to hillclimb. However, this outcome is Pareto-dominated by the outcome in which both agents anneal. In other words, the unique Nash equilibrium of this game (two hillclimbers) is the only outcome not in the Pareto set.
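To confirm that the matrix has the prisoner's-dilemma structure described above, the following sketch checks that hillclimbing is a best response to every opponent strategy and that the resulting equilibrium is Pareto-dominated by mutual annealing (the payoff values are those read off the right panel of Figure 30.2; their cell placement is inferred from the discussion in the text).

    # Payoffs as (row player's utility, column player's utility).
    STRATEGIES = ("anneal", "hillclimb")
    PAYOFFS = {
        ("anneal", "anneal"): (550, 550),
        ("anneal", "hillclimb"): (180, 700),
        ("hillclimb", "anneal"): (700, 180),
        ("hillclimb", "hillclimb"): (400, 400),
    }

    def best_response(column_strategy: str) -> str:
        # Row player's best reply to a fixed column strategy.
        return max(STRATEGIES,
                   key=lambda row: PAYOFFS[(row, column_strategy)][0])

    # Hillclimbing is a best response to both opponent strategies, so
    # (hillclimb, hillclimb) is the unique Nash equilibrium ...
    assert all(best_response(col) == "hillclimb" for col in STRATEGIES)

    # ... yet it is Pareto-dominated by mutual annealing: 400/400 < 550/550.
    nash = PAYOFFS[("hillclimb", "hillclimb")]
    cooperate = PAYOFFS[("anneal", "anneal")]
    assert cooperate[0] > nash[0] and cooperate[1] > nash[1]
    print("Nash outcome", nash, "is Pareto-dominated by", cooperate)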
                              4.     Conclusions

The contracting problem was used to motivate two different heuristic and approximate agent decision models, both based on a realistic set of requirements over both K and C. However, the cost of these requirements is the sub-optimality of Q. This trade-off was demonstrated in both models by negotiation strategies selecting outcomes that are not Pareto efficient. However, imperfections are a common feature of the world, and real social systems have established personal and institutional mechanisms for dealing with them. Similarly, in future work computational models are sought that are incremental, repeated, and support feedback and error-correction. Learning and evolutionary techniques