Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls
We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect-information form. These algorithms update each player's policy with a mirror-descent step that minimizes the loss of playing against an improved opponent. We prove convergence to the set of Nash equilibria, with a rate that depends on how much the opponent policies are improved. In addition, if the improved opponent is a best response, an exponential convergence rate is achieved.
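As an illustration of the update described above, the following is a minimal sketch for a normal-form game, not the authors' implementation. It assumes an entropic mirror map (so each mirror step is a multiplicative-weights update), a fixed step size `eta`, and a pure best response as the improved opponent. The function names, the step size, the iteration count, and the rock-paper-scissors payoff matrix are all hypothetical choices made for the example. With a fixed step size this sketch only reaches a neighborhood of the equilibrium and is not meant to reproduce the convergence rates stated in the abstract.

```python
import numpy as np


def exploitability(A, x, y):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    return float(np.max(A @ y) - np.min(x @ A))


def maio_best_response_sketch(A, x, y, eta=0.05, steps=2000):
    """Entropic mirror steps against best-responding opponents (illustrative only).

    A is the payoff matrix of the row player (the column player receives -A).
    x and y are the initial mixed strategies of the row and column players.
    eta and steps are assumed values, not taken from the paper.
    """
    x, y = x.copy(), y.copy()
    for _ in range(steps):
        # Improved opponents: pure best responses to the current policies.
        j = int(np.argmin(x @ A))  # column player's best response to x
        i = int(np.argmax(A @ y))  # row player's best response to y
        # Entropic mirror step, i.e. a multiplicative-weights update.
        x = x * np.exp(eta * A[:, j])   # row player ascends x^T A e_j
        y = y * np.exp(-eta * A[i, :])  # column player descends e_i^T A y
        x /= x.sum()
        y /= y.sum()
    return x, y


# Rock-paper-scissors payoffs for the row player (assumed example game).
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x0 = np.array([0.8, 0.1, 0.1])
y0 = np.array([0.1, 0.8, 0.1])
x, y = maio_best_response_sketch(A, x0, y0)
print("initial exploitability:", exploitability(A, x0, y0))
print("final exploitability:  ", exploitability(A, x, y))
```

Exploitability is used here as the distance-to-equilibrium measure because it vanishes exactly on the set of Nash equilibria. Replacing the best response with a weaker improvement of the opponent policy would change only the two lines that compute `i` and `j`.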