research-article

Leveling the playing field: fairness in AI versus human game benchmarks

Authors:
Rodrigo Canaan, New York University
Christoph Salge, University of Hertfordshire, Hatfield, UK
Julian Togelius, New York University
Andy Nealen, University of Southern California

FDG '19: Proceedings of the 14th International Conference on the Foundations of Digital Games, August 2019, Article No.: 37, Pages 1–8. https://doi.org/10.1145/3337722.3337750

Published: 26 August 2019

ABSTRACT

From the beginning of the history of AI, there has been interest in games as a platform for research. As the field developed, human-level competence in complex games became a target researchers worked to reach. Only relatively recently has this target finally been met for traditional tabletop games such as Backgammon, Chess and Go. This prompted a shift in research focus towards electronic games, which provide unique new challenges. As is often the case with AI research, these results are liable to be exaggerated or misrepresented by either authors or third parties. The extent to which these game benchmarks constitute "fair" competition between human and AI is also a matter of debate. In this paper, we review statements made by researchers and third parties in the general media and academic publications about these game benchmark results. We analyze what a fair competition would look like and suggest a taxonomy of dimensions to frame the debate on fairness in game contests between humans and machines. Ultimately, we argue that there is no completely fair way to compare human and AI performance on a game.

Index Terms

1. Applied computing
  1. Computers in other domains
    1. Personal computers and PC applications
      1. Computer games
2. Computing methodologies
  1. Artificial intelligence
    1. Philosophical/theoretical foundations of artificial intelligence

Published in
FDG '19: Proceedings of the 14th International Conference on the Foundations of Digital Games
August 2019
822 pages
ISBN: 9781450372176
DOI: 10.1145/3337722
Conference Chair: Sebastian Deterding, University of York
General Chair: Foaad Khosmood, California Polytechnic State University
Program Chairs: Johanna Pirker, Graz University of Technology; Thomas Apperley, CEGCS Tampere University
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
AI benchmarks
fairness
game AI
games
Acceptance Rates
FDG '19 Paper Acceptance Rate: 46 of 124 submissions, 37%
Overall Acceptance Rate: 152 of 415 submissions, 37%