Game-theory behaviour of large language models: The case of Keynesian beauty contests
DOI: https://doi.org/10.18559/ebr.2025.2.2182
Keywords: economic games, large language models, strategic interactions
Abstract
The growing adoption of large language models (LLMs) presents potential for a deeper understanding of human behaviour within game-theory frameworks. This paper examines strategic interactions among multiple types of LLM-based agents in a classical beauty contest game. LLM-based agents demonstrate varying depths of reasoning that fall within the level-0 to level-1 range, lower than results from experiments with human subjects in previous literature, but they display a similar convergence pattern towards the Nash equilibrium choice in repeated settings. Through simulations that vary the group composition of agent types, I find that environments with lower strategic uncertainty enhance convergence for LLM-based agents, and that environments with mixed strategic types accelerate convergence for all. Results with simulated agents not only convey insights into potential human behaviour in competitive settings; they also offer a valuable understanding of strategic interactions among algorithms.
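As a point of reference, the level-k logic the abstract alludes to can be sketched in a few lines. This sketch assumes the standard p-beauty contest with guesses on [0, 100] and a target of p times the mean guess, with p = 2/3 (the common parameterisation; the paper's exact setup is not restated in this abstract):

```python
# Level-k reasoning in a p-beauty contest (sketch; p = 2/3 assumed).
# A level-0 player guesses the midpoint 50; a level-k player
# best-responds to a population of level-(k-1) players, yielding
# a guess of p**k * 50.

P = 2 / 3  # assumed target multiplier

def level_k_guess(k: int, p: float = P, midpoint: float = 50.0) -> float:
    """Guess of a level-k reasoner: p^k times the level-0 midpoint."""
    return (p ** k) * midpoint

# Iterating best responses drives guesses toward the unique Nash
# equilibrium of 0, mirroring the convergence pattern observed in
# repeated play.
guesses = [level_k_guess(k) for k in range(5)]
# level-0: 50.0, level-1: ~33.3, level-2: ~22.2, shrinking toward 0
```

In this framing, the paper's finding that LLM-based agents play in the level-0 to level-1 range corresponds to guesses near 50 down to about 33 in a first round, above-equilibrium choices that only unravel toward 0 with repetition.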
JEL Classification
Computational Techniques • Simulation Modeling (C63)
General (C70)
General (C90)
References
Aher, G. V., Arriaga, R. I., & Kalai, A. T. (2023). Using large language models to simulate multiple humans and replicate human subject studies. International Conference on Machine Learning, 337–371. https://proceedings.mlr.press/v202/aher23a.html
Akata, E., Schulz, L., Coda-Forno, J., Oh, S. J., Bethge, M., & Schulz, E. (2023). Playing repeated games with large language models. arXiv preprint arXiv:2305.16867.
Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337–351.
DOI: https://doi.org/10.1017/pan.2023.2
Bauer, K., Liebich, L., Hinz, O., & Kosfeld, M. (2023). Decoding GPT's hidden 'rationality' of cooperation. SSRN Working Paper.
DOI: https://doi.org/10.2139/ssrn.4576036
Bosch-Domenech, A., Montalvo, J. G., Nagel, R., & Satorra, A. (2002). One, two, (three), infinity, …: Newspaper and lab beauty-contest experiments. American Economic Review, 92(5), 1687–1701.
DOI: https://doi.org/10.1257/000282802762024737
Brown, Z. Y., & MacKay, A. (2023). Competition in pricing algorithms. American Economic Journal: Microeconomics, 15(2), 109–156.
DOI: https://doi.org/10.1257/mic.20210158
Camerer, C. F., Ho, T.-H., & Chong, J.-K. (2004). A cognitive hierarchy model of games. The Quarterly Journal of Economics, 119(3), 861–898.
DOI: https://doi.org/10.1162/0033553041502225
Chen, L., Mislove, A., & Wilson, C. (2016). An empirical analysis of algorithmic pricing on Amazon Marketplace. Proceedings of the 25th International Conference on World Wide Web, 1339–1349.
DOI: https://doi.org/10.1145/2872427.2883089
Chen, Y., Liu, T. X., Shan, Y., & Zhong, S. (2023). The emergence of economic rationality of GPT. Proceedings of the National Academy of Sciences, 120(51), e2316205120.
DOI: https://doi.org/10.1073/pnas.2316205120
Coricelli, G., & Nagel, R. (2009). Neural correlates of depth of strategic reasoning in medial prefrontal cortex. Proceedings of the National Academy of Sciences, 106(23), 9163–9168.
DOI: https://doi.org/10.1073/pnas.0807721106
Costa-Gomes, M. A., & Weizsäcker, G. (2008). Stated beliefs and play in normal-form games. The Review of Economic Studies, 75(3), 729–762.
DOI: https://doi.org/10.1111/j.1467-937X.2008.00498.x
Devetag, G., Di Guida, S., & Polonio, L. (2016). An eye-tracking study of feature-based choice in one-shot games. Experimental Economics, 19, 177–201.
DOI: https://doi.org/10.1007/s10683-015-9432-5
Dillion, D., Tandon, N., Gu, Y., & Gray, K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences.
DOI: https://doi.org/10.1016/j.tics.2023.04.008
Fan, C., Chen, J., Jin, Y., & He, H. (2023). Can large language models serve as rational players in game theory? A systematic analysis. arXiv preprint arXiv:2312.05488.
DOI: https://doi.org/10.1609/aaai.v38i16.29751
Guo, F. (2023). GPT in game theory experiments. arXiv preprint arXiv:2305.05516.
Guo, S., Bu, H., Wang, H., Ren, Y., Sui, D., Shang, Y., & Lu, S. (2024). Economics arena for large language models. arXiv preprint arXiv:2401.01735.
Hamill, L., & Gilbert, N. (2015). Agent-based modelling in economics. John Wiley & Sons.
DOI: https://doi.org/10.1002/9781118945520
Horton, J. J. (2023). Large language models as simulated economic agents: What can we learn from homo silicus? (Tech. rep.). National Bureau of Economic Research.
DOI: https://doi.org/10.3386/w31122
HuggingFace. (2022). Illustrating reinforcement learning from human feedback (RLHF). Retrieved January 31, 2024, from https://huggingface.co/blog/rlhf
Huijzer, R., & Hill, Y. (2023, January). Large language models show human behavior.
DOI: https://doi.org/10.31234/osf.io/munc9
Ireson, J., & Hallam, S. (1999). Raising standards: Is ability grouping the answer? Oxford Review of Education, 25(3), 343–358.
DOI: https://doi.org/10.1080/030549899104026
Kalton, G., & Schuman, H. (1982). The effect of the question on survey responses: A review. Journal of the Royal Statistical Society Series A: Statistics in Society, 145(1), 42–57.
DOI: https://doi.org/10.2307/2981421
Keynes, J. M. (1936). The general theory of employment, interest and money. Macmillan and Co., Limited.
Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large language models. arXiv preprint arXiv:2302.02083.
Liem, G. A. D., Marsh, H. W., Martin, A. J., McInerney, D. M., & Yeung, A. S. (2013). The big-fish-little-pond effect and a national policy of within-school ability streaming: Alternative frames of reference. American Educational Research Journal, 50(2), 326–370.
DOI: https://doi.org/10.3102/0002831212464511
Mauersberger, F., & Nagel, R. (2018). Levels of reasoning in Keynesian beauty contests: A generative framework. In Handbook of computational economics (pp. 541–634, Vol. 4). Elsevier.
DOI: https://doi.org/10.1016/bs.hescom.2018.05.002
Mei, Q., Xie, Y., Yuan, W., & Jackson, M. O. (2024). A turing test of whether ai chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences, 121(9), e2313925121.
DOI: https://doi.org/10.1073/pnas.2313925121
Nagel, R. (1995). Unraveling in guessing games: An experimental study. The American Economic Review, 85(5), 1313–1326. https://www.jstor.org/stable/2950991
Nagel, R., Bühren, C., & Frank, B. (2017). Inspired and inspiring: Hervé Moulin and the discovery of the beauty contest game. Mathematical Social Sciences, 90, 191–207.
DOI: https://doi.org/10.1016/j.mathsocsci.2016.09.001
OpenAI. (2024). How chatgpt and our language models are developed. Retrieved January 18, 2024, from https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-language-models-are-developed
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf
Phelps, S., & Russell, Y. I. (2023). Investigating emergent goal-like behaviour in large language models using experimental economics. arXiv preprint arXiv:2305.07970.
Sclar, M., Choi, Y., Tsvetkov, Y., & Suhr, A. (2023). Quantifying language models' sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. arXiv preprint arXiv:2310.11324.
Strachan, J. W. A., Albergo, D., Borghini, G., Pansardi, O., Scaliti, E., Gupta, S., Saxena, K., Rufo, A., Panzeri, S., Manzi, G., et al. (2024). Testing theory of mind in large language models and humans. Nature Human Behaviour, 8(7), 1285–1295.
DOI: https://doi.org/10.1038/s41562-024-01882-z
Trality. (2024). Crypto trading bots: The ultimate beginner’s guide. Retrieved January 23, 2024, from https://www.trality.com/blog/crypto-trading-bots
Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211(4481), 453–458.
DOI: https://doi.org/10.1126/science.7455683
Webb, T., Holyoak, K. J., & Lu, H. (2023). Emergent analogical reasoning in large language models. Nature Human Behaviour, 7(9), 1526–1541.
DOI: https://doi.org/10.1038/s41562-023-01659-w
License
Copyright (c) 2025 Siting Estee Lu

This work is licensed under a Creative Commons Attribution 4.0 International License.
