Game-theory behaviour of large language models: The case of Keynesian beauty contests

Authors

S. E. Lu

DOI:

https://doi.org/10.18559/ebr.2025.2.2182

Keywords:

economic games, large language models, strategic interactions

Abstract

The growing adoption of large language models (LLMs) presents potential for a deeper understanding of human behaviour within game-theory frameworks. This paper examines strategic interactions among multiple types of LLM-based agents in a classical beauty contest game. LLM-based agents demonstrate varying depths of reasoning that fall between level-0 and level-1, lower than the results of experiments conducted with human subjects in previous literature, but they display a similar convergence pattern towards the Nash equilibrium choice in repeated settings. Through simulations that vary the group composition of agent types, I find that environments with lower strategic uncertainty enhance convergence for LLM-based agents, and that environments with mixed strategic types accelerate convergence for all. Results with simulated agents not only convey insights into potential human behaviour in competitive settings; they also offer a valuable understanding of strategic interactions among algorithms.
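The level-k dynamics the abstract refers to can be sketched numerically. The snippet below is a minimal illustration, assuming the standard p-beauty contest with p = 2/3 (as in Nagel, 1995); the anchor of 50 and the range of levels shown are illustrative assumptions, not the paper's actual experimental setup.

```python
# Illustrative p-beauty contest with level-k reasoning (p = 2/3, assumed).
# A level-0 agent guesses at random in [0, 100] (mean 50); a level-k agent
# best-responds to a population of level-(k-1) agents, so its expected
# guess is p^k * 50. Guesses shrink geometrically towards the Nash
# equilibrium of 0 as the depth of reasoning k grows.

P = 2 / 3  # the contest's target multiplier (assumed value)

def level_k_guess(k: int, anchor: float = 50.0) -> float:
    """Expected guess of a level-k reasoner: p^k times the level-0 mean."""
    return (P ** k) * anchor

for k in range(5):
    print(f"level-{k}: {level_k_guess(k):.2f}")
```

Under this sketch, the abstract's finding that LLM agents reason at roughly level-0 to level-1 corresponds to guesses near 50 down to about 33, higher than the deeper-reasoning guesses typically elicited from human subjects.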

JEL Classification

Computational Techniques • Simulation Modeling (C63)
General (C70)
General (C90)

References

Aher, G. V., Arriaga, R. I., & Kalai, A. T. (2023). Using large language models to simulate multiple humans and replicate human subject studies. International Conference on Machine Learning, 337–371. https://proceedings.mlr.press/v202/aher23a.html

Akata, E., Schulz, L., Coda-Forno, J., Oh, S. J., Bethge, M., & Schulz, E. (2023). Playing repeated games with large language models. arXiv preprint arXiv:2305.16867.

Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337–351. https://doi.org/10.1017/pan.2023.2

Bauer, K., Liebich, L., Hinz, O., & Kosfeld, M. (2023). Decoding GPT's hidden 'rationality' of cooperation. https://doi.org/10.2139/ssrn.4576036

Bosch-Domenech, A., Montalvo, J. G., Nagel, R., & Satorra, A. (2002). One, two, (three), infinity, …: Newspaper and lab beauty-contest experiments. American Economic Review, 92(5), 1687–1701. https://doi.org/10.1257/000282802762024737

Brown, Z. Y., & MacKay, A. (2023). Competition in pricing algorithms. American Economic Journal: Microeconomics, 15(2), 109–156. https://doi.org/10.1257/mic.20210158

Camerer, C. F., Ho, T.-H., & Chong, J.-K. (2004). A cognitive hierarchy model of games. The Quarterly Journal of Economics, 119(3), 861–898. https://doi.org/10.1162/0033553041502225

Chen, L., Mislove, A., & Wilson, C. (2016). An empirical analysis of algorithmic pricing on Amazon Marketplace. Proceedings of the 25th International Conference on World Wide Web, 1339–1349. https://doi.org/10.1145/2872427.2883089

Chen, Y., Liu, T. X., Shan, Y., & Zhong, S. (2023). The emergence of economic rationality of GPT. Proceedings of the National Academy of Sciences, 120(51), e2316205120. https://doi.org/10.1073/pnas.2316205120

Coricelli, G., & Nagel, R. (2009). Neural correlates of depth of strategic reasoning in medial prefrontal cortex. Proceedings of the National Academy of Sciences, 106(23), 9163–9168. https://doi.org/10.1073/pnas.0807721106

Costa-Gomes, M. A., & Weizsäcker, G. (2008). Stated beliefs and play in normal-form games. The Review of Economic Studies, 75(3), 729–762. https://doi.org/10.1111/j.1467-937X.2008.00498.x

Devetag, G., Di Guida, S., & Polonio, L. (2016). An eye-tracking study of feature-based choice in one-shot games. Experimental Economics, 19, 177–201. https://doi.org/10.1007/s10683-015-9432-5

Dillion, D., Tandon, N., Gu, Y., & Gray, K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2023.04.008

Fan, C., Chen, J., Jin, Y., & He, H. (2023). Can large language models serve as rational players in game theory? A systematic analysis. arXiv preprint arXiv:2312.05488. https://doi.org/10.1609/aaai.v38i16.29751

Guo, F. (2023). GPT in game theory experiments. arXiv preprint arXiv:2305.05516.

Guo, S., Bu, H., Wang, H., Ren, Y., Sui, D., Shang, Y., & Lu, S. (2024). Economics arena for large language models. arXiv preprint arXiv:2401.01735.

Hamill, L., & Gilbert, N. (2015). Agent-based modelling in economics. John Wiley & Sons. https://doi.org/10.1002/9781118945520

Horton, J. J. (2023). Large language models as simulated economic agents: What can we learn from homo silicus? (Tech. rep.). National Bureau of Economic Research. https://doi.org/10.3386/w31122

HuggingFace. (2022). Illustrating reinforcement learning from human feedback (RLHF). Retrieved January 31, 2024, from https://huggingface.co/blog/rlhf

Huijzer, R., & Hill, Y. (2023, January). Large language models show human behavior. https://doi.org/10.31234/osf.io/munc9

Ireson, J., & Hallam, S. (1999). Raising standards: Is ability grouping the answer? Oxford Review of Education, 25(3), 343–358. https://doi.org/10.1080/030549899104026

Kalton, G., & Schuman, H. (1982). The effect of the question on survey responses: A review. Journal of the Royal Statistical Society Series A: Statistics in Society, 145(1), 42–57. https://doi.org/10.2307/2981421

Keynes, J. M. (1936). The general theory of employment, interest and money. Macmillan and Co, Limited.

Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large language models. arXiv preprint arXiv:2302.02083.

Liem, G. A. D., Marsh, H. W., Martin, A. J., McInerney, D. M., & Yeung, A. S. (2013). The big-fish-little-pond effect and a national policy of within-school ability streaming: Alternative frames of reference. American Educational Research Journal, 50(2), 326–370. https://doi.org/10.3102/0002831212464511

Mauersberger, F., & Nagel, R. (2018). Levels of reasoning in Keynesian beauty contests: A generative framework. In Handbook of computational economics (pp. 541–634, Vol. 4). Elsevier. https://doi.org/10.1016/bs.hescom.2018.05.002

Mei, Q., Xie, Y., Yuan, W., & Jackson, M. O. (2024). A Turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences, 121(9), e2313925121. https://doi.org/10.1073/pnas.2313925121

Nagel, R. (1995). Unraveling in guessing games: An experimental study. The American Economic Review, 85(5), 1313–1326. https://www.jstor.org/stable/2950991

Nagel, R., Bühren, C., & Frank, B. (2017). Inspired and inspiring: Hervé Moulin and the discovery of the beauty contest game. Mathematical Social Sciences, 90, 191–207. https://doi.org/10.1016/j.mathsocsci.2016.09.001

OpenAI. (2024). How ChatGPT and our language models are developed. Retrieved January 18, 2024, from https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-language-models-are-developed

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf

Phelps, S., & Russell, Y. I. (2023). Investigating emergent goal-like behaviour in large language models using experimental economics. arXiv preprint arXiv:2305.07970.

Sclar, M., Choi, Y., Tsvetkov, Y., & Suhr, A. (2023). Quantifying language models' sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. arXiv preprint arXiv:2310.11324.

Strachan, J. W. A., Albergo, D., Borghini, G., Pansardi, O., Scaliti, E., Gupta, S., Saxena, K., Rufo, A., Panzeri, S., Manzi, G., et al. (2024). Testing theory of mind in large language models and humans. Nature Human Behaviour, 8(7), 1285–1295. https://doi.org/10.1038/s41562-024-01882-z

Trality. (2024). Crypto trading bots: The ultimate beginner's guide. Retrieved January 23, 2024, from https://www.trality.com/blog/crypto-trading-bots

Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211(4481), 453–458. https://doi.org/10.1126/science.7455683

Webb, T., Holyoak, K. J., & Lu, H. (2023). Emergent analogical reasoning in large language models. Nature Human Behaviour, 7(9), 1526–1541. https://doi.org/10.1038/s41562-023-01659-w

Published

2025-07-01

Issue

Section

Research article - regular issue

How to Cite

Lu, S. E. (2025). Game-theory behaviour of large language models: The case of Keynesian beauty contests. Economics and Business Review, 11(2), 119-148. https://doi.org/10.18559/ebr.2025.2.2182
