Putting AI to the test

How does the performance of GPT and 15-year-old students in PISA compare?

Advances in artificial intelligence (AI) are laying the groundwork for extensive and rapid transformations in society. Understanding the relationship between AI capabilities and human skills is essential to ensure that policy responds to ongoing and upcoming changes. The OECD has tracked how well AI systems fare on tasks from the Programme for International Student Assessment (PISA), comparing AI performance to that of 15-year-old students in the test’s core domains of reading, mathematics and science. Tests were conducted using the Generative Pre-Trained Transformer (GPT) family of large language models (LLMs), the AI behind ChatGPT, which took the world by storm after its public release in late 2022. Results show that both GPT versions tested, GPT-3.5 and GPT-4, outperform the average student in reading and science. In mathematics, AI capabilities are advancing rapidly and quickly catching up with those of students. In November 2022, GPT-3.5 could answer 35% of a set of PISA mathematics tasks, a level of performance significantly below that of humans, who answer 51% of the tasks successfully on average. By March 2023, however, GPT-4 answered 40% of the tasks successfully. This paper discusses the policy implications of these results.

Published on July 15, 2023

In series: OECD Education Spotlights