Putting AI to the test
How does the performance of GPT and 15-year-old students in PISA compare?
Advancements in artificial intelligence (AI) are laying the groundwork for extensive
and rapid transformations in society. Understanding the relationship between AI capabilities
and human skills is essential to ensure that policy responds to ongoing and upcoming
changes. The OECD has tracked how well AI systems fare on tasks from the Programme
for International Student Assessment (PISA), comparing AI performance to that of 15-year-old
students in the test’s core domains of reading, mathematics and science. Tests were
conducted using the Generative Pre-Trained Transformer (GPT) family of large language
models (LLMs), the AI behind ChatGPT, which took the world by storm after its public
release in late 2022.
Results show that both GPT versions tested, GPT-3.5 and GPT-4, outperform the average student in reading
and science. In addition, we observe rapid advances in mathematics, where AI capabilities
are quickly catching up with those of students. In November 2022, GPT-3.5 answered
35% of a set of PISA mathematics tasks successfully, a level of performance significantly below
that of students, who answer 51% of the tasks successfully on average. However, by March
2023, GPT-4 answered 40% of the tasks successfully. This paper discusses the policy
implications of these results.
Published on July 15, 2023
In series: OECD Education Spotlights