AI scoring for international large-scale assessments using a deep learning model and
multilingual data
This paper examines Artificial Intelligence (AI) scoring of constructed-response items
using international large-scale assessment data. The approach draws on recent advances
in multilingual deep learning, in particular models pre-trained on a massive
multilingual text corpus. Historical student responses to Reading and Science literacy cognitive
items developed under the PISA analytical framework are used as training data for
deep learning together with multilingual data to construct an AI model. The trained
AI models are then used to score responses, and the results are compared with human-scored data.
The score distributions estimated based on the AI-scored data and the human-scored
data are highly consistent with each other; furthermore, even item-level psychometric
properties of the majority of items showed high levels of agreement, although a few
items showed discrepancies. This study demonstrates a practical procedure for using
a multilingual data approach and shows that this new AI-scoring methodology reaches a practical
level of quality, even in the context of an international large-scale assessment.
Published on February 21, 2023
In series: OECD Education Working Papers