Paper accepted at the BEA workshop at ACL 2025: "Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?"

We present a method for evaluating how human-like the responses of LLMs to educational test items are.
The method uses classical test theory (CTT) and item response theory (IRT) as psychometric frameworks.
None of the LLMs we tested produced responses that were human-like enough to be used in pilot studies for test development.

We will present our submission at the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), co-located with ACL 2025 in Vienna, Austria, on July 31 / August 1, 2025.

Read the paper here.