Paper accepted at the READI workshop at LREC-COLING 2024: "Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models"
- We developed a new method for evaluating the quality of multiple-choice reading comprehension test items.
- The method supports both human evaluation and automatic evaluation with large language models.
- We used the method to evaluate items generated by Llama 2 and GPT-4 in a zero-shot setting.
- The results show that the method is effective, while the quality of the generated items remains limited, especially for Llama 2.
The paper was accepted for a poster presentation at the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) at LREC-COLING in Turin (Italy) on May 20, 2024.
Read the paper here.