Paper accepted at the READI workshop at LREC-COLING 2024: "Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models"
- We developed a new method for evaluating the quality of multiple-choice reading comprehension test items.
- The method supports both human evaluation and automatic evaluation with large language models.
- We used the method to evaluate items generated by Llama 2 and GPT-4 in a zero-shot setting.
- The results show that the method is effective, while the quality of the generated items remains limited, especially for Llama 2.
The paper was accepted for a poster presentation at the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) at LREC-COLING in Turin (Italy) on May 20, 2024.
Read the paper here.