Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors


Date

2024-11

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

Large language models (LLMs) offer many opportunities to scale high-quality personalized tutoring. A promising approach is to build dialog tutoring models that scaffold students’ problem-solving. However, although existing models perform well at solving reasoning questions, they can struggle to precisely detect students’ errors and to tailor their feedback to these errors. Inspired by real-world teaching practice, in which teachers identify student errors and customize their responses accordingly, we focus on verifying student solutions and show how grounding in such verification improves the overall quality of tutor response generation. We collect a dataset of 1,002 stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors. Using both automatic and human evaluation, we show that the student solution verifiers steer the generation model towards highly targeted responses to student errors, which are more often correct and contain fewer hallucinations than those of existing baselines. The benchmark dataset and code will be released openly.
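
The verify-then-respond idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's verifier or prompt: the step format (`a op b = c`), the function names `first_error_step` and `build_tutor_prompt`, and the prompt wording are all assumptions made for illustration, and the rule-based arithmetic check merely stands in for the verifiers the paper proposes and evaluates.

```python
import re
from typing import List, Optional

# Assumed (hypothetical) step format: "<number> <op> <number> = <number>".
_NUM = r"-?\d+(?:\.\d+)?"
STEP_PATTERN = re.compile(
    rf"^\s*({_NUM})\s*([+\-*/])\s*({_NUM})\s*=\s*({_NUM})\s*$"
)


def _apply(op: str, a: float, b: float) -> Optional[float]:
    """Evaluate a single binary operation; return None for division by zero."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if b == 0:
        return None
    return a / b


def first_error_step(steps: List[str]) -> Optional[int]:
    """Return the 0-based index of the first step whose arithmetic is wrong.

    Steps that do not match the assumed format are skipped, not flagged.
    Returns None when no checked step is wrong.
    """
    for index, step in enumerate(steps):
        match = STEP_PATTERN.match(step)
        if match is None:
            continue
        left, op, right, claimed = match.groups()
        expected = _apply(op, float(left), float(right))
        if expected is None or abs(expected - float(claimed)) > 1e-9:
            return index
    return None


def build_tutor_prompt(
    problem: str, steps: List[str], error_index: Optional[int]
) -> str:
    """Build a (hypothetical) generation prompt grounded in the verification."""
    numbered = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    if error_index is None:
        verdict = "No arithmetic error was detected in the student's steps."
    else:
        verdict = (
            f"The first error is in step {error_index + 1}: "
            f"{steps[error_index]}"
        )
    return (
        f"Problem: {problem}\n"
        f"Student solution:\n{numbered}\n"
        f"Verification: {verdict}\n"
        "Respond as a tutor: address the verified error with a hint, "
        "without revealing the final answer."
    )


steps = ["5 * 4 = 20", "20 + 3 = 25", "25 - 5 = 20"]
error_index = first_error_step(steps)
prompt = build_tutor_prompt("An invented word problem.", steps, error_index)
```

In this sketch only the first wrong step is reported, mirroring the first-error annotation described in the abstract; the later step `25 - 5 = 20` is internally consistent and is therefore not flagged.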

Publication status

published

Book title

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Pages / Article No.

8386 - 8411

Publisher

Association for Computational Linguistics

Event

29th Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

Organisational unit

09590 - Kapur, Manu / Kapur, Manu
09684 - Sachan, Mrinmaya / Sachan, Mrinmaya
02219 - ETH AI Center / ETH AI Center
