Notice
This record has been edited as far as possible; missing data will be added when the version of record is issued.
Assessing GPT Performance in a Proof-Based University-Level Course Under Blind Grading
dc.contributor.author
Ding, Ming
dc.contributor.author
Soldà, Federico
dc.contributor.author
Yuan, Weixuan
dc.contributor.author
Kyng, Rasmus
dc.date.accessioned
2025-07-17T11:30:19Z
dc.date.available
2025-07-06T06:21:48Z
dc.date.available
2025-07-17T11:30:19Z
dc.date.issued
2025-06
dc.identifier.issn
0252-9742
dc.identifier.uri
http://hdl.handle.net/20.500.11850/743845
dc.description.abstract
As large language models (LLMs) advance, their role in higher education, particularly in free-response problem-solving, requires careful examination. This study assesses the performance of GPT-4o and o1-preview under realistic educational conditions in an undergraduate algorithms course. Anonymous GPT-generated solutions to take-home exams were graded by teaching assistants unaware of their origin. Our analysis examines both coarse-grained performance (scores) and fine-grained reasoning quality (error patterns). Results show that GPT-4o consistently struggles, failing to reach the passing threshold, while o1-preview performs significantly better, surpassing the passing score and even exceeding the student median in certain exercises. However, both models exhibit issues with unjustified claims and misleading arguments. These findings highlight the need for robust assessment strategies and AI-aware grading policies in education.
en_US
dc.language.iso
en
en_US
dc.publisher
European Association for Theoretical Computer Science
en_US
dc.title
Assessing GPT Performance in a Proof-Based University-Level Course Under Blind Grading
en_US
dc.type
Journal Article
ethz.journal.title
Bulletin of EATCS
ethz.journal.volume
146
en_US
ethz.pages.start
57
en_US
ethz.pages.end
80
en_US
ethz.identifier.wos
ethz.publication.status
published
en_US
ethz.identifier.url
http://bulletin.eatcs.org/index.php/beatcs/article/view/847
ethz.date.deposited
2025-07-06T06:22:07Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.exportRequired
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Assessing%20GPT%20Performance%20in%20a%20Proof-Based%20University-Level%20Course%20Under%20Blind%20Grading&rft.jtitle=Bulletin%20of%20EATCS&rft.date=2025-06&rft.volume=146&rft.spage=57&rft.epage=80&rft.issn=0252-9742&rft.au=Ding,%20Ming&Sold%C3%A0,%20Federico&Yuan,%20Weixuan&Kyng,%20Rasmus&rft.genre=article&
Files in this item
There are no files associated with this item.