Journal: Physical Review Physics Education Research

Abbreviation

Phys. Rev. Phys. Educ. Res.

Publisher

American Physical Society

ISSN

2469-9896

Search Results

Publications 1 - 10 of 10
  • Kortemeyer, Gerd; Nöhl, Julian; Onishchuk, Daria (2024)
    Physical Review Physics Education Research
    [This paper is part of the Focused Collection in Artificial Intelligence Tools in Physics Teaching and Physics Education Research.] Using a high-stakes thermodynamics exam as the sample (252 students, four multipart problems), we investigate the viability of four workflows for AI-assisted grading of handwritten student solutions. We find that the greatest challenge lies in converting handwritten answers into a machine-readable format. The granularity of grading criteria also influences grading performance: employing a fine-grained rubric for entire problems often leads to errors and grading failures, as the model appears to be unable to keep track of scores for more than a handful of rubric items, while grading problems in parts is more reliable but tends to miss nuances. We also found that grading hand-drawn graphics, such as process diagrams, is less reliable than mathematical derivations due to the difficulty in differentiating essential details from extraneous information. Although the system is precise in identifying exams that meet passing criteria, exams with failing grades still require human grading. We conclude with recommendations to overcome some of the encountered challenges.
  • Kortemeyer, Gerd (2023)
    Physical Review Physics Education Research
    Solving problems is crucial for learning physics, and not only final solutions but also their derivations are important. Grading these derivations is labor intensive, as it generally involves human evaluation of handwritten work. AI tools have not been an alternative, since even for short answers, they needed specific training for each problem or set of problems. Extensively pretrained AI systems offer a potentially universal grading solution without this specific training. This feasibility study explores an AI-assisted workflow to grade handwritten physics derivations using MathPix and GPT-4. We were able to successfully scan handwritten solution paths and achieved an R-squared of 0.84 compared to human graders on a synthetic dataset. The proposed workflow appears promising for formative feedback, but for final evaluations, it would best be used to assist human graders.
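    Several of the abstracts above report agreement between AI-assisted and human grading as a coefficient of determination (R²). As a minimal illustrative sketch (not the authors' code, and with entirely hypothetical scores), R² can be computed against the human grades as the reference:

    ```python
    def r_squared(human, ai):
        """Coefficient of determination of AI scores against human scores:
        R^2 = 1 - SS_res / SS_tot, with human grades as the reference."""
        mean_h = sum(human) / len(human)
        ss_tot = sum((h - mean_h) ** 2 for h in human)          # total variance
        ss_res = sum((h - a) ** 2 for h, a in zip(human, ai))   # residual error
        return 1 - ss_res / ss_tot

    # Hypothetical example grades (out of 10), purely for illustration
    human = [8.0, 6.5, 9.0, 4.0, 7.5]
    ai    = [7.5, 6.0, 9.0, 4.5, 7.0]
    print(round(r_squared(human, ai), 3))  # prints 0.931
    ```

    Values near 1 indicate close agreement with human graders; the studies above use thresholds on such measures to decide when human regrading is needed.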
  • Kortemeyer, Gerd (2023)
    Physical Review Physics Education Research
    Massive pretrained language models have garnered attention and controversy due to their ability to generate humanlike responses: Attention due to their frequent indistinguishability from human-generated phraseology and narratives and controversy due to the fact that their convincingly presented arguments and facts are frequently simply false. Just how humanlike are these responses when it comes to dialogues about physics, in particular about the standard content of introductory physics courses? This case study explores that question by having ChatGPT, the preeminent language model in 2023, work through representative assessment content of an actual calculus-based physics course and grading the responses in the same way human responses would be graded. As it turns out, ChatGPT would narrowly pass this course while exhibiting many of the preconceptions and errors of a beginning learner. A discussion of possible consequences for teaching, testing, and physics education research is provided as a possible starter for more detailed studies and curricular efforts in the future.
  • Kortemeyer, Gerd; Nöhl, Julian (2025)
    Physical Review Physics Education Research
    This study explores the use of artificial intelligence in grading high-stakes physics exams, emphasizing the application of psychometric methods, particularly item response theory, to evaluate the reliability of AI-assisted grading. We examine how grading rubrics can be iteratively refined and how threshold parameters can determine when AI-generated grades are reliable versus when human intervention is necessary. By adjusting thresholds for correctness measures and uncertainty, AI can grade with high precision, significantly reducing grading workloads while maintaining accuracy. Our findings show that AI can achieve a coefficient of determination of R² ≈ 0.91 when handling half of the grading load, and R² ≈ 0.96 for one-fifth of the load. These results demonstrate AI's potential to assist in grading large-scale assessments, reducing both human effort and associated costs. However, the study underscores the importance of human oversight in cases of uncertainty or complex problem solving, ensuring the integrity of the grading process.
  • Kortemeyer, Gerd; Babayeva, Marina; Polverini, Giulia; et al. (2025)
    Physical Review Physics Education Research
    We investigate the multilingual and multimodal performance of a large language model-based artificial intelligence (AI) system, GPT-4o, using a diverse set of physics concept inventories spanning multiple languages and subject categories. The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics, electromagnetism, optics, and thermodynamics, as well as relativity, quantum mechanics, astronomy, mathematics, and laboratory skills. Unlike previous text-only studies, we uploaded the inventories as images to reflect what a student would see on paper, thereby assessing the system's multimodal functionality. Our results indicate variation in performance across subjects, with laboratory skills standing out as the weakest. We also observe differences across languages, with English and European languages showing the strongest performance. Notably, the relative difficulty of an inventory item is largely independent of the language of the test. When comparing AI results to existing literature on student performance, we find that the AI system outperforms average postinstruction undergraduate students in all subject categories except laboratory skills. Furthermore, the AI performs worse on items requiring visual interpretation of images than on those that are purely text-based. While our exploratory findings show GPT-4o's potential usefulness in physics education, they highlight the critical need for instructors to foster students' ability to critically evaluate AI outputs, adapt curricula thoughtfully in response to AI advancements, and address equity concerns associated with AI integration.
  • Kortemeyer, Gerd; Bauer, Wolfgang (2024)
    Physical Review Physics Education Research
    As a result of the pandemic, many physics courses moved online. Alongside, the popularity of Internet-based problem-solving sites and forums rose. With the emergence of large language models, another shift occurred. One year into the public availability of these models, how has online help-seeking behavior among introductory physics students changed, and what is the effect of different patterns of online resource usage? In a mixed-method approach, we investigate student choices and their impact on assessment components of an online introductory physics course for scientists and engineers. We find that students still mostly rely on traditional Internet resources and that their usage strongly influences the outcome of low-stakes unsupervised quizzes. We empirically found distinct clusters of help-seeking and resource-usage patterns among the students; the impact of students' cluster membership on the supervised assessment components of the course, however, is nonsignificant.
  • Kortemeyer, Gerd; Bauer, Wolfgang; Fisher, Wade (2022)
    Physical Review Physics Education Research
    In a partially flipped, hybrid introductory physics course where students had a free choice between attending any lecture session in person or via video conferencing, and where recordings of the lecture sessions were made available for asynchronous viewing, a total of 16 learner attributes and their relationships were investigated. Five of these attributes reflect participation choices, while eleven attributes reflect assessment outcomes on different course components. In line with the "no significant difference phenomenon," correlations between exam scores and participation choices were weaker than correlations with, for example, prior knowledge as evidenced by pretest scores. Overall, in terms of correlations, participation and assessment attributes clustered together, respectively, with clicker questions being a connecting attribute between the clusters. Performance aside, we found two populations in the course, which, divided along the line of above and below average in-class attendance, exhibited other distinct behavior attributes mostly related to investment of time and effort in the course.
  • Kortemeyer, Gerd; Kortemeyer, Christine; Bauer, Wolfgang (2023)
    Physical Review Physics Education Research
    At large institutions of higher education, students frequently have a choice whether to attend the introductory physics sequence asynchronously online, on-site in a traditional lecture setting, or in a reformed studio setting. In this study, we investigate how these different settings are correlated with measures of self-efficacy, interest in physics, and success in subsequent physics and engineering courses, which have the introductory physics sequence as prerequisites. As previous research indicates, some of these measures may depend on gender. We found that the course setting had no significant correlation with the grade in subsequent courses, but that studio settings gave students the feeling of being better prepared, particularly for subsequent courses that included laboratory or recitation components. We also found that gender was correlated with measures of interest in physics, where female students expressed significantly less interest in the subject, regardless of course setting.
  • Küchemann, Stefan; Malone, Sarah; Edelsbrunner, Peter; et al. (2021)
    Physical Review Physics Education Research
    Representational competence is essential for the acquisition of conceptual understanding in physics. It enables the interpretation of diagrams, graphs, and mathematical equations, and relating these to one another as well as to observations and experimental outcomes. In this study, we present the initial validation of a newly developed cross-contextual assessment of students' competence in representing vector-field plots and field lines, the most common visualization of the concept of vector fields. The Representational Competence of Fields Inventory (RCFI) consists of ten single-choice items and two items that each contain three true-or-false questions. The tool can be easily implemented within an online assessment. It assesses the understanding of the conventions of interpreting field lines and vector-field plots, as well as the translation between these. The intended use of the tool is both to scale students' representational competences with respect to representations of vector fields and to reveal related misconceptions (areas of difficulty). The tool was administered at three German-speaking universities in Switzerland and Germany to a total of 515 first- and third-semester students from science, technology, engineering, and mathematics subjects. In these first steps of the validation of the RCFI, we evaluated its psychometric quality via classical test theory in combination with Rasch scaling and examined its construct validity by conducting student interviews. The RCFI exhibits a good internal consistency of ω = 0.86, and the results of the Rasch analysis revealed that the items discriminate well among students from lower to medium-high competence levels. The RCFI revealed several misunderstandings and shortcomings, such as the confusion of the conventions for representing field lines and vector-field plots. Moreover, it showed that many students believed that field lines must not exhibit a curvature, that the lengths of field lines matter, and that field lines may have sharp corners. In its current version, the RCFI allows assessing students' competence to interpret field representations, a necessary prerequisite for learning the widespread concept of vector fields. We report on planned future adaptations of the tool, such as optimizing some of the current distractors.
  • Merki, Eliane; Hofer, Sarah; Vaterlaus, Andreas; et al. (2025)
    Physical Review Physics Education Research
    When describing motion in physics, the selection of a frame of reference is crucial. The graph of a moving object can look quite different based on the frame of reference. In recent years, various tests have been developed to assess the interpretation of kinematic graphs, but none of these tests have specifically addressed differences in reference frames. Moreover, existing tests that explore differences in reference frame typically focus on the equivalence principle through written answers, interviews, or simple calculations and vector addition; however, none of these tests evaluate position-time graphs. To address this gap in the research, we developed and evaluated the Inventory of Galilean Transformation of uniform linear motion in position-time graphs (IGT). The IGT consists of 15 multiple-choice items that systematically use position-time graphs of linear uniform motion to assess the understanding of three types of Galilean transformations: the identity transformation, the transformation between two (opposing) stationary displaced frames of reference, and the transformation from a stationary into a uniformly moving frame of reference. Herein, we present the development and validation of the IGT. A total of 532 upper secondary school students in the advanced track participated in the multistage development process. We evaluated the psychometric properties via classical test theory and item response theory. The degree of item discrimination and reliability were within the desired range. The IGT demonstrated good internal consistency (ω = 0.88), and confirmatory factor analysis supported the intended structure of the test. Rasch analysis revealed that the item difficulties were consistent with the increasing complexity of the three different transformations. The IGT also revealed several alternative student conceptions about frames of reference, such as confusion between two scenarios of cars crossing versus overtaking, as well as misunderstandings regarding the changing shape of a graph when a uniformly moving object is transformed into a moving frame of reference. In its current form, the IGT serves as a new instrument for assessing students' ability to interpret position-time graphs under the influence of the Galilean transformation, making it suitable for formative or summative assessment in advanced upper secondary education.