
Watch recording: https://youtu.be/naUsOaj75I8
Abstract
With the improved abilities of large language models (LLMs) and their growing use in teaching and learning, the question naturally arises of whether they could be used to grade assessments. Replacing a teacher with a machine for such a crucial step raises various ethical concerns, but many of them only matter if a prior question has been answered positively: are LLMs now ‘smart’ enough to grade?
In this presentation, I will report on several experiments designed to address this question. I will focus on reflective diaries and the grades I assigned to them, and rely on successive AI approaches, from simpler to more advanced ones. Each time, I will quantify the distance between my grades and the AI-based ones, in the hope of designing a solid methodological framework. I will conclude with a broader discussion of AI-assisted grading and its different facets.
About the Speaker

Christophe Coupé is an Associate Professor in the Department of Linguistics at HKU. His research and teaching revolve around the use of natural language processing (especially generative AI), machine learning and data science to study languages, animal communication and literature. He is currently the Director of the Bachelor of Arts in Humanities and Digital Technologies.
Seminar Details
Title: Experimenting with AI-based Grading
Date: May 4, 2026 (Monday)
Time: 12:30 pm – 2:00 pm (HKT)
Language: English
Venue: Arts Tech Lab (Room 4.35), 4/F, RRST, HKU
This event is hosted by the HKU Arts Tech Lab.
Seminar Recap
We were honoured to welcome Professor Christophe Coupé to the Arts Tech Lab for the final session of the AI and Teaching Series, where he shared his ongoing experiments on AI-based grading and its implications for assessment in higher education.
The seminar began with a central question: before teachers can decide whether it is ethical to involve machines in assessment, they must first ask whether large language models are capable of grading student work in a sufficiently reliable way. Rather than treating AI-assisted grading as either a simple shortcut or an unacceptable replacement for teachers, Professor Coupé framed it as an empirical problem requiring careful testing.
Using reflective diaries from his own teaching, he compared his grades with those generated by successive AI-based approaches. These ranged from a basic prompt to more elaborate strategies that incorporated the assessment rubric, lecture materials, and detailed descriptions of his own grading procedures. The results suggested that the more explicitly the model was instructed, the more closely its grading aligned with his own, although important discrepancies remained.
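For readers who want to try a similar comparison on their own marking, the sketch below shows one possible way to quantify the distance between a teacher's grades and those produced by different AI approaches. It is an illustration only: the recap does not state which distance measure Professor Coupé used, so mean absolute error is an assumption here, and the function names, approach labels and grade values are all hypothetical placeholders, not data from the seminar.

```python
def mean_absolute_error(human_grades, ai_grades):
    """Average absolute gap between paired human and AI grades.

    Assumption: this metric is chosen for illustration; the seminar
    recap does not say which distance measure was actually used.
    """
    if len(human_grades) != len(ai_grades):
        raise ValueError("grade lists must have the same length")
    if not human_grades:
        raise ValueError("grade lists must not be empty")
    gaps = [abs(h - a) for h, a in zip(human_grades, ai_grades)]
    return sum(gaps) / len(gaps)


def compare_approaches(human_grades, ai_grades_by_approach):
    """Return the mean absolute error for each named AI approach."""
    return {
        name: mean_absolute_error(human_grades, grades)
        for name, grades in ai_grades_by_approach.items()
    }


# Made-up grades for four diaries (NOT results from the seminar).
teacher = [70, 65, 80, 75]
ai_runs = {
    "basic prompt": [60, 75, 70, 85],
    "prompt with rubric": [68, 66, 78, 76],
}

for approach, distance in compare_approaches(teacher, ai_runs).items():
    print(f"{approach}: mean absolute error = {distance}")
```

A lower value means the AI grades sit closer to the teacher's on average. A single number like this hides where the disagreements occur, so it is worth reading alongside the individual diaries that were graded differently.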
Throughout the seminar, Professor Coupé stressed that AI should not be understood as an autonomous grader detached from human judgement. Instead, he presented it as a tool that may assist with certain repetitive or clearly defined aspects of grading, provided that teachers remain transparent with students, retain responsibility for the final assessment, and critically review the model’s outputs.
The discussion ultimately shifted from whether AI can save time to what grading itself involves. By making his criteria and cognitive processes explicit, Professor Coupé showed that experimenting with AI-assisted grading can also prompt teachers to reflect more deeply on their own assessment practices.
