I was instantly fascinated by ChatGPT’s capabilities the first time I tried it, and ever since, I’ve been finding more and more ways to use it. Now I would like to share one of the less common use cases I tried: AI grading. The results really surprised me, and they also highlighted the precautions needed when using and trusting GenAI tools.
When I applied to a master’s programme a couple of months ago, I realized I needed to take an English language exam to get accepted. I chose the IELTS, and as someone who has lived and worked in English in a foreign country for a couple of years now, I wasn’t too worried about the exam. That changed when I opened some IELTS prep forums and saw that pretty much everyone says the writing part is so difficult that even native speakers struggle to get a good score. That was the moment I realized I needed to practise my English essay writing a little more than I had anticipated. But since I didn’t want to spend hundreds of euros on IELTS prep courses or on teachers to check my essays, I decided to try ChatGPT as my scoring and mentoring buddy.
I gave ChatGPT the detailed scoring criteria for each part of the writing section and asked it to provide me with feedback, scores for each subsection, and an overall score for the entire writing part. I started my first essay with a relaxed attitude, thinking I was in good hands, since all ChatGPT had to do was check my English essay and compare it to the detailed requirements.
To my surprise, it gave me a very low score every single time (I wrote about 10–15 essays), lower even than the score I needed for admission. Its feedback always felt a bit shallow, and no matter how hard I tried to implement its recommendations, my scores just kept going down. So when exam day arrived, I felt completely demotivated and thought there was no way I’d get the writing score I needed.
I took the exam anyway. To my even bigger surprise, my result turned out to be one band higher than I needed and two bands higher than ChatGPT’s average. Later, I looked it up, and YouTube is full of similar examples where ChatGPT or other GenAI models give lower scores than human examiners. One example that really stood out to me was a video by an official IELTS expert named Asiya. She tested AI scoring on three essays: her own, her fellow tutor Kevin’s, and her student Maria’s. The results were quite interesting: for Asiya and Kevin, the human examiner’s scores were only slightly higher than the AI tools’ scores, but Maria’s essay was rated significantly lower by the AI than by the human.

So why do I think this was an important lesson? The obvious answer is what everyone keeps repeating: we need to use GenAI carefully and think critically about what it says, especially if we’re not experts in the area. The less obvious one is how much AI scoring, and simply knowing I was being scored by AI, affected me. It made me feel genuinely demotivated, like I was fighting against a system whose opinion simply couldn’t be changed.
Still, I believe GenAI can be a very useful tool for (self-)education, but probably in a hybrid setup, where it scores alongside a human and the human has the final say on the result.
I would also be very curious to hear about other students’ experiences: Have you ever been scored or graded by AI? If yes, how did it feel? If not, would you be open to it?