I've seen assignments that were clearly graded by ChatGPT. The signs are obvious: suggestions that have nothing to do with the topic, or "corrections" asking for points the student had already included. But of course, you can't 100% prove it. It's creating a strange feedback loop: students use an LLM to write the essay, and teachers use an LLM to grade it. It ends up being just one LLM talking to another, with no human intelligence in the middle.
However, we can't just blame the teachers. This requires a systemic rethink, not just personal responsibility. Evaluating students based on this new technology requires time, probably much more time than teachers currently have. If we want teachers to move away from shortcuts and adapt to a new paradigm of grading, that effort needs to be compensated. Otherwise, teachers will inevitably use the same tools as the students to cope with the workload.
Education seemed slow to adapt to the internet and mobile phones, usually treating them as threats rather than tools. Given the current incentive structure and the lack of understanding of how LLMs work, I'm not optimistic this will be solved anytime soon.
I guess the advantage will be for those who know how to use LLMs to learn on their own instead of just as a shortcut. And teachers who can deliver real value beyond what an LLM can provide will (or should) be highly valued.
Is using AI to support grading such a bad idea? I think that there are probably ways to use it effectively to make grading more efficient and more fair. I'm sure some people are using good AI-supported grading workflows today, and their students are benefiting. But of course there are plenty of ways to get it wrong, and the fact that we're all pretending that it isn't happening is not facilitating the sharing of best practices.
Of course, contemplating the role of AI grading also requires facing the reality of human grading, which is often not pretty. In particular, consider the relationship between delay and utility when providing students with grading feedback. Rapid feedback enables learning and change, while feedback delayed too long loses nearly all of its utility. I suspect this curve actually falls to zero much more quickly than most people think. If AI can help educators get feedback returned to students more quickly, that may be a significant win, even if the feedback isn't quite as good. And reducing the grading burden also opens up opportunities for students to respond directly to critical feedback through resubmission, which is rare today on anything that is human-graded.
And of course, a lot of times university students get the worst of both worlds: feedback that is both unhelpful and delayed. I've been enrolling in English courses at my institution—which are free to me as a faculty member. I turned in a 4-page paper for the one I'm enrolled in now in mid-October. I received a few sentences of written feedback over a month later, and only two days before our next writing assignment was due. I feel lucky to have already learned how to write, somehow. And I hope that my fellow students in the course who are actual undergraduates are getting more useful feedback from the instructor. But in this case, AI would have provided better feedback, and much more quickly.
So if the professors can cheat and they're happy about having to do less teaching work, thereby giving the students a lower-quality educational experience, why shouldn't the students just get an LLM to write code that passes the auto-grader's checks? Then everyone's happy - the administration is getting the tuition, the professors don't have to grade or give feedback individually, and the students can finish their assignments in half an hour instead of having to stay up all night. Win win win!
The value of educational feedback drops rapidly as time passes. If a student receives immediate feedback and the opportunity to try again, they are much more likely to continue attempting to solve the problem. Autograders can support both; humans, neither. It typically takes hours or days to manually grade code just once. By that point students are unlikely to pay much attention to the feedback, and the considerable expense of human grading makes it unlikely that they are able to try again. That's just evaluation.
And the idea that instructors of computer science courses are in a position to provide "expert feedback" is very questionable. Most CS faculty don't create or maintain software. Grading is usually done by either research-focused Ph.D. students or undergraduates with barely more experience than the students they are evaluating.
As a higher-education (university) IT admin who is responsible for the CS program's computer labs and is also enrolled in this CS program, I would love to hear more about this setup, please & thank you. As recently as last semester, CS professors have been doing pen-and-paper exams and group projects. This setup sounds great!
It's a complete game changer for assessment—anything, really, but basic programming skills in particular. At this point I wouldn't teach without it.
In this case, it is an external service. However, I also suspect that the Duo outage is probably shielding other on-campus services from load surges that would otherwise be causing them to crash.
I guess I don't know how we could ever prevent such incidents, given that the first day of classes is a well-kept secret. /s
I'm curious how this case would play out if some males applying to Caltech did this against female applicants. That said, I'm not sure how much gender-based affirmative action there is in science/engineering today.
Potentially quite a bit. Here's some recent data about admissions into the highly-competitive Illinois CS program: https://www.reddit.com/r/UIUC/comments/12kwc4a/uiuc_cs_admis...
Note that admissions rates for female applicants are higher across all categories—international, out-of-state, and in-state. Obviously you can't fully tell what's going on here without more of an understanding of the strengths of the different pools, but a 10–30% spread (for in-state) suggests that gender is being directly considered.
IANAL, but I'm also concerned about the degree to which this decision affects the use of other factors during college admissions. Fundamentally admissions is a complex balance between prior performance and future potential, and only admitting based on prior performance means that we're stuck perpetuating existing societal inequities.
It's a great model for churning out highly educated workers. We need that, and there is a place for higher educational institutions that do that well. But for all of its graduates, ASU doesn't produce many thinkers, founders, philosophers—people who are going to move the needle of our society. To see this, compare the notable alumni lists of, say, ASU and Stanford (both founded in the same year). Look at Turing Award recipients, Nobel Laureates, etc. It's not a new American university - ASU is the same as it's always been.
When I look at the largest universities in the US by enrollment, I think the closest university to a true "New American University" is UIUC (no affiliation) in Illinois. Enrollment is in the top 10, similar in size to ASU. They have multiple programs ranked in the top 10, including computer science. While past success doesn't predict the future, there are some heavy hitters on the UIUC alumni list: Marc Andreessen, Steve Chen, Max Levchin. Would love it if anyone happened to attend both ASU and UIUC and could compare the two.
At least according to my quick reading of the article, ASU has a significant focus on inclusion as a core value. Overall Illinois does admit a large percentage of applicants: about 50% over recent years. (The number dropped a bit after we began participating in the Common App, which makes it easier for students to increase the number of institutions they apply to.)
However, that number hides the fact that admission to top programs like computer science is extremely selective and exclusive. Admission rates to CS have been around 7% recently. And while we've made a CS minor somewhat more accessible, we've also closed down pathways that allowed students to start at Illinois and transfer into a computer science degree. (At this point that's pretty much impossible.) We do have blended CS+X degree programs that combine core studies in computer science with other areas, and those are less selective, but they have their own limitations—specifically, having to complete a lot of coursework in some other area that may not interest you.
I think what's fooling you about Illinois is the fairly odd combination of a highly-selective department (CS) embedded in a less-selective institution. I'm sure that there are other similar pairings, but overall this is somewhat unusual. If you think about other top-tier CS departments—Stanford, Berkeley, MIT, CMU—most are a part of an equally-selective institution.
So with Illinois you're getting the cachet of an exclusive department combined with the high acceptance rate of an inclusive public land-grant university. But on some level this is a mirage created by colocated entities reflecting different value systems. And, unlike places like Berkeley and Virginia, which have been trying to admit more students into computing programs, no similar efforts are underway here at Illinois. (To my dismay.)
Overall, unfortunately it's still very obvious to me that exclusivity is part of what we're selling to students as a core value of our degree program. You're special if you got in—just because a lot of other people didn't. Kudos to anyone moving away from this kind of misguided thinking.
When / where are students struggling? Assess them frequently and you'll find out! We run weekly quizzes in my class. So we know exactly who's struggling with exactly what, and quickly. That allows us to do individual outreach, and for students to catch up before they get too far behind. We also use daily homework, for the same reasons. But a lot of CS1 courses are still using the outdated midterm and final model, maybe with some homework sprinkled in.
Frequently, a glut of repetitive student questions points to bad course materials or poor course design. Make things clearer and make it easier for students to find information, and at least some of the repetitive question-asking will diminish.
Grading and TA support are related. Graduate TA quality does vary greatly, and you need to design around this. For example: Never put students in a position to suffer for an entire semester at the hands of a bad TA. (Many courses do.) Undergraduates are almost always better at assisting with early CS courses, and usually cheaper. We've been shifting gradually toward more undergraduate support for our CS1 course, and it has been working out well. They frequently outperform graduate staff.
But no amount of course staff will be sufficient if you have them spend all of their time on tedious tasks that computers can do better: like grading code! It's 2023. If you can't deploy your own autograder, buy one. Staff time spent grading code should be minimized or eliminated altogether. Freeing staff time for student support allows you to provide students with more practice, and accelerates the overall learning process. But many early CS courses are stuck in a situation where staff grading is bottlenecking how many problems they can assign. That's insane, when autograding is a well-established option. (Even if you want to devote some staff time to grading code quality, autograding should always be used to establish correctness. And you can automate many aspects of code quality as well.)
In my experience, what's at the root of a lot of these problems is simply that many people teaching introductory CS can't build things. Maybe they can implement Quicksort (again), but they can't create and deploy more complex user-facing systems. I mean, you can create an autograder using a shell script! Not a great one, but still far superior to manual human grading. Part of this is because these jobs pay poorly. Part is how we hire people for them, because the ability to build things isn't typically a criterion. Part of it is that there's little support for this in academia. It took me years of inane meetings to get a small cluster of machines to run courseware on for my 1000+ student class that generates millions of dollars in revenue.
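To make that shell-script claim concrete, here's a hypothetical sketch (not what I actually run): it assumes each test case is a tests/NAME.in input with a matching tests/NAME.out expected output, and that the submission is a Python script.

    #!/usr/bin/env bash
    # Minimal correctness-only autograder sketch. Assumes tests/NAME.in inputs
    # with matching tests/NAME.out expected outputs and a Python submission.
    set -u

    SUBMISSION=${1:?usage: grade.sh <submission.py>}
    pass=0
    total=0

    for input in tests/*.in; do
        expected="${input%.in}.out"
        total=$((total + 1))
        # Run the submission on this input (with a timeout) and compare its
        # stdout against the expected output.
        if timeout 5 python3 "$SUBMISSION" < "$input" | diff -q - "$expected" > /dev/null; then
            pass=$((pass + 1))
        else
            echo "FAIL: $(basename "$input" .in)"
        fi
    done

    echo "Passed $pass/$total tests"

Crude, yes: no sandboxing, no partial credit, no feedback beyond pass/fail. But it returns results in seconds instead of days, which is the point.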
But there's also a degree to which the CS educational community has started to stigmatize expert knowledge. If you do enjoy creating software and are good at it, you get a lot of side eye from certain people. "You know that students don't learn well from experts, right?" And so on. Yes, there is a degree to which knowing how to do something is not the same as being able to teach someone how to do it. But would you take music lessons from someone who was not only a mediocre player, but didn't seem to like music that much at all?
Over time this has become more sophisticated. I've created custom commands to incorporate training tips from YouTube videos (via YT-DLP and WhisperX) and PDFs of exercise plans or books that I've purchased. I've used or created MCP servers to give it access to data from my smart watch and smart scale. It has a few database-like YAML files for scoring things like exercise weight ranges and historical fitness metrics. At some point we'll probably start publishing the workouts online somewhere where I can view and complete them electronically, although I'm not feeling a big rush on that. I can work on this at my own pace and it's never been anything but fun.
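For the YouTube part, the pipeline is roughly the following sketch. The URL, paths, and model choice are placeholders, and the flags are how I understand yt-dlp and whisperx to work; check each tool's docs before relying on this.

    #!/usr/bin/env bash
    # Rough sketch: pull the audio from a training video and transcribe it
    # into the same project folder the assistant already reads from.

    VIDEO_URL="https://www.youtube.com/watch?v=EXAMPLE"   # placeholder

    # Extract just the audio track from the video.
    yt-dlp -x --audio-format mp3 -o "coaching/audio/%(title)s.%(ext)s" "$VIDEO_URL"

    # Transcribe each audio file; transcripts land next to the YAML files.
    for f in coaching/audio/*.mp3; do
        whisperx "$f" --model small --output_dir coaching/transcripts
    done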
I think there's a whole category of personal apps that are essentially AI + a folder with files in it. They are designed and maintained by you, can be exactly what you want (or at least can prompt), and don't need to be published or shared with anyone else. But to create them you've needed to be comfortable at the command line. I actually had a chat with Claude about this, asking if there was a similar workflow for non-CLI types. Claude Cowork seems like it. I'll be curious to see what kinds of things non-technical users get up to with it, at least once it's more widely available.