> Utah has been using AI as the primary scorer on its standardized tests for several years. “It was a major cost to our state to hand score, in addition to very time consuming,” said Cydnee Carter, the state’s assessment development coordinator. The automated process also allowed the state to give immediate feedback to students and teachers, she said.
Yes, education takes time and costs money. Yes, not educating is both cheaper and faster. Note how the rationalizing ignores the needs of the students and the quality of the education.
I live in Utah and my children have been subjected to this automated essay scoring here. One night I came home from work and my son and wife were both in tears, frustrated with each other and frustrated with the essay scoring which refused to give a high enough score to meet what the teacher said was required, no matter how good the essay was. My wife wrote versions herself from scratch and couldn’t get the required score. When I got involved, I did the same with the same results.
Turns out the instructions said the essay would be scored on verbal efficiency; getting the point across clearly with the fewest words. I started playing around and realized that the more words I added, the higher the score, whether they were relevant or grammatical or not. Random unrelated sentences pasted in the middle would increase the score. We found a letter of petition online for banning automated scoring for the purposes of grades or student evaluation of any kind. It was very long, so it got a perfect score. I encouraged my son to submit it, and he did. Later I visited his teacher to explain and to urge her to not use automated scoring. She listened and then told me about how much time it saves and how fast students get feedback. :/
Frankly, I can't believe what I am reading. The idea that some "AI" grades essays automatically is idiotic and has nothing to do with education. Where is the place for discussion? Where is the place for the confrontation of ideas? Where is the place for developing a writing style? How is this AI supposed to grade things like repetition (which can be either a good rhetorical tool or a mistake, depending on context)?
Who the hell came up with such an idea? I would hesitate to use "AI" even for automatic spell checking, since it is enough to give some character an unusual name for it to be marked as an error.
My guess is that sooner or later people will learn how to game that AI. I wouldn't be surprised if there were software that generates essays the Utah "AI" likes.
I'd guess this is a product of dwindling state finances and contempt for any form of real education. AIs are orders of magnitude cheaper than real teachers. They also don't form unions and wouldn't voice any opposition to changes in the curriculum.
They are also pretty useless, as you have pointed out. The consequences of this policy will be postponed until the students reach a certain age -- that'll be like 10-15 years in the future.
> My guess is that soon or later people will learn how to game that AI.
To be fair, the GP here specifically describes how he gamed the AI by pasting in a critique of the AI; his kid submitted it of his own accord, and it was graded without comment. Then, when the GP went in to point out the gaming, the teacher not only didn't care that the AI had been gamed, but expressed gratitude for the hours of work the AI saves, still ignoring that it fundamentally made things worse, all at the expense of the entire point of being a teacher in the first place.
The issue, for the teacher, is that in 'the system' in which they collect a paycheck, the AI works flawlessly. The point, for the teacher, is not to educate children. It is to have assignments that children pass with some sort of distribution that can be sent in and calculated by some person in a beige suit, wide tie, and hair troubles. The difference is subtle at first, but by the point where the GP is sitting, the difference is comical.
The AI allows the teacher to increase their efficiency in processing assignments, ones that never really mattered to the teacher in the first place. In valley-speak: the incentives are not aligned.
I can't believe it either; it's completely ridiculous. They're basically claiming to have developed a general AI. It's like some part of the population is living in a different fantasy world and making policy decisions accordingly.
I agree with you wholeheartedly, but I think there's a stronger argument to be made here: the algorithms being used "work" only through correlation, and only while students are ignorant of the scoring metric. If the students under test knew even sketchily how the system worked (e.g., points deducted if your average sentence length exceeds 7 words, points added if your word-length stddev is greater than 2), and could meaningfully push their scores up by focusing on proxies that don't _actually_ measure what a human would call quality work, or could even get gibberish[0] rated highly, then the whole thing is a fraud. No one will stand for a grading system that only works by virtue of obscurity.
It is classic bean-counter thinking at its worst: the stereotype of an MBA minimizing cost beyond all reason and cutting corners, even when it saws off the branch they sit on.
It is frankly a sign of a diseased culture to use it in any capacity except an exercise to improve AI.
When I was a child I was obsessed with the "grade level" function in Microsoft Word. It was a preference you could enable on spell check to tell you the "grade level" of your writing.
Every essay I wrote, I'd always force myself to reach the max "12.0" grade level. While writing I'd struggle over word choice, sentence structure, rearranging paragraphs, working on my tone etc, all in pursuit of the 12th grade way to phrase things. All my revisions were subject to the approval of the Grade Level checker.
Whenever I could, I would check the grade levels of my friends' writing, usually by showing them a "neat feature" they could enable. Then I'd smugly applaud myself for being the better writer whenever their grade level was below 12.0.
The Grade Level feature fascinated me, and to try to master it, I found a book about Microsoft Word and looked through it in a bookstore. I was absolutely gobsmacked at how simple the formula was. I had childishly been expecting something sophisticated, perhaps like what Utah educators imagine they have. I genuinely expected the method to be complex beyond my understanding.
Instead, Word used a variant of Flesch-Kincaid. There was a direct relationship between sentence length and grade score, and polysyllabic words and grade score. Meaning, the longer your sentences and words, the higher your grade score.
As soon as I got home from the bookstore I loaded a draft of something I had written. It was "pre-12.0" writing from me. I simply deleted all the periods but one and checked again. 12.0.
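The formula really is that simple to sketch. Here is a minimal, toy version of the Flesch-Kincaid grade-level calculation (with a crude vowel-group heuristic standing in for real syllable counting, so the numbers are only illustrative). It reproduces exactly the trick above: delete the periods, and the grade jumps.

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups; real syllabification is messier.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

draft = "I like dogs. Dogs are fun. We play a lot."
merged = draft.replace(".", "") + "."   # same words, all but one period deleted

print(round(fk_grade(draft), 1))   # short sentences: low grade
print(round(fk_grade(merged), 1))  # one long sentence: noticeably higher grade
```

Nothing in the formula looks at meaning at all; only sentence length and word length move the score.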
Automatic grading is a wonderful lure. It's nice to imagine that there's some objective writing quality easy to tap into. At the moment, I think we're far from that ability.
Personally, I feel the solution to insufficient teacher time is to use peer grading much more, and spot checks. Get kids to read and revise each other's works frequently, and teachers should aim to grade at least N papers per student where N is much less than the number of papers a student writes.
Revising is a really vital part of writing. Getting more chances to do revision, plus having to write something good enough to show your peers, plus having the risk of any paper count for your grade should compensate for incomplete teacher grading.
The fact that you were literally still a child when this happened, but automated grading is being foisted on us by grown adults who are ostensibly professionals, says a lot about the situation.
> Personally, I feel the solution to insufficient teacher time is to use peer grading much more, and spot checks. Get kids to read and revise each other's works frequently, and teachers should aim to grade at least N papers per student where N is much less than the number of papers a student writes.
That's how it's done in creative writing courses. I've always found it infinitely more helpful than only having feedback from the instructor, even if the instructor's feedback was generally more helpful/useful than peer feedback.
Arguably, Hemingway's texts are well written. One of the sources of power of his prose is the use of simple words, and basic sentence structures. I bet Word would classify that as below 12th grade.
The point I am trying to make in agreement with the parent is: there are qualities that are very hard to score with algorithms. The difficulty of solving this problem equals if not exceeds that of automated translation, which still only works properly for specialized and limited domains, e.g. weather forecasts.
It's interesting that the tool (and system) is designed to aid people trying for the opposite result, i.e. for publicists and other authors striving to word their message to be as widely understood as possible.
I went to high school in Utah, long before this automated scoring. It sounds awful but considering the quality of the education I received there perhaps not that bad after all.
My best Utah education anecdote: on the first day of British literature class, the teacher came in and asked, "Does anyone here know what A.D. means?" Someone said "After Death"; she said no. I figured this was my time to shine, so I raised my eager hand and said "Anno Domini, in the year of the Lord." She said no.
Then she announced: "A.D means after the Deluge, and B.C means before Christ".
She also totally lied to me one time about whether she would be considering a particular textbook question as applying to Rosencrantz or Guildenstern.
Anyway, I think that was one of the many classes I got an F in after I stopped going; I would walk past it every day on my way to play chess with my German teacher.
Wow this is pretty shocking. I can understand using automated systems for something like math problems, it makes sense. There’s (usually) one right answer. But essays? This should be banned.
Wait 'til you see a kid in tears because the math answer they submitted was supposed to equal zero, but the algorithms behind the scenes are so bad that the float math failed the equality check.
Note: This is not hyperbole, I have seen this exact scenario more than once.
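That failure mode is easy to reproduce: two expressions that are algebraically zero can differ by a rounding error, so a naive equality check fails where a tolerance check would not. A minimal sketch:

```python
import math

answer = 0.1 + 0.2 - 0.3          # algebraically zero
print(answer == 0.0)              # False: IEEE-754 rounding leaves a residue ~5.5e-17
print(math.isclose(answer, 0.0, abs_tol=1e-9))  # True: compare with a tolerance
```

Any grading backend doing exact float comparison instead of a tolerance check will mark correct answers wrong in exactly the way described.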
There may be a place for a well-designed one, but if it exists, I've never seen it.
Having been forced to use an online math software for all my homework while at school, I vehemently disagree. It was so poor that it became a meme within my year group.
It would mark you as incorrect for using too many decimal places, even though it wouldn't tell you how many significant figures were required. I often remember it marking my answer as incorrect even though it was identical to the answer it gave. Sometimes you'd have to show your working, but it couldn't handle brackets. Once I put the answer as "1+x=y" when they wanted "y-1=x", and it was marked incorrect.
I'm sure academic software design is leaps and bounds above what it was in the early 2000s, but having pupils' futures hinge on what generally seems to be poorly tested code is dangerous.
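The "1+x=y" vs. "y-1=x" failure is avoidable even without a full computer-algebra system. As a hedged toy sketch (my own check, not how any real product works): treat each equation as a zero-form, lhs minus rhs, and test whether the two forms are a constant multiple of each other by sampling random points. That covers simple rearrangements like the one above, though not every algebraic transformation.

```python
import random

def same_equation(f, g, trials=100):
    # f and g map (x, y) to lhs - rhs of an equation. If the ratio f/g is
    # constant across random samples, the two equations describe the same
    # solution set (sufficient for linear rearrangements).
    ratios = []
    for _ in range(trials):
        x, y = random.uniform(-10, 10), random.uniform(-10, 10)
        fv, gv = f(x, y), g(x, y)
        if abs(gv) > 1e-9:
            ratios.append(fv / gv)
    return bool(ratios) and max(ratios) - min(ratios) < 1e-6

submitted = lambda x, y: (1 + x) - y   # student wrote 1 + x = y
expected = lambda x, y: (y - 1) - x    # answer key wanted y - 1 = x
wrong = lambda x, y: (2 + x) - y       # genuinely different equation

print(same_equation(submitted, expected))  # True: same line, just rearranged
print(same_equation(submitted, wrong))     # False
```

The point isn't that this is production-grade; it's that even a few lines of sampling beats string-matching the answer key.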
I have often solved hard math problems with very unconventional approaches (e.g., a geometrical proof for an algebraic problem). Trust me, software is decades away from being able to accurately determine the future of children and massively impact their self-esteem and trust in society.
It is important for the teacher to see where part of the class took the wrong turn, where the students' understanding ended. It is important to distinguish between careless errors, wrong memorizing of a formula and lack of understanding.
> Turns out the instructions said the essay would be scored on verbal efficiency; getting the point across clearly with the fewest words. I started playing around and realized that the more words I added, the higher the score, whether they were relevant or grammatical or not.
Frankly, this does not change anything from my experience in school decades ago. The teachers always said that the length does not matter and we should not pad the papers. However students who wrote more pages got better scores every single time.
Is it possible that the students who wrote shorter papers were in fact presenting incomplete arguments and/or thoughts? Writing clearly and concisely is extremely difficult.
You have automated systems that rate essays without any human actually reading them?
Kids, forget everything you know because crime does indeed pay off. Best grades will be reserved for those that try to cheat this system however it is implemented. Botting your essays is the way to go in the 21st century.
Given that the stack ranking at your future job will also be done by an "AI" (probably developed by the same company that graded your tests), this is a very useful skill to have.
>Turns out the instructions said the essay would be scored on verbal efficiency; getting the point across clearly with the fewest words. I started playing around and realized that the more words I added, the higher the score
Apart from the fact that your story is straight up frightening, isn't this part completely backwards, too? I mean, clearly using more words to convey the same message is /less/ efficient, not more so?
Yes. Exactly. Before I figured out how to game the program, my son and wife were editing shorter. That’s what the instructions said to do. And, that’s also a major strategy for decent writing: brainstorm a lot, then edit down to the good parts. What this means is the software’s scoring is an anti-incentive to good writing. Used as a teaching aid, it’s actually doing pure damage, not good. Not only can it not score reliably, nor provide meaningful feedback, it’s actually actively teaching a very wrong way to write. But it is cheaper than humans, and it does give immediate feedback, so there’s that.
This is a problem I have with a lot of human behavior. Instead of admitting you don’t have the resources to do something or aren’t willing to prioritize it, people come up with a bad version that’s not worth doing. Lots of things are worth doing poorly, but many of them I believe you just need to admit are not worth it unless a certain level of performance is met.
What’s even cheaper than AI? Tell the students to write some pages, have the teacher glance at the number of pages written, give full credit if the mark was met, and throw the papers out without reading them. It sounds like it would be similarly effective and less aggravating. Unfortunately, this would require humility on the part of the educators.
Think from a positive angle, students today are learning useful life skills to game computer systems, which they will have to deal with when they grow up.
edit: ...just like how previous generations have to learn how to game social systems.
Except the algorithm being gamed can change suddenly, drastically and without the gamer's knowledge.
When such changes occur, the gamer will be docked until they can reverse-engineer the new algorithm. There's also the risk that all their previous inputs "gaming" the system might be reconsumed to terrible effect, effectively rewriting their historical performance disastrously.
As always, those with the social standing and power to have insider knowledge or guidance will be in the best position to profit off such systems.
Ho ho. Wait. You mean you were able to submit multiple versions of the essay? So that anyone can basically game the test, by submitting multiple essays until they get the best score they can wring out of it?
I can't even comprehend how someone can use automation for a task like this... It completely goes against human nature. In a world where all jobs have been automated teachers would be the last ones to go before humanity is completely obsolete.
Do you just get to keep submitting the essay to see what score it will get before you turn it in? That sounds like a bigger problem than any of the particulars about how the grading is done.
In this case, there was a limit to the number of times the essay could be submitted, and there was a required score that needed to be obtained within that limit, otherwise the grade would go down. The limit was something like 20 tries, and when I got there they’d already used maybe 14 of them.
I could perhaps see value in having unlimited tries, as a teaching aid, if the result wasn't being used for grading. That would at least leave room for curiosity and exploration. And, more importantly, I could see value if the software wasn't essentially a scam that fundamentally is not able to do what is advertised. If the software really could grade essays reliably, and provide meaningful suggestions for improvement, then maybe it could be used to help educate students, in conjunction with the teacher's guidance. But the software does not grade reliably, it absolutely does not offer meaningful constructive feedback, and the teachers were using it to avoid reading essays, not to supplement their own expertise.
One of the several amusing ironies here is how the software company has convinced the state and teachers to willingly replace themselves with bots, despite obvious evidence that the humans can do the job better.
The instant feedback mechanism is just begging for someone to turn it into a GAN by writing the other half. I would absolutely love to hear that some particularly clever high school student was able to train an ML algorithm to consistently fool the grading algorithm, thus instantly rendering all of their efforts worthless and dragging the administrators through the mud at the same time.
That really sounds like Utah, they have lots of students (due to LDS influences) with a conservative government (ditto), so the pupil/teacher ratio is insane. I can guess the teacher really doesn’t have any other choice.
My mother worked grading standardized tests. It was a hellish job for many reasons (limited breaks, etc.)
One question she had to grade was essentially, "What's something you want your teacher to know about you?"
It was an essay answer, and she was supposed to grade it on grammar, etc. Just the mechanical aspects of writing. (The real question explained the details more, but that was the core of the question.)
She saw answers that would make you weep.
"My daddy touches me."
"I haven't eaten today. I don't know when I'm going to eat again."
Stuff like that.
And my mother was going to be the only human who ever saw their responses. Their teacher had no chance to see their responses, just my mom.
So she goes to her supervisor and asks, "What can we do to help these kids?"
The supervisor said there was nothing you can do. Just grade the answers.
The US has federal child abuse mandatory reporting requirement laws which include teachers and school staff and personnel, as well as additional state requirements which vary but include, for 11 states, faculty, staff, and volunteers at public or private higher education institutions. Computer and IT professionals are also covered in cases.
> Faculty, administrators, athletics staff, or other employees and volunteers at institutions of higher learning, including public and private colleges and universities and vocational and technical schools (11 States).
Some of these will be 100% true as well. But don't make the mistake that there are no kids who go for shock value or are wantonly manipulative when they know it can't come back to them.
So how many are true and how many false? I have no clue. Literally none. And no it doesn't make me feel any better about the screams of existential agony even if that were a low percentage. Could be high too.
For the not eating, it's pretty easy to get data. It's like 1 in 5 children live in food-insecure households in the US and maybe 1 in 20 of those very insecure, so not eating before school provided lunch is common enough that if you're grading tons of papers you'll run into kids like that.
When I was a high school student, we had some state administered test in health class that tasked us with analyzing advertisements for liquor and tobacco and seeing if we could recognize harmful behavior that the ads might be promoting. This test had no impact on our class grade...
..which means students wrote whatever the hell we wanted. I was assigned a Captain Morgan (rum) ad. I wrote that the ad was glorifying maritime piracy and was likely responsible for pirate activity in Somalia.
Of course some kids are manipulative, going for shock value, continuing an "in-joke", or just plain trolling. But would a teacher just look the other way, or would they talk to the kid? What would you want for your kids? This is why teachers assigning homework like "what do you want your teacher to know about you" and then not even seeing it is dehumanizing.
I don’t know about calling it manipulative. I remember taking the ACT, and struggling to plan out one of my essays. It was something like “tell us about a book that inspired you”. So I changed details about the plot so it all fit nicely and was easy to write. I can see something similar here, where someone takes on a persona when writing in order to effectively communicate.
False accusations can actually be the result of prior abuse. They may substitute one person for another. Or do things as a result of mental illness caused by abuse. Kids think differently to adults and may behave inexplicably. And unfortunately that means that an abused child is a terrible witness.
> But don't make the mistake that there are no kids who go for shock value or are wantonly manipulative when they know it can't come back to them.
In the US, school funding is based upon standardized test results, and bad results can shut a poorly performing school down.
It's drilled into every kid's head that these tests are very important, super strict and if they accidentally mess up, it can ruin their academics, because retesting and regrading are expensive.
This is my first time learning that AI-graded essays are a thing. Am I the only one who thinks that's insane? I feel like you'd probably have to have an AGI to meaningfully evaluate an essay.
I work in AI, and was very surprised when I heard about this (a few years ago). I don't think anyone who works in the area thinks the tech is ready for this kind of deployment. There is research on the subject [1], and NLP systems can do better than baseline methods, but the error rates are still pretty high.
A thing you quickly find if you try to download off-the-shelf NLP tools and apply them to anything is how little is reliable at all, unless you can constrain the domain. Even basic topic identification only works with low error rates when constrained to something like NYT stories, or PubMed abstracts, not arbitrary text by arbitrary writers. And I would bet ETS is using worse tech than research state-of-the-art.
You've noticed, though, that the AI con is on. This damages your work as people get burned, and it will bring about the second "AI winter".
People making big decisions with a lot of money around computing know nothing about it and are marks for con artists. Think of big consulting firms selling to senior public servants in Washington. "For a successful technology, reality must take precedence over public relations." But reality just gets in the way when conning a mark for a successful snake-oil sale, right?
Call it out, publicly, and cite your credentials. Encourage colleagues, your competition, and everyone with a clue to pour scorn on whoever is selling this evil, toxic waste as drinkable.
Hmmm. I also work in AI, in fact professionally in information retrieval and NLP. I disagree strongly with what you say. Basic topic summarization and keyword / named entity extraction on unstructured sources of text works reasonably well. It’s easy to modify BERT and GPT on smaller problems, language classification is borderline totally solved by extremely easy to train neural network models.
I still agree that automatic essay grading is beyond the reach of SOTA NLP models today, but you make it sound like virtually nothing can be done in a production-grade manner that solves real-world unconstrained NLP problems. This is manifestly false.
We had this in my school for 8th and 9th grade so 2008-2010. We had to type the essays in class and submit by the end of the hour. I would only get maybe 3 paragraphs in before time was up because I was trying to build a strong argument for the prompts. Despite that I would usually get 3-4/6 and my teacher said she would read the essays and regrade but she never actually did. My friend literally copy and pasted the pledge of allegiance 20-30 times and scored a perfect 6/6. Later we found out if you repeated the words in the writing prompt you would get a guaranteed 5/6 and with a high enough word count you’d get 6/6. The essays were all bullshit and just a way for the teachers to get an extra free period once a week.
I totally agree that "AI" grading is totally bullshit. But I also have plenty of experience teaching/TAing large courses, and after reading too many essays they all become semantically saturated meaninglessness. One cannot help but skim them and grade according to a few quick heuristics. At that point one tries to be self-consistent and defensible in one's grading, but careful consideration is right out. I suspect state graders are dealing with way more than 100 essays per person, probably on a tight schedule too. It's quite possible that an ML model is better than an exhausted human grader, as their cognitive strategies are mostly identical.
The solution isn't to do a better job at grading 'meaninglessness' but to stop requiring the production of it in the first place.
One major problem with algorithmic approaches, whether automated or not, is that they become the definition of good in the context and therefore become something that cannot be argued against. And of course it makes 'teaching to the test' an even more likely outcome.
If I were a conspiracy theorist I'd attribute this to wanting a dumbed down population. Unfortunately I think it is probably the other way round, the population is already dumbed down and a belief in AI unicorns is the result.
As Aristotle said to Alexander: 'There is no royal road to geometry', and so it is with education; it's hard work for both the student and the educator and no amount of AI/ML/algorithmic snake oil will change that without also changing the meaning of the word education.
I remember when I was in middle school 16 years ago, my English classes would have us submit some of our work to a web app. It would then grade the submission. I remember this distinctly because I asked my teacher to intervene on at least two occasions. The app failed to recognize the words "squirrelly" (as in "That guy in the corner has been acting squirrelly.") and "defragment". My teacher decided to subvert the app's recommended grades because she, as a human, understood the intent of my use of those weird words.
> I feel like you'd probably have to have an AGI to meaningfully evaluate an essay.
So the reason this isn't the case, is because there are very simple metrics that tend to highly correlate with essay quality. It doesn't mean the grading-bot is actually evaluating essay quality. It's just looking for properties that are statistically associated with good essays. Remember, at the end of the day as long as the bot's ranking is close enough to the human grader's ranking, nobody really cares about the internal logic.
A very straightforward example is spelling mistakes. People who make spelling mistakes aren't necessarily bad writers, and vice versa: there may be great spellers who can't write for shit. But by and large, the people who spell poorly also tend to write poorly. Easily detectable grammatical issues, like misplaced modifiers, subject-verb disagreement, or inconsistent tense, are also correlated indicators.
A very simple metric is essay length, especially on a timed exam. Good writers tend to have verbal fluidity, with words easily flowing to paper. They don't struggle converting thoughts to sentences, so they tend to end up with the most words written within a fixed time period. By and large, the longer a timed essay is, the more likely its actual quality is high.
Grading bots basically rely on these statistical relationships. They're not measuring anything intrinsic to good writing. But at the end of the day, their student rankings are usually pretty close to that of a typical human grader. In some cases the bot will have a closer ranking to a random human grader, than two random human graders will have to each other.
The biggest flaw here is Goodhart's law: when a measure becomes a target, it ceases to be a good measure. Once test takers become aware of the kludges the bots use, they can exploit them, for example by dumping a bunch of verbal diarrhea with as many correctly spelled words as possible. But even then it doesn't really hurt the bot's ranking accuracy too much, because the kids who do the most test prep and learn all the tips and tricks are usually high achievers who do well on essays anyway.
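To illustrate how shallow these proxies can be, here is a toy scorer (entirely hypothetical, not any vendor's actual model) that rates essays on just two of the features described above: length and the fraction of recognizably spelled words. Padded filler beats a short, clean answer:

```python
# Tiny stand-in dictionary; a real system would use a full spellchecker.
KNOWN_WORDS = {"the", "a", "and", "is", "it", "that", "cat", "good", "very"}

def proxy_score(essay, scale=6):
    # Hypothetical proxy scorer: length and spelling, nothing about meaning.
    words = [w.strip(".,!?").lower() for w in essay.split()]
    if not words:
        return 0.0
    length_score = min(len(words) / 200, 1.0)          # longer looks better
    spelling_score = sum(w in KNOWN_WORDS for w in words) / len(words)
    return round(scale * (length_score + spelling_score) / 2, 2)

concise = "The cat is good."
padded = "it is very good that the cat is very very good and " * 30

print(proxy_score(concise))  # correct but short: penalized by the length proxy
print(proxy_score(padded))   # repetitive filler: maxes out both proxies
```

Neither feature touches meaning, yet both correlate with human scores in aggregate, which is exactly why the gaming works.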
Strongly (but respectfully) disagree with a lot of this!
This is related to current fairness-in-AI discussions. In many cases the basic problem is ML systems leverage correlations for making causal decisions. Here, there is a huge ethical difference between scoring a person based on "is this a good essay" and "do the features of this essay correlate with features of good essays". Just like there is a huge fairness and discrimination difference between "is this person qualified for a loan" and "do the features of this person correlate with features of people who qualify for loans" (algorithmic redlining). Your last sentence has a big discrimination/fairness issue also, since you are testing even more for parental income and parental free time.
>Remember, at the end of the day as long as the bot's ranking is close enough to the human grader's ranking, nobody really cares about the internal logic.
This isn't true at all. Imagine you got a B or C on an essay that a human would have given an A to because you wrote it concisely and in plain language, or because you used language that's statistically correlated with being black. Does the fact that this is rare console you? "Sorry, but it's usually very close to the human grader's ranking." Close enough isn't good enough when you get the short end of the stick. "Sorry, you aren't going to get to go to the college you wanted because you use language statistically correlated with poor writing." Or just because you're different, so the statistical correlation doesn't apply to you, you filthy outlier. Just because it's a rare event doesn't make it okay.
In adulthood, this is like hiring or firing for work statistically correlated with good work. Remember when amazon rolled out the resume scorer? [0] Sure it was biased towards women, but it was close enough to human scores, so who cares about the internal logic?
>Grading bots basically rely on these statistical relationships. They're not measuring anything intrinsic to good writing.
At the end of the day, our goal here is to measure good writing. If the bots aren't measuring anything intrinsic to good writing, we shouldn't use them.
Your last paragraph, and particularly the last sentence, epitomizes what is wrong with your whole thesis: the ultimate goal of the testing (and education itself, for that matter) is not to find people who can "do well on essays"; it is to develop analytical thinking.
It is absolutely insane. By no definition does the system understand what is written.
You could ask a student to write an essay taking a firm opinion on some subject, and they could change standpoint every paragraph and there's no way these systems would know.
If I was a student I would be extremely offended at people wasting my time like this.
I'm surprised people are surprised by it. I guess it just hasn't gotten talked about a lot? When I took the GRE in 2011, the rule was that my essay would be graded by one human and one automated grader, and a second human would become involved if the computer and the human differed by one point or more, IIRC.
Maybe nobody really makes a big deal about it because it is pretty much irrelevant anyway. Applicants provide a letter of intent that the grad dept people can, y'know, actually read for themselves, so I think unless you totally bombed the writing section nobody cared.
In a forum of CS people I'm surprised this is one of the top opinions. Our field is full of super surprising results like this -- that you don't have to actually understand the text beyond basic grammar structures to reasonably accurately predict the score a human would give it.
Like this kind of thing should be cool, not insane. I mean wasn't it cool in your AI class when you learned that DFS could play Mario if you structured the search space right?
I came first in English for my school, many moons ago. Leading up to the finals, I regularly finished ahead of the hard-core English essay people, generally to my amusement. My exam essay responses were generally half the length of (sometimes even shorter than) those of the prodigious writers. Although I've an OK vocabulary, I always made sure I made the right choice of word to hit a specific meaning, rather than choosing words with a high syllable count.
I'd find it highly interesting to see what kind of result I'd get using an automated system.
Why?
Because, I once asked a teacher (also an examiner) why I got good grades above the others, and the answer surprised me: my answers were generally unique /refreshingly different, to the point/ not too long and easy to read.
I suspect with this new system, I'd be an average student. It'd also be interesting to find out, several years down the road, if the automated system could be gamed at all -- I suspect it could, and teachers would help students 'maximise' their scores as a result of that.
When I hear a result like "software which understands basic grammar structures can predict what grade a human would give an essay" I think my views are roughly:
* 5% - cool, we could make a company that grades essays
* 15% - cool, we could make a company that grades essays and sell our source code to the test-prep industry
* 80% - fascinating, it sounds like the exam designers need to reevaluate what they are trying to measure with essay questions
"...that you don't have to actually understand the text beyond basic grammar structures to reasonably accurately predict the score a human would give it"
That only really shows that the humans they're training on are terrible at grading essays.
This problem is a first class demonstration of the difference between "can we?" and "should we?"
The fact that it's being implemented in society is insane because anyone who is paying attention to the state of AI today already knows how it will go wrong: without reading the article, I already guessed that it systematically discriminated against certain demographics, which is in fact what the article claimed.
It's interesting that it's possible to predict what the scorer would decide, but the moment you actually implement it is when all of the known problems become relevant, and the intellectual wonder must take a backseat to the human problems.
Teaching human-human communication by removing human inputs and having computers decide about quality... call me a skeptic. I feel bad for the students. Essay grading was bad enough before this.
Even narrowly, for grammar: is that a good thing? It probably helps scale grammar help to more students, but if those tools became ubiquitous in grading and editing, then unique voices would just disappear, and a lot of potentially “great writers” might choose different careers because the machines don't like them.
Adding further bias against the underprivileged is not "cool". Implementing this while avoiding publicity or providing a means to publicly audit the results is doubly not cool.
It is fine to play with "cool" techniques when you are doing consequence-free stuff like playing Mario. When you are creating systems that have significant and long-term effects on people's lives, a different standard applies.
This is sort of like discovering the Excel spreadsheet at the heart of a system responsible for handling hundreds of millions of dollars of transactions for your bank.
Yeah, it's cool, but what about your savings account?
Unlike a multiple choice test where the primary audience is automated graders, the primary audience for an essay is other humans. If even Google and Facebook with their billions of dollars and billions of posts worth of data, still cannot always understand the intent and purpose of written content, what hope do these algorithms have?
If it is cost-prohibitive for every essay to be graded by humans, then they should be dropped from the tests. Otherwise, we are missing the whole point of essays which is to communicate effectively with another human, not just match certain text patterns.
> Otherwise, we are missing the whole point of essays which is to communicate effectively with another human, not just match certain text patterns.
I agree, this is traditionally the purpose of an essay. But to play devil's advocate, consider the rising number of people who are writing SEO or ASO content which is actually targeted at machines.
“In most machine scoring states, any of the randomly selected essays with wide discrepancies between human and machine scores are referred to another human for review”.
And “between 5 to 20 percent” of essays are randomly selected for human review.
So the takeaway is that if the machine scored you dramatically lower (as it typically does for black and female students) and you're among the 80-95% not selected for human review, your educational future is systematically fucked and you have no knowledge of why or how to change it.
Absolutely reprehensible. Anyone involved in the creation or adoption of these systems should be ashamed.
The thing is, you could be similarly screwed by a biased human whose grading is not checked by a less biased human.
At least the machines offer the following hope: even if unbiased humans are rare among paper-grading teachers, those humans can be used to train the machines, so then bias-free or lower-bias grading becomes more ubiquitous.
Basically, the system has the potential for systematically identifying and reducing systematic bias. A computer program can be retrained much more readily than a nationwide army of humans. Humans can be given a lecture on bias, and then they will just return to their ways.
AI has a lot more potential for bias than humans. It depends on the training data, which is likely heavily biased, judging by results in other domains like face detection. It will only amplify any small bias present in the data.
>Anyone involved in the creation or adoption of these systems should be ashamed
That's the problem - there is seemingly no shame these days. People involved "saved time and money", got paid and that's it. "If I didn't do it someone else would" and all of that.
I remember taking a standardized test, can't remember if it was SAT or CSAT (Colorado pre-SAT test). This was at a time when I'm confident that humans were the graders.
I started with an intro that would be appropriate for a standard 5 paragraph essay; i.e. the thing you write when you don't know what you're talking about and you're just following a format.
In the third paragraph I took a leaf from Family Guy and just interjected "WAFFLES, NICE CRISPY WAFFLES, WITH LOTS OF SYRUP." For the next page and a half, I berated the very foundation of the essay prompt, insulting it the way only an angst-ridden early teen can.
... I got a 98% on the essay.
Fast forward several years. I write an essay for an introductory college course final. My paper is returned to me with a coffee stain and a "94% - good work!" note scribbled on the top. That note was scribbled by a TA who would turn out to be my girlfriend for 2 years. One night in bed, she tilts her laptop to me, showing an article that I used as the central theme of the above essay; "can you believe this?"
"Are you joking? Of course I can believe this, it was the subject of the essay you gave me an A on 2 years ago"
She admits she doesn't read past the first paragraph of anything she grades, and just bases grades on intuition about how articulate the essays are at the outset.
...
The point I'm making:
Does AI suck at judging the amount of informative content in a student essay? YES
Do humans suck at judging the amount of informative content in a student essay? ALSO YES
This is a great example of why it's grossly irresponsible for members of the ML community to talk about how AGI is just around the corner. In addition to the fact that we have no idea whether this is true, it primes a naive public for believing that technologies like this are worth the tradeoff.
"People worry that computers will get too smart and take over the world, but the real problem is that they're too stupid and they've already taken over the world."
I imagine that any student that experimented with the form of the essay or wrote an exceptionally well argued piece in simple language would not have their test graded appropriately either.
Any essay writing test which could be adequately graded by a machine is not testing anything of value.
Edit: I’ll further add that as soon as people’s careers depend on a metric, the metric becomes useless as a metric, because it will be gamed and manipulated by everyone involved. Almost nobody involved is incentivized to accurately measure students’ writing ability.
I think machines could be valuable in giving feedback on writing, like Grammarly does.
A lot of what students write is actually garbage from that point of view. Even if they happen to have a good basic idea about what they want to say, the point of essay writing is to master the mechanics of expression so that you get the idea across effectively.
Whether the student has a brilliant idea isn't even so important, and it wouldn't even be fair; imagine if high school computer science expected students to turn in a best-selling app for a term project. Not everyone can come up with something brilliant to say; and even relatively mundane lines of reasoning can be given a good treatment in writing to develop the skill.
I remember when I had essays graded in school, a lot of the comments were low-grade fluff like "run on sentence", "wrong word", "faulty parallelism", "missing colon before 'for example'" and such points having nothing to do with the content being original, well-considered and well-argued. That sort of thing might as well be done by machine, at least as a preprocessing step to improve a student's rough draft.
Almost nobody involved is incentivized to accurately measure students’ writing ability
It's the same reason you see keyword posters in math education. "Together" means "plus", that kind of thing. It's completely worthless, except for one-step problems, and even then it doesn't always work. What is happening is collusion between teachers and testmakers. You can't teach understanding, but you can teach test-passing techniques because the way the test is set permits this.
You see the same thing here, in English you can get away with not teaching quality writing if you teach techniques to score well.
I feel like the mistake is assuming that essay writing is about the content. It's just a thing to give the student something barely non-trivial to write about.
When your essays are graded, they're marked down for mechanical and wording problems. There's really no point in trying to grade 'good ideas' on a subject piece you had maybe 10 minutes to skim.
If I have 3 left shoes colored blue, green, and red, and you have 2 right shoes colored black and white, how many pairs can we make if our lefts and rights are put together?
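For what it's worth, the "together means plus" keyword trick fails on exactly this kind of riddle. A quick Python sketch (purely illustrative) comparing the keyword answer with actual enumeration:

```python
from itertools import product

lefts = ["blue", "green", "red"]   # 3 left shoes
rights = ["black", "white"]        # 2 right shoes

keyword_answer = len(lefts) + len(rights)          # "together" -> plus -> 5
combinations = len(list(product(lefts, rights)))   # distinct left/right pairings -> 6
simultaneous = min(len(lefts), len(rights))        # pairs wearable at once -> 2

print(keyword_answer, combinations, simultaneous)  # 5 6 2
```

Whichever reading of the riddle you take, the keyword answer matches neither.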
There is value in the ability to produce correct English 'off the cuff'. You could argue essays are the best way to get students to produce off the cuff written text. Hence, it makes some sense to ask students for essays, and then judge those essays only for form.
However, it is rather important that students know their essays are not judged as essays, but only judged on form. Otherwise you teach students that form trumps content in essays.
When judging an essay as an essay correct English barely matters. What matters is how convincing you are, and how interesting of a read the essay is. This is a great skill to have, and testing it also makes sense. Really though, we should separate these two forms of testing.
Who the hell came up with such an idea? I would even hesitate to use "AI" for automatic spell checking, as it is enough to give some character an unusual name and it will be marked as an error.
My guess is that sooner or later people will learn how to game that AI. I wouldn't be surprised if there were some software that generates essays the Utah "AI" likes.
Already been done. http://lesperelman.com/writing-assessment-robo-grading/babel...
Here's a sample essay that is complete nonsense and got a perfect score on the GRE.
http://lesperelman.com/wp-content/uploads/2015/12/6-6_ScoreI...
I'd guess this is a product of dwindling state finances and contempt for any form of real education. AIs are orders of magnitude cheaper than real teachers. They also don't form unions and wouldn't voice any opposition against changes in the curriculum.
They are also pretty useless, as you have pointed out. The consequences of this policy will be postponed until the students reach a certain age -- that'll be like 10-15 years in the future.
To be fair, the GP here is specifically describing that he gamed the AI via a copy-paste of a critique of the AI; his kid submitted it of his own accord; it was graded without comment; and then, when the GP went in to comment on the gaming of the AI, the teacher not only did not care that the AI was gamed, but expressed gratitude for the AI saving hours of work, still ignoring that the AI fundamentally made things worse, all at the expense of the entire point of being a teacher in the first place.
The issue, for the teacher, is that in 'the system' in which they collect a pay-check, the AI works flawlessly. The point, for the teacher, is not to educate children. It is to have assignments that children pass with some sort of distribution that can be sent in and calculated by some person in a beige suit, wide tie, and hair troubles. The difference is subtle at first, but when you get further along to the point where the GP is sitting, then the difference is comical.
The AI allows the teacher to increase their efficiency in processing assignments, ones that never really mattered to the teacher in the first place. In valley-speak: the incentives are not aligned.
[0] https://www.nytimes.com/2012/04/23/education/robo-readers-us...
It is frankly a sign of a diseased culture to use it in any capacity except an exercise to improve AI.
Every essay I wrote, I'd always force myself to reach the max "12.0" grade level. While writing I'd struggle over word choice, sentence structure, rearranging paragraphs, working on my tone etc, all in pursuit of the 12th grade way to phrase things. All my revisions were subject to the approval of the Grade Level checker.
Whenever I could, I would check the grade levels of my friends' writing - usually by showing them a "neat feature" they could enable. Then I'd smugly applaud myself for being the better writer whenever their grade level was below 12.0.
The Grade Level feature fascinated me, and to try to master it, I found a book about Microsoft Word and looked through it in a bookstore. I was absolutely gobsmacked at how simple the formula was. I had childishly been expecting something sophisticated, like perhaps Utah educators imagine they have. I genuinely expected the method to be complex beyond my understanding.
Instead, Word used a variant of Flesch-Kincaid. There was a direct relationship between sentence length and grade score, and polysyllabic words and grade score. Meaning, the longer your sentences and words, the higher your grade score.
As soon as I got home from the bookstore I loaded a draft of something I had written. It was "pre-12.0" writing from me. I simply deleted all the periods but one and checked again. 12.0.
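For the curious, the standard Flesch-Kincaid grade formula is 0.39*(words per sentence) + 11.8*(syllables per word) - 15.59, so merging sentences inflates the score exactly as described. Here's a rough sketch (the syllable counter is a crude vowel-group heuristic, not whatever Word actually shipped):

```python
import re

def fk_grade(text):
    # Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # crude heuristic: count vowel groups, minimum one per word
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    syl = sum(syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syl / len(words) - 15.59

draft = "I wrote this. It has short sentences. The score stays low."
merged = draft.replace(".", "") + "."   # delete all the periods but one
print(fk_grade(draft) < fk_grade(merged))  # True: same words, higher grade
```

Only the words-per-sentence term changes, which is why the one-period trick reliably raises the grade.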
Automatic grading is a wonderful lure. It's nice to imagine that there's some objective writing quality easy to tap into. At the moment, I think we're far from that ability.
Personally, I feel the solution to insufficient teacher time is to use peer grading much more, and spot checks. Get kids to read and revise each other's works frequently, and teachers should aim to grade at least N papers per student where N is much less than the number of papers a student writes.
Revising is a really vital part of writing. Getting more chances to do revision, plus having to write something good enough to show your peers, plus having the risk of any paper count for your grade should compensate for incomplete teacher grading.
That's how it's done in creative writing courses. I've always found it infinitely more helpful than only having feedback from the instructor, even if the instructor's feedback was generally more helpful/useful than peer feedback.
The point I am trying to make in agreement with the parent is: there are qualities that are very hard to score with algorithms. The difficulty of solving this problem equals if not exceeds that of automated translation, which still only works properly for specialized and limited domains, e.g. weather forecasts.
My best Utah education anecdote - on the first day of British literature class the teacher came in and asked, "Does anyone here know what A.D. means?" Someone said "After Death" - she said no. I figured this was my time to shine, so I raised my eager hand and said "Anno Domini, in the year of the Lord" - she said no.
Then she announced: "A.D. means After the Deluge, and B.C. means Before Christ".
She also totally lied to me one time about whether she would be considering a particular textbook question as applying to Rosencrantz or Guildenstern.
Anyway, I think that was one of the many classes I got an F in after I stopped going and would walk past it every day on my way to play chess with my German teacher.
Note: This is not hyperbole, I have seen this exact scenario more than once.
There may be a place for a well-designed one, but if it exists, I've never seen it.
It would mark you as incorrect for using too many decimal places, even though it wouldn't tell you how many significant figures were required. I often remember it marking my answer as incorrect, even though it was identical to the answer they gave. Sometimes you'd have to show your working, but it couldn't handle brackets. Once I put the answer as "1+x=y" but they wanted the answer "y-1=x", and they marked it as incorrect.
I'm sure academic software design is leaps and bounds above what it was in the early 2000s, but to have pupils' futures hinge on what generally seems to be poorly tested code is dangerous.
Frankly, this does not change anything from my experience in school decades ago. The teachers always said that the length does not matter and we should not pad the papers. However students who wrote more pages got better scores every single time.
Kids, forget everything you know because crime does indeed pay off. Best grades will be reserved for those that try to cheat this system however it is implemented. Botting your essays is the way to go in the 21st century.
Apart from the fact that your story is straight up frightening, isn't this part completely backwards, too? I mean, clearly using more words to convey the same message is /less/ efficient, not more so?
What’s even cheaper than AI? Tell the students to write some pages, have the teacher glance at the number of pages written, give full credit if the mark was met, and throw the papers out without reading them. It sounds like it would be similarly effective and less aggravating. Unfortunately, this would require humility on the part of the educators.
edit: ...just like how previous generations have to learn how to game social systems.
When such changes occur, the gamer will be docked until they can reverse-engineer the new algorithm. There's also the risk that all their previous inputs "gaming" the system might be reconsumed to terrible results as well, effectively rewriting their historical performance disastrously.
As always, those with the social standing and power to have insider knowledge or guidance will be in the best position to profit off such systems.
That is just mad.
I could perhaps see value in having unlimited tries, as a teaching aid, if the result wasn’t being used for grading. That would at least leave room for curiosity and exploration. And, more importantly, I could see value if the software wasn’t essentially a scam that fundamentally is not able to do what is advertised. If the software really could grade essays reliably, and provide meaningful suggestions for improvement, then maybe it could be used to help educate students, in conjunction with the teacher’s guidance. But the software does not grade reliably, and it absolutely does not offer meaningful constructive feedback, and the teachers were using it to avoid reading essays, not to supplement their own expertise.
One of the several amusing ironies here is how the software company has convinced the state and teachers to willingly replace themselves with bots, despite obvious evidence that the humans can do the job better.
One question she had to grade was essentially, "What's something you want your teacher to know about you?"
It was an essay answer, and she was supposed to grade it on grammar, etc. Just the mechanical aspects of writing. (The real question explained the details more, but that was the core of the question.)
She saw answers that would make you weep.
"My daddy touches me."
"I haven't eaten today. I don't know when I'm going to eat again."
Stuff like that.
And my mother was going to be the only human who ever saw their responses. Their teacher had no chance to see their responses, just my mom.
So she goes to her supervisor and asks, "What can we do to help these kids?"
The supervisor said there was nothing you can do. Just grade the answers.
Faculty, administrators, athletics staff, or other employees and volunteers at institutions of higher learning, including public and private colleges and universities and vocational and technical schools (11 States).
https://www.childwelfare.gov/topics/systemwide/laws-policies...
https://www.childwelfare.gov/pubPDFs/manda.pdf
This includes penalties for failure to report in multiple states:
https://www.childwelfare.gov/topics/systemwide/laws-policies...
So how many are true and how many false? I have no clue. Literally none. And no it doesn't make me feel any better about the screams of existential agony even if that were a low percentage. Could be high too.
...which means students wrote whatever the hell we wanted. I was assigned a Captain Morgan (rum) ad. I wrote that the ad was glorifying maritime piracy and was likely responsible for pirate activity in Somalia.
In the US, school funding is based upon standardized test results, and bad results can shut a poorly performing school down.
It's drilled into every kid's head that these tests are very important, super strict and if they accidentally mess up, it can ruin their academics, because retesting and regrading are expensive.
So do what? Contact her local police?
With a written accusation from a child? Is that enough to get a warrant to force the company to release the demographic information?
And people don't work at a job like that because they want to. They work there because they need the money.
Everything she took in and out of there was monitored, too. So it's not like she can go to the Xerox, and walk out of there with a copy.
It's beyond dehumanizing. For everyone. The kid, the people who work there.
A thing you quickly find if you try to download off-the-shelf NLP tools and apply them to anything is how little is reliable at all, unless you can constrain the domain. Even basic topic identification only works with low error rates when constrained to something like NYT stories, or PubMed abstracts, not arbitrary text by arbitrary writers. And I would bet ETS is using worse tech than research state-of-the-art.
[1] e.g. https://www.aclweb.org/anthology/P15-1053
People making big decisions with a lot of money around computing know nothing about it and are marks for con artists. Think big consulting firms selling to senior public servants in Washington. "For a successful technology, reality must take precedence over public relations." But reality just gets in the way when conning a mark for a successful snake oil sale, right?
Call it out, publicly; cite your credentials. Encourage colleagues, your competition, and everyone with a clue to pour scorn on whoever is selling this evil, toxic waste as drinkable.
I still agree that automatic essay grading is beyond the reach of SOTA NLP models today, but you make it sound like virtually nothing can be done in a production-grade manner that solves real-world unconstrained NLP problems. This is manifestly false.
One major problem with algorithmic approaches, whether automated or not, is that they become the definition of good in the context and therefore become something that cannot be argued against. And of course it makes 'teaching to the test' an even more likely outcome.
If I were a conspiracy theorist I'd attribute this to wanting a dumbed down population. Unfortunately I think it is probably the other way round, the population is already dumbed down and a belief in AI unicorns is the result.
As Euclid said to Ptolemy, 'There is no royal road to geometry', and so it is with education; it's hard work for both the student and the educator, and no amount of AI/ML/algorithmic snake oil will change that without also changing the meaning of the word education.
To emphasize, this was 16 years ago.
So the reason this isn't the case, is because there are very simple metrics that tend to highly correlate with essay quality. It doesn't mean the grading-bot is actually evaluating essay quality. It's just looking for properties that are statistically associated with good essays. Remember, at the end of the day as long as the bot's ranking is close enough to the human grader's ranking, nobody really cares about the internal logic.
A very straightforward example is spelling mistakes. People who make spelling mistakes aren't necessarily bad writers. And vice versa, there may be great spellers who can't write for shit. But by and large, the people who spell poorly also tend to write poorly. Easily detectable grammatical issues, like misplaced modifiers, subject-verb disagreement, or inconsistent tense, are also correlated indicators.
A very simple metric is essay length, especially if it's a timed exam. Good writers tend to have verbal fluidity, with words easily flowing to paper. They don't struggle converting thoughts to sentences. So they tend to end up with the most words written down within a fixed time period. By and large, the longer a timed essay is, the more likely its actual quality is high.
Grading bots basically rely on these statistical relationships. They're not measuring anything intrinsic to good writing. But at the end of the day, their student rankings are usually pretty close to that of a typical human grader. In some cases the bot will have a closer ranking to a random human grader, than two random human graders will have to each other.
The biggest flaw here is Goodhart's law. When the test takers become aware of the kludges that the bots use, they can exploit them. For example, just dump a bunch of verbal diarrhea with as many correctly spelled words as possible. But even then it doesn't really hurt the bot's ranking accuracy too much, because the kids who do the most test-prep and learn all the tips and tricks are usually high-achievers who do well on essays anyway.
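To make the parent's point concrete, here is a toy scorer in that spirit (purely illustrative; no vendor uses this exact model): it rewards length and penalizes words missing from a lexicon, so padding with correctly spelled filler raises the score.

```python
def surface_score(essay, lexicon):
    """Toy grader that scores by correlates of quality, not by meaning:
    more words is better; words outside the lexicon ('misspellings') are worse."""
    words = [w.strip(".,!?").lower() for w in essay.split()]
    misspelled = sum(1 for w in words if w not in lexicon)
    raw = 1 + len(words) / 50 - 0.5 * misspelled
    return max(0.0, min(6.0, raw))   # clamp to a 0-6 rubric scale

lexicon = {"this", "essay", "argues", "a", "clear", "point",
           "waffles", "nice", "crispy", "with", "lots", "of", "syrup"}
short = "this essay argues a clear point"
padded = short + " waffles nice crispy waffles with lots of syrup" * 10
print(surface_score(short, lexicon) < surface_score(padded, lexicon))  # True
```

Nothing in the score reflects whether the essay says anything at all, which is exactly the exploit described above.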
https://www.reuters.com/article/us-amazon-com-jobs-automatio...
You could ask a student to write an essay taking a firm opinion on some subject, and they could change standpoint every paragraph and there's no way these systems would know.
If I were a student, I would be extremely offended at people wasting my time like this.
Maybe nobody really makes a big deal about it because it is pretty much irrelevant anyway. Applicants provide a letter of intent that the grad dept people can, y'know, actually read for themselves, so I think unless you totally bombed the writing section nobody cared.
Like this kind of thing should be cool, not insane. I mean wasn't it cool in your AI class when you learned that DFS could play Mario if you structured the search space right?
I'd find it highly interesting to see what kind of result I'd get using an automated system.
Why?
Because I once asked a teacher (also an examiner) why I got better grades than the others, and the answer surprised me: my answers were generally unique and refreshingly different, to the point, not too long, and easy to read.
I suspect with this new system, I'd be an average student. It'd also be interesting to find out, several years down the road, if the automated system could be gamed at all -- I suspect it could, and teachers would help students 'maximise' their scores as a result of that.
* 5% - cool, we could make a company that grades essays
* 15% - cool, we could make a company that grades essays and sell our source code to the test-prep industry
* 80% - fascinating, it sounds like the exam designers need to reevaluate what they are trying to measure with essay questions
That only really shows that the humans they're training on are terrible at grading essays.
The fact that it's being implemented in society is insane, because anyone who is paying attention to the state of AI today already knows how it will go wrong: without reading the article, I already guessed that it systematically discriminates against certain demographics, which is in fact what the article claims.
It's interesting that it's possible to predict what the scorer would decide, but the moment you actually implement it is when all of the known problems become relevant, and the intellectual wonder must take a backseat to the human problems.
Even narrowly for grammar, though, is that a good thing? It probably helps scale grammar feedback to more students, but if those tools became ubiquitous in grading and editing, unique voices would simply disappear, and a lot of potentially “great writers” might choose different careers because the machines don't like them.
It is fine to play with "cool" techniques when you are doing consequence-free stuff like playing Mario. When you are creating systems that have significant, long-term effects on people's lives, a different standard applies.
Yeah, it's cool, but what about your savings account?
If it is cost-prohibitive for every essay to be graded by humans, then they should be dropped from the tests. Otherwise, we are missing the whole point of essays which is to communicate effectively with another human, not just match certain text patterns.
If you want to grade on form to test the ability to write correct rather than coherent sentences, make those separate questions, and mark them so.
Apparently it is. But everyone still wants writing to be assessed…
I agree, this is traditionally the purpose of an essay. But to play devil's advocate, consider the rising number of people who are writing SEO or ASO content which is actually targeted at machines.
And “between 5 to 20 percent” of essays are randomly selected for human review.
So the takeaway is: if you're in the 80-95% of essays that never get human review, and the machine scored you dramatically lower (as it typically does for black or female students), your educational future is systematically fucked and you have no knowledge of why or how to change it.
Absolutely reprehensible. Anyone involved in the creation or adoption of these systems should be ashamed.
At least the machines offer the following hope: even if unbiased humans are rare among paper-grading teachers, those humans can be used to train the machines, so then bias-free or lower-bias grading becomes more ubiquitous.
Basically, the system has the potential to identify and reduce systematic bias. A computer program can be retrained much more readily than a nationwide army of humans. Humans can be given a lecture on bias, and then they will just return to their ways.
That's the problem - there is seemingly no shame these days. People involved "saved time and money", got paid and that's it. "If I didn't do it someone else would" and all of that.
I remember taking a standardized test, can't remember if it was SAT or CSAT (Colorado pre-SAT test). This was at a time when I'm confident that humans were the graders.
I started with an intro that would be appropriate for a standard 5 paragraph essay; i.e. the thing you write when you don't know what you're talking about and you're just following a format.
In the third paragraph I took a leaf from Family Guy and just interjected "WAFFLES, NICE CRISPY WAFFLES, WITH LOTS OF SYRUP." For the next page and a half, I berated the very foundation of the essay prompt, insulting it the way only an angst-ridden early teen can.
... I got a 98% on the essay.
Fast forward several years. I write an essay for an introductory college course final. My paper is returned to me with a coffee stain and a "94% - good work!" note scribbled on the top. That note was scribbled by a TA who would turn out to be my girlfriend for 2 years. One night in bed, she tilts her laptop to me, showing an article that I used as the central theme of the above essay: "can you believe this?"
"Are you joking? Of course I can believe this, it was the subject of the essay you gave me an A on 2 years ago"
She admits she doesn't read past the first paragraph of anything she grades, and just grades on intuition, based on how articulate the essays are at the outset.
...
The point I'm making:
Does AI suck at judging the amount of informative content in a student essay? YES
Do humans suck at judging the amount of informative content in a student essay? ALSO YES
"People worry that computers will get too smart and take over the world, but the real problem is that they're too stupid and they've already taken over the world."
Any essay writing test which could be adequately graded by a machine is not testing anything of value.
Edit: I’ll further add that as soon as people’s careers depend on a metric, the metric becomes useless as a metric, because it will be gamed and manipulated by everyone involved. Almost nobody involved is incentivized to accurately measure students’ writing ability.
A lot of what students write is actually garbage from that point of view. Even if they happen to have a good basic idea about what they want to say, the point of essay writing is to master the mechanics of expression so that you get the idea across effectively.
Whether the student has a brilliant idea isn't even so important, and it wouldn't even be fair; imagine if high school computer science expected students to turn in a best-selling app for a term project. Not everyone can come up with something brilliant to say; and even relatively mundane lines of reasoning can be given a good treatment in writing to develop the skill.
I remember when I had essays graded in school, a lot of the comments were low-grade fluff like "run on sentence", "wrong word", "faulty parallelism", "missing colon before 'for example'" and such points having nothing to do with the content being original, well-considered and well-argued. That sort of thing might as well be done by machine, at least as a preprocessing step to improve a student's rough draft.
It's the same reason you see keyword posters in math education. "Together" means "plus", that kind of thing. It's completely worthless, except for one-step problems, and even then it doesn't always work. What is happening is collusion between teachers and testmakers. You can't teach understanding, but you can teach test-passing techniques because the way the test is set permits this.
You see the same thing here, in English you can get away with not teaching quality writing if you teach techniques to score well.
When your essays are graded, they're marked down for mechanical and wording problems. There's really no point in trying to grade 'good ideas' on a subject piece the grader had maybe 10 minutes to skim.
Hint: together does not mean plus.
However, it is rather important that students know their essays are not being judged as essays, but only on the mechanics. Otherwise you teach students that form trumps content in essays.
When judging an essay as an essay, correct English barely matters. What matters is how convincing you are, and how interesting a read the essay is. This is a great skill to have, and testing it also makes sense. Really, though, we should separate these two forms of testing.