Swamped instructors are turning to—and even inventing—automated grading tools that speed feedback while saving labor, especially their own.
By Mary Lord
Even as a Duke University undergraduate, Dev Dabke enjoyed teaching. Grading? Not so much. He recalls spending long hours manually correcting homework in a dark, lonely lab as a teaching assistant in a computer engineering course his senior year. Grading each 20-page exam could consume 40 hours. Complex assignments, with thousands of lines of code, were “beyond one person’s capacity.” Rather than despair, Dabke, now a second-year doctoral student in applied and computational mathematics at Princeton University, invented an automated grading tool to not only lighten his load but also provide quick feedback for students and instructors. The system, dubbed AG350, currently is being field-tested in several large upper-level and graduate courses in engineering and computer science—so far with positive results.
Call it the revenge of the nerds. Like oysters forming lustrous pearls around irritating grains of sand, overburdened TAs and faculty are seeking succor in algorithms, artificial intelligence, and machine learning to automate the tedious job of grading paper-based problem sets, quizzes, and finals. Some have developed digital grading systems to cope with surging enrollments and the sheer volume of work in large computer science and other STEM courses. Others are simply responding to niche needs, such as quickly assessing homework and providing feedback to their first-year MATLAB students. Several systems have moved from campus to marketplace. Gradescope, an AI-assisted, rubric-based grading software created by two computer science Ph.D. students and their professor at the University of California–Berkeley back in 2014, was acquired by plagiarism-detection service Turnitin last year and now counts more than 13,000 instructors in over 500 universities among its users, including many engineering educators. The system also supports problem sets, projects, and exams in biology, chemistry, computer science, mathematics, physics, and economics.
In some ways, higher education in general—and engineering in particular—has been slow to automate the grading process. Standardized tests like the SAT and ACT embraced machine scoring ages ago as a way of ensuring fairness, consistency, and efficiency. Ohio relies on robo-graders to assess student essays on state tests. Even athletic competitions are embracing digital tools: An AI-powered “judging support system” developed by Japanese tech giant Fujitsu debuted at the 2019 Artistic Gymnastics World Championships in Germany this past October and is expected to assist judges at the 2020 Tokyo Olympics.
A 30 Percent Time-Saver
Mike Thompson, a professor of electrical and computer engineering and associate dean for undergraduate programs at Baylor University, is one of the early adopters. “I hate grading,” he explains, bemoaning the stacks of exams, heavy briefcase, and hours of shuffling pages to adjust points that the task traditionally entails. He’s also a techie, however. So when he serendipitously learned from an enthusiastic user about an AI-powered method to streamline grading that works with Blackboard and other course management systems, he figured he’d “give it a shot.” Thompson says that Gradescope not only has cut the time he spends on grading by about 30 percent, enabling him to return assignments the same week and freeing up time to interact with students, but also has spurred discussion among TAs about assessment and the rubrics they’re going to use. “It gives us a framework to help discuss our learning objectives,” notes Thompson. Because exams and homework are scanned into the system, Gradescope also proved useful for organizing his classes during multiple visits back home to attend to recurring family health problems last semester.

Rebecca Reck, an assistant professor of mechanical engineering at Kettering University, is also a fan. After hearing a colleague mention using Gradescope during a parents’ meeting at ASEE’s Annual Conference in Salt Lake City last year, she headed to the Exhibit Hall to learn more about this potential work/life balancer and was offered a free subscription as the first in her university to sign up. “I figured, well, I can’t lose,” says Reck, who tested the tool on her summer class and found it really did save time. “I’ve been sold ever since.” Even without using the AI feature, Reck estimates that Gradescope has freed up 30 percent of the hours she once spent marking problem sets—an experience that helped inform her paper on digital grading tools and other tech teaching assists presented at this year’s ASEE Annual Conference.
Even prototype online graders are proving their worth. John Board, an associate professor of electrical and computer engineering and associate chief information officer at Duke University, is field-testing former undergraduate TA Dabke’s system in his ECE350 Digital Systems course this semester and can attest to accelerated learning. Typically, he spends two hours per student grading assignments and more on the final projects of building their own computer from scratch. This year, because automated grading quickly flags whether a hardware program works or not, students could fix errors and start programming the next stage. Result: Students had three weeks instead of just one to “show off” the videogames and other “cool things” they’d programmed their machines to do—a tripling of presentation time. “Those two weeks are huge,” says Board. Another benefit, he adds, is that “testing is more thorough and consistent, and students know in advance how well they’re doing.”
In a relatively small, specialized class devoted to computer architecture and hardware, automated grading can be a nice bonus. For faculty and graduate teaching assistants in computer science and other STEM disciplines with enrollment booms, such digital assists “are real survival tools,” says Board.
The “extremely painful” experience of grading definitely spurred Gradescope’s creators, according to a 2016 Berkeley engineering school news story. Sergey Karayev, one of the company’s cofounders, remembers moving around a conference table with seven other teaching assistants in computer science professor Pieter Abbeel’s introductory artificial intelligence class, each marking a part of some 250 exams. That was in 2012. Today, he says, the class has nearly 800 students. Princeton Ph.D. student Dabke can relate. Having to manually grade the work of 50 or 60 students, he says, was “a disaster for actually teaching.”
Text Recognition and Data Analytics
While most course-management systems include robo-grading options, they typically cover short, multiple-choice-style problems and have limited utility, Reck and others have found. By contrast, tools like Gradescope and Crowdmark harness text recognition, data analytics, and other technologies to assess complex responses in computer science, engineering, and other disciplines. Crowdmark, a collaborative online grading platform, was developed by University of Toronto mathematics professor James Colliander after he and other volunteers struggled to manually grade hundreds of 20-page exams for the 2011 Canadian Open Mathematics Challenge. Instructors need not alter assignments—though some upfront work is required to scan and submit documents to be graded. The AI-assisted grading software then groups similar answers together, allowing instructors to mark a whole batch simultaneously using rubrics they have selected and to award partial or full credit along with feedback on mistakes. Once tallied, grades can be downloaded for students to review online. There’s even a way to disable regrade requests.
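The batch-marking idea can be sketched in a few lines of Python. The matching below is naive exact matching after normalization—a stand-in for the statistical grouping these commercial systems actually use—and all names and point values are illustrative, not any vendor’s data model:

```python
from collections import defaultdict

def group_answers(submissions):
    """Bucket submissions whose normalized answer text matches, so one
    grading decision can be applied to a whole group at once."""
    groups = defaultdict(list)
    for student, answer in submissions.items():
        key = "".join(answer.split()).lower()  # ignore spacing and case
        groups[key].append(student)
    return dict(groups)

def mark_group(scores, group, points):
    """Award the same partial or full credit to every student in a group."""
    for student in group:
        scores[student] = points
    return scores

submissions = {
    "alice": "F = m * a",
    "bob":   "f = m*a ",   # same answer, different formatting
    "carol": "F = m / a",  # a distinct (incorrect) answer
}
groups = group_answers(submissions)           # two groups, not three
scores = mark_group({}, groups["f=m*a"], 10)  # full credit for the batch
```

The payoff is that a class of hundreds may produce only a few dozen distinct answers per question, so each grading decision is made once rather than hundreds of times.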
One of Gradescope’s most useful features, Reck and others say, is the ability to adapt a rubric and points assigned to earlier questions midway through grading and quickly update everyone’s tally rather than having to go back and recalculate each separate exam. Another feature provides per-question and per-rubric statistics, equipping instructors with bar charts and histograms to help catalogue each student’s mistakes and modify what to cover in class and future assignments. Scanned exams also can thwart cheaters’ attempts to change answers and discourage students from badgering instructors to raise their score, adds Karayev. Engineering administrators may find a forthcoming Gradescope feature particularly helpful in ABET accreditation: an easy way to link each student’s answer on specific exam questions to required learning outcomes. Karayev says the idea for the feature first was broached by an instructor.
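That live-recalculation feature falls out naturally if the grader stores which rubric items were applied to each submission rather than raw point totals. A minimal sketch, with illustrative names and values rather than Gradescope’s actual internals:

```python
# Scores are derived on demand from (rubric, applied items), so changing
# an item's point value midway through grading instantly updates every
# affected student's tally -- no exam needs to be revisited by hand.
rubric = {"correct_setup": 4, "algebra_slip": -1, "right_answer": 6}

applied = {
    "alice": ["correct_setup", "right_answer"],
    "bob":   ["correct_setup", "algebra_slip"],
}

def total(student):
    return sum(rubric[item] for item in applied[student])

before = {s: total(s) for s in applied}  # alice: 10, bob: 3
rubric["correct_setup"] = 5              # adjust the rubric mid-grading
after = {s: total(s) for s in applied}   # every tally updates at once
```

The same derived-score structure makes the per-rubric statistics cheap: counting how often each item was applied across the class is a single pass over the `applied` table.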
With colleges and universities under pressure to deliver better results while budgets shrink and class sizes grow, automation may seem an alluring alternative to teaching assistants. But don’t expect job-killing bionic graders in classrooms anytime soon. For starters, these systems are still a work in progress.
“It’s not all rainbows and unicorns,” says Duke’s Board. AG350 may be the only automated grader tackling computer architecture and hardware programming, but it remains under development and cannot yet provide immediate feedback and grades at the push of a button. Baylor’s Thompson initially had problems with papers getting crunched in the scanner—and one time a stack of unstapled exams fell and scattered across the floor, sparking a frenzy of handwriting-matching to reassemble them. Now that he uses Gradescope’s recommended scanner, the process has become “an almost Zen-like experience,” though he has students initial each page of their homework or tests and advises other instructors to do the same. Thompson also looks forward to investigating the use of tags to generate ABET-assessment data.
“The biggest caveat is there is no one tool that works for everyone,” says Kettering’s Reck. “Everybody’s approach to grading is as unique as the faculty and classes they teach.” While she has realized a 30 percent reduction in grading drudgery, a colleague reports scant difference. One reason, Reck surmises, is that her long, 20-point problems can more readily be broken into smaller chunks with short rubrics, putting everything on one screen and enabling the use of keyboard shortcuts. By contrast, complex problems with a single long rubric like her colleague’s don’t mesh as well with Gradescope’s format. Reck also found Gradescope’s interface for students wasn’t intuitive—perhaps because instructors designed it. She creates a fake account and logs on as Sally Student to guide her class on how to see feedback, which no longer takes the form of a check mark and scribbled comments on each problem. She also cautions instructors to bone up on federal privacy regulations, since they essentially are sharing protected student data with a third party.
While noting the imperfections, faculty members and TAs alike find that automated grading systems allow more time to interact with students. “If you have to grade 20 hours a week, that is 20 hours you don’t have to be available to students,” says Karayev, who received a Ph.D. in computer science from Berkeley in 2014, the same year Gradescope launched. “It should save you time but also should make you a better educator.” Baylor’s Thompson believes Gradescope and other tools have deepened interactions between professors and TAs by providing a “framework” for discussing learning objectives and other important teaching concepts. “It keeps me more engaged,” he says, noting that he now can meet his goal of turning work back to students in a week.
Meanwhile, former toilers in the grading trenches continue to solicit ideas, converse with users, and improve their labor-saving offerings. Karayev says Gradescope is working to make the system “simpler and simpler,” so that users have a smoother experience even though there’s greater power under the hood. Based on user feedback, the company has developed an early beta version of an application that will let instructors change the order of questions and still be able to grade them with AI assistance and shared rubrics, a daunting yet common manual task.
As for Dabke, when not immersed in his doctoral studies he spends most waking hours perfecting his AG350 grader. He formed a company in January and recently launched a website. His next task is to win over instructors and investors. His pitch: As the only automated grading system that not only focuses on computer architecture but also emphasizes accessibility for all users, AG350 has a unique engineering edge.
Mary Lord is deputy editor of Prism.
Design by Miguel Ventura