Students’ course evaluations are a fact of academic life. Here’s how instructors can combat stereotypes and gender bias while improving their own effectiveness.
One engineering educator drew jeers for incomprehensible handwriting and accent, lame jokes, and a penchant for pop quizzes. Another was dismissed as “a pretty face.” Some were so egregiously condescending, mean, or disorganized that they earned the ultimate snub: worst professor ever. “Run . . . run away,” a critic warned classmates.
Ouch. Loathe them or love them, student evaluations of teaching (SET) have become a required routine for most faculty. Introduced in the mid-1920s, regular ratings of instruction now span every discipline and course level, with results affecting decisions from TA office hours to promotion and tenure. According to a paper presented at ASEE’s Annual Conference this past June, some 94 percent of deans surveyed in 2010 reported their colleges “always used” systematic student reviews to judge faculty teaching performance, up from 88 percent a decade earlier. Meanwhile, the Internet abounds with unvarnished—at times brutal—appraisals, such as RateMyProfessors, with its “hotness” category, and student-developed sites like Cal Poly’s PolyRatings.
Given their ubiquity and high-stakes consequences, it’s not surprising that SETs have sparked faculty resistance, debate, and scrutiny over the years. A 2003 National Academies report, “Evaluating and Improving Undergraduate Teaching in Science, Technology, Engineering, and Mathematics,” found them a valuable tool but cautioned against relying solely on student ratings to gauge teaching effectiveness, since scores tend to reflect class size and other variables unrelated to pedagogy. More recent studies have uncovered racial, cultural, and gender biases, prompting National Public Radio in 2014 to declare “Student Course Evaluations Get an ‘F’.” In the latest salvo, economist Anne Boring of Sciences Po University in Paris and coauthors Philip Stark and Kelli Ottoboni from the University of California, Berkeley, found that students gave male instructors higher ratings across the board—even though an analysis of anonymously graded finals showed that students taught by men did slightly worse, on average. That dovetails with previous studies revealing consistently lower scores for female science and engineering faculty compared with their male counterparts.
The implications for engineering—where women compose just 16.3 percent of faculty members and 20.9 percent of undergraduates—prompted an ASEE Annual Conference panel on gender bias and SET. It was informed by a review of the research by University of British Columbia biomechanical engineers Agnes d’Entremont and Hannah Gustafson. Their conclusion: Instructor and student gender may influence course evaluations, but the picture is complicated. Some studies found no differences, or only small ones, between scores for female and male professors. Others did not control for differences in teaching styles and effectiveness, making it difficult to isolate gender bias. Murkiest of all, however, was the relationship between course evaluations and teaching effectiveness. “It is not clear that this is what SET solely (or even primarily) measures,” note the authors, citing one study that found similar learning gains for students in sections with the highest and lowest course scores. ASEE panelist Jason Bazylak, an associate professor of mechanical engineering at the University of Toronto, witnessed this disconnect after he caught a number of students plagiarizing—and subsequently got the worst student evaluations of his career.
Panelist Janet Callahan, chair and professor of Boise State University’s Micron School of Materials Science and Engineering, experienced the “classic story” of many young female faculty members. Her first class (at a different institution) was a large lecture course—a format that typically receives poorer ratings on average than small, upper-level classes. Moreover, statics was “somewhat outside” her field. Callahan wryly described winding up with “the lowest scores the department had ever seen,” but after teaching the course for three more semesters, she had mastered the material and learned the best approaches for engaging her students. “It’s important to be confident in your own knowledge of the course material,” she advises. “That imposter syndrome will rear up otherwise!”
Panelist Adrienne Minerick, associate dean for research and innovation at Michigan Technological University, had to deal with “some aggressive behavior” at the start of her academic career. She was young and petite, teaching her first 60-student class at her first institution, and trying to handle “a 250-pound guy standing over my desk, pounding on it, and pointing and yelling at me.” On their course evaluations, students complained she was “not nurturing or kind.” Minerick, a professor of chemical engineering and former ASEE Board of Directors member, took the feedback to heart and sought guidance from her colleagues. The male faculty members said they’d never encountered such behavior, while the women were dismissive. “They thought I was just complaining,” she explained during the ASEE panel discussion. Determined to improve, Minerick signed up for the ExCEEd Workshop, an intense, weeklong teaching boot camp for engineering faculty put on every summer by the American Society of Civil Engineers. There she learned that her organization and delivery were good but was advised to stand and carry herself in a more authoritative manner. That “attitudinal shift” buoyed her ratings, and six years later she made tenure.
Contrary to popular belief, students don’t favor instructors who dispense easy A’s or water down course content. Nor do they come, after graduation, to appreciate the rigor and quality of a professor they previously panned. It does happen, of course, but as North Carolina State University chemical engineering professor emeritus Richard Felder and education consultant Rebecca Brent noted in a 2008 column on SET, alumni ratings “correlate significantly” with their previous evaluations of “lousy” instructors. And while appearance, personality, gender, or race bias can factor into scores, many studies find the impact is small. (Heavy accents seem a larger problem, judging from online course ratings.) “The weight of the evidence is clear,” write Felder and Brent. “If students consistently say someone’s teaching is good or bad, they’re almost certainly right.”
Don’t teachers who are caring, enthusiastic, and fun get better reviews than aloof sages on the stage? Sure, but those traits also correlate with student motivation and learning.
However imperfect, student ratings of instruction do serve as a valid gauge of individual faculty performance over time if supplemented with peer reviews, portfolios of student work, or other measures. They also can offer valuable consumer feedback. For example, comments about TA inaccessibility might be addressed by adjusting office hours. “Saying that students can’t evaluate teaching is kind of like saying that diners can’t evaluate their meals,” Joe Grimm, the faculty lead on To My Professor, a 2016 advice book for instructors written by Michigan State University students, told Inside Higher Ed.
Here are suggestions for improving the process while addressing bias:
Boost response rates. Beware of ratings collected from fewer than 10 students or less than two-thirds of a class, caution Felder and Brent, who also warn against making personnel decisions based on a single semester’s reviews. One or two disgruntled or biased students can tip the scales. To increase response rates, Vanderbilt University’s Center for Teaching recommends talking to students about the importance of course evaluations and how you have changed the course based on their feedback. In a 2003 interview on the subject, Kathleen Hoover-Dempsey, a professor emerita of psychology, described how she would schedule a time to fill out the evaluations and tell students in advance that she really wanted them all to be present. “I tell them that I read every comment and find the comments extremely useful in thinking about and improving my own teaching,” she explained, repeating those statements when she gives out the forms. “You can never write too much,” she would add, then follow the university’s guidelines and quickly leave the room after identifying who would collect and deliver the forms to the department office.
“If you can make this case convincingly, most students will take the ratings seriously and you should get a good rate of return,” write Felder and Brent. “If you can’t make the case, there is no reason the students should take the ratings seriously and you should not be surprised if they don’t.”
Evaluate teaching in multiple ways. Teaching portfolios, peer evaluations, and discussions with groups of students regarding the professor’s performance are among the tools d’Entremont and Gustafson suggest. This dovetails with the 2003 National Academies report’s recommendation to use multiple measures, such as tapping the unique perspectives of graduating seniors, alumni, and graduate teaching assistants. Such an approach can yield “powerful benefits” in student achievement while helping faculty improve their teaching, the study’s authors concluded, citing Virginia Tech’s creation of an alumni advisory board to debrief upperclassmen about the curriculum and recommend changes. The approach uncovered concerns about the need to learn and use the latest software, leading to the development of a new computer lab, while the need for communication skills encouraged faculty to introduce a new writing-intensive course in the civil engineering curriculum.
Schools that use student evaluations in assessing faculty members should also consider an instructor’s teaching workload and number of advisees. Look at the distribution of courses taught, suggests Callahan: Are your female faculty clustered in first-year and sophomore-level courses, which tend to have lower scores? Panelists also noted the potential impact of coteaching with a much better teacher, since most evaluations presume each course has a single instructor.
Add or delete questions to increase inclusiveness. At Michigan Tech, where departments and instructors can add questions to student evaluations, the university’s Diversity Council developed a set of exemplars on cultural sensitivity and class climate, then asked department chairs to use a couple on every course survey. Sample item: “My instructor treats all students with respect.” Since terms like enthusiasm, warmth, confidence, and voice tone are “filtered through existing biases about gender norms,” d’Entremont and Gustafson recommend separating or scrubbing survey questions on instructor expressiveness. They also suggest raising awareness of existing gender biases through workshops for professors or by having instructors engage with students about gender issues in the classroom. Based on personal experience, says Minerick, talking openly about gender bias can reduce its influence.
Ask students for informal input. Don’t wait for end-of-semester ratings to address problems. Catch a group of students after class, hold a midterm focus group, or pose a quick question about how the class is going. Getting midsemester feedback can be as simple as asking for a short list of one thing to keep doing, one thing to stop, and one thing to start, Steph Burt, a professor of English at Harvard, offered in a post about student course evaluations on the Philosophers’ Cocoon blog.
Be aware of biases. For d’Entremont, “one of the most striking aspects” of the literature was the gap in expectations for male and female instructors. One study, for example, showed that students rated women as being less available despite objective measures showing they spent more time with students in office hours than their male colleagues. Reminding students that they are providing professional feedback, she suggests, might reduce bias and prepare them to provide more useful evaluations.
Show you care. “Students love it when you’re on their side,” says Minerick, noting that advocacy and assistance can foster a growth mind-set. Coding student responses can help tease out trends in a cascade of information, suggests panel member Michael Johnson, an associate professor in the department of engineering technology and industrial distribution at Texas A&M University. Boise State’s engineering school uses “reflective memos” to help faculty “systematically think through how their course went,” says Callahan, who puts hers on the opening page of her course notes, so it’s the first thing she sees when she prepares to teach the class again. Sample questions: What changes did you make this year in response to feedback, what will you do differently next time, and how could you change this course to increase student engagement? “Teaching is dynamic,” underscores Callahan. “How we teach can and should change year to year.”
Above all, “don’t take it personally,” ASEE’s panelists concurred. While engineering’s low percentage of female faculty may reduce opportunities for students to learn from women, one paper posits that being “oddballs” in a male-dominated profession could translate into exceptions to stereotypes. Given the potential for compounding the effects of even small gender biases, administrators should exercise caution when using student course evaluations in promotion and tenure decisions. But as Callahan and other seasoned educators demonstrate, feedback loops and improvement are as vital to teaching as they are to engineering.
By Mary Lord
Mary Lord is deputy editor of Prism.
Design by Nicola Nittoli