As you receive your student evaluations at the end of the fall semester, you might be heartened (or disappointed, depending on your results) to hear that any suspicions you have about their accuracy are well-founded.

A study undertaken at University Hospital Münster last year, published in Medical Education, revealed that emergency medicine students who were offered chocolate cookies during their class were more likely to give glowing reviews of their instructors. The lecturers, the teaching styles and the material did not change, but overall ratings were more than seven points higher (a mean of 224.5, instead of 217.2, on the study’s scale of measurement).

If it’s too late to offer sugar-laden snacks to your students before they fill out their forms, there are three more reasons why student evaluations are usually not worth the pixels they are written in.

1. They’re heavily biased

Evaluations of instructors tend to tilt heavily depending on the instructor’s gender, according to a study entitled “Student evaluations of teaching (mostly) do not measure teaching effectiveness,” published in ScienceOpen Research.

Researchers from Université Paris-Dauphine and the University of California found that student evaluations were biased against female instructors by statistically significant amounts, even for measurable attributes such as speed of grading. Male students in French universities tended to rate male teachers more highly; in American online courses, those higher ratings for male teachers came from female students.

UC Berkeley associate professor of history Brian Delay, discussing the study on Twitter, pointed out that gender biases can be large enough to cause more effective instructors to receive lower scores than less effective ones. Deans and course leaders should keep these differences in mind when making hiring or recruiting decisions, he added.

2. They don’t reflect teaching quality

Philip Stark, professor of statistics at the University of California, Berkeley, has studied student evaluation metrics throughout his career. In an October 2016 paper, he writes: “The best evidence suggests that SET [student evaluations of teaching] are neither reliable nor valid, even when the survey response rate is nearly perfect.” Sources of bias include students’ grade expectations, the subject of the course material (math content tends to get lower ratings), the level of the course and whether the course is required.

The paper was one of several prepared as part of a Canadian arbitration case between a Toronto-based institution and a union representing teaching academics, over the use of student evaluations in tenure decisions. The arbitrator decided that the expert evidence “establishes, with little ambiguity, that a key tool in assessing teaching effectiveness is flawed.”

3. It’s very difficult to persuade all students to do them

As we’ve seen, even a survey with perfect coverage doesn’t reflect teaching quality. But it appears many classes can’t even get that far.

Most student evaluations now occur online, explains a paper by Diane D. Chapman and Jeffrey A. Joines of NC State University, published in the International Journal of Teaching and Learning in Higher Education. And while they have the benefit of being anonymous and flexible, generally only 30 to 40 percent of students will complete them.

Low response rates also skew results, says Berkeley’s Philip Stark. “Anger is generally a stronger motivator than contentment, [so] it is plausible that survey responses are strongly biased towards negative results when response rates are especially low. In short, there is no basis for extrapolating SET or student comments.”

What to do instead

“I think student evaluations should be made to help individual instructors learn how to be more effective,” Berkeley’s Brian Delay told Top Hat. “I’ve often gotten valuable feedback from students, feedback that has helped me see what works, what can be improved, and how I might teach material or organize classes differently in the future. So I certainly don’t think they should be removed entirely. We have a lot to learn from our students.

“That said, I don’t believe that schools should be using data from evaluations to make decisions about hiring, promotion, salary, or tenure. It is very clear at this point that evals disadvantage women and people of color, so it isn’t acceptable to treat them like objective metrics.”

“We need to assess teaching, and we often have to rely on not the best, but the least worst, option,” argues Kevin Gannon, associate professor of history and director of the Center for Teaching and Learning at Grand View University, in the Chronicle of Higher Education. Student evaluations can be used to spot trends rather than being scrutinized one by one, and the data should always be read in context. (Was the instructor parachuted in at the last moment, for instance?) Gannon agrees with Delay: the best way to treat evaluations is as supplemental material. No career should hang on a student evaluation.

In that case, there have to be other ways of checking teaching quality to use alongside student evaluations. One idea is to pre-empt the online evaluation by asking students specific questions in class first, which helps set the right context for the evaluation.

The University of Colorado, on the other hand, uses student evaluation as just one part of a wider Teaching Quality Framework. The system is opt-in and “multidimensional,” giving peer review and self-assessment equal weight with student feedback.

But if you are reading back your assessments and wondering whether your students were even taking the same class you taught, you can take comfort that a bad review may, statistically, have little to do with you. Ultimately, human bias and unreliability shouldn’t be the sole arbiter of the quality of your teaching and your career, although questionable student evaluations are easy to administer and not going away any time soon. And that’s just the way the cookie crumbles.