The Evolution of Testing: Student Assessment Through the Ages

Students might dread it, but testing is a critical part of the higher education experience—essential exercises designed to show that all those hours of work, in and out of class, were well spent.

How students are evaluated has changed dramatically over the years, however, and those changes have been driven by three big factors. First, the democratization of higher learning, from an elite opportunity to a near-universal experience. Second, a need to develop credentials for careers that make university or college a prerequisite. And, maybe more than anything else, the fast pace of technological change.

That means that the way university students are assessed in 2019 would look pretty foreign to a student in a medieval university, or even one a few decades ago. And as technology moves faster than ever before, those changes are going to come at us even more quickly in the years to come. Let’s take a moment to stop and take stock of where we’ve been, and where we’re going.

387 BCE: Plato’s Academy

With no classes, no homework, no tests and no credentialing, Plato’s Academy was both the forerunner to the modern university, and in many ways its opposite.

Most learning at the Academy was in the form of dialogues, rather than lectures—Plato or other senior members of the Academy posed philosophical problems for students to resolve via discussion. A dialogue with Plato might be intimidating, but on the plus side, no cramming required.

605 CE: Standardized testing begins

SATs, MCATs, LSATs, GMATs, GREs: standardized testing holds the key to countless academic and professional spheres, just as it did in first-century China, where the civil service was the gateway to riches and social advancement. Written exams determined whether you could advance into leadership positions, and barriers were high without wealth or privilege.

But in the year 605, the first major standardized test for entry into the profession was implemented across the lands of the unifying Sui dynasty, and soon afterward people from all walks of life could sit for it. The exam, which covered philosophy and mythology, was even anonymized, with calligraphers rewriting candidates’ answers, to make it fairer.

1088: Viva Voce

Starting with the University of Bologna in 1088, medieval European universities employed public oral examinations, referred to as viva voce (“with living voice”). They were often presented as challenge-and-defend debates between junior and senior students, and, if the students performed adequately, they’d take on the university’s masters. The debates could go on for hours, sometimes turning into scholarly pile-ons. Academic historian Victor Morgan has likened them to “academic bloodsports.” Of course, you will recognize the modern-day version of this in thesis defenses.

1702: The first written university exam

Scholars dispute the exact year, but the first modern written university exam is generally dated back to Trinity College, Cambridge, in the early 18th century, where it was introduced by college master Richard Bentley. As with any novelty, it had its detractors. American observer Charles Bristed wrote that the “pen-and-ink system” was “fairer to timid and diffident men,” but also that “the scratching of some hundred pens all about you makes one fearfully nervous.”

The advent of written exams also made examiners less forgiving of errors: in a viva voce examination, a minor slip-up may easily be missed or forgotten. In a written exam, every mistake is accounted for, and students’ performance can be more easily compared.

1900s: Standardized testing hits the West

Remember standardized testing? British technocrats took up that ancient Chinese idea in earnest in the Victorian era, using standardized tests to evaluate colonial administrators.

But the first standardized test in North America may date back to 1900, when the College Entrance Examination Board was created, administering identical tests in nine subjects throughout the United States. In the decades to follow, standardized tests for intelligence, vocational aptitude and admission to higher education of all sorts were developed.

Large-scale standardized testing has been criticized for encouraging memorization of facts rather than deep learning, and for failing to account for students’ diverse linguistic and cultural backgrounds by assuming that all students arrive with the knowledge and reference points of the dominant culture.

1970s: Profs get TA assistance

As class sizes grew larger and professors and instructors were tasked with ever-greater workloads, the use of teaching assistants to help with grading took off. The practice has long been commonplace, and can effectively reduce stress on professors, as well as provide valuable experience for assistants. But there is always the risk that inexperienced TAs may be too hard, or too soft, on student work, or grade inconsistently.

1972: Scantron

Invented in the early 1970s, Scantron soon became ubiquitous. The basic technology that allows machines to accurately read standardized multiple-choice forms is called optical mark recognition, and dates to the 1930s, but Scantron brought it to the mass market.

While it made life easier for teachers, it came in for criticism for helping to change the nature of testing itself, especially by encouraging fewer essay and short-answer questions and replacing them with multiple-choice questions that reward rote memorization.

1997: Turnitin

When Turnitin was first introduced, it was marketed as a solution to the problem of plagiarized essays and other writing. The software compares submitted papers to databases containing a vast assortment of scholarly and popular works, to identify instances of plagiarism. It was subsequently criticized for problems with accuracy and for appropriating students’ intellectual property by incorporating submitted papers into its ever-growing databases.
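To make the idea of comparing a paper against a database concrete, here is a minimal sketch in Python. It is not Turnitin’s actual method, which is proprietary; it assumes a simple technique in which each text is broken into overlapping three-word sequences, and a submission is scored by the fraction of its sequences that also appear in a source. The function names, the tiny `database` and the sample texts are all made up for illustration.

```python
def shingles(text, n=3):
    """Return the set of n-word sequences ("shingles") found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(submission, source, n=3):
    """Fraction of the submission's shingles that also appear in the source."""
    sub = shingles(submission, n)
    if not sub:
        # Too short to form even one n-word sequence, so nothing to compare.
        return 0.0
    return len(sub & shingles(source, n)) / len(sub)


# Hypothetical two-document "database", for illustration only.
database = {
    "source_a": "the quick brown fox jumps over the lazy dog",
    "source_b": "testing is a critical part of higher education",
}

submission = "the quick brown fox jumps over a sleeping cat"

# Score the submission against every source in the database.
scores = {name: overlap_score(submission, text) for name, text in database.items()}
```

A high score only flags shared wording for a human to review. A simple overlap measure like this cannot tell a properly quoted passage from a copied one, which hints at why accuracy became a point of criticism.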

1999: Essays graded by algorithm

Forget Scantron—now even essays can be graded by computer algorithms. (Sort of.) Robo-scoring software doesn’t actually “read” essays, but compares dozens of features—from sentence structure to paragraph flow—to examples of “good” writing.

Research has found that some robo-scoring programs evaluate some students more harshly, thanks to a focus on metrics like spelling and grammar. That means students whose first language isn’t English, or who have learning disabilities, are at a disadvantage they may not face if an actual professor were analyzing the quality of their thinking.

Which leads to the biggest criticism: they just aren’t very good at analyzing writing. MIT researcher Les Perelman has even developed a “Basic Automatic B.S. Essay Language (BABEL) Generator,” which creates nonsensical essays using keywords from assignment descriptions. Some of those essays have earned top scores from e-Rater, a system used to score Graduate Record Examinations, TOEFL tests, and other common exams.

2000s: Clickers

Physical clickers didn’t revolutionize the multiple-choice test, but they did make it a group activity. With a clicker, a professor can present a multiple-choice quiz to an entire class, whose members use individual clickers to “vote” on answers. The results are instantly calculated and can be presented in real time.

Clickers are, however, notoriously easy to cheat with, allowing students to “take” a test in absentia by having another student use their clicker. They’re also one more thing students need to remember to bring to class (and keep stocked with fresh batteries), which is perhaps another reason they never caught on in a major way for graded tests. Though their popularity has waned, they are still used in some classrooms today to create a more interactive learning environment.
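The instant tally behind a clicker quiz is simple to picture. The following Python sketch is an illustration only, not any vendor’s software: it assumes each clicker reports one answer, and the clicker IDs and the `tally_votes` function are hypothetical.

```python
from collections import Counter


def tally_votes(responses):
    """Count each answer choice and return (counts, percentages)."""
    counts = Counter(responses.values())
    total = sum(counts.values())
    if total == 0:
        # No votes yet, so there are no percentages to report.
        return counts, {}
    percentages = {choice: 100 * n / total for choice, n in counts.items()}
    return counts, percentages


# Hypothetical clicker IDs mapped to the answer each one submitted.
responses = {
    "clicker_01": "B",
    "clicker_02": "A",
    "clicker_03": "B",
    "clicker_04": "C",
}

counts, percentages = tally_votes(responses)
```

Note that the tally is keyed by device, not by person. The system knows which clicker voted, not who was holding it, which is exactly the weakness that lets an absent student “take” a test.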

2010s: Top Hat

The rise of cloud computing and the growing ubiquity of mobile devices have opened up new possibilities for testing. Top Hat, for example, allows professors to create complex online tests incorporating written, visual, multiple-choice, and other question types, and administer them directly on students’ own devices. Top Hat’s proprietary lock-out capabilities prevent students from cheating, and exams can be auto-graded upon submission, so students are notified of their results immediately.

The future

It’s possible that technology could help correct some of the problems of past testing methods. Artificial intelligence could tweak standardized tests for students of different backgrounds, for example. And right now, virtual reality is starting to be piloted for education in medicine and other fields: Imagine performing virtual surgery as part of medical-school testing, or walking around a virtual construction site to assess engineering problems.

As always, new technology will create new problems and new possibilities—but there’s no stopping it.
