Is Your Assessment 50 Years Old? A Brief History of Assessments

May 24th, 2017

Jon-Mark Sabel

At over 100 questions in length, the pre-hire assessment is something most job candidates dread. For one, it’s a huge time sink: at 100+ questions, the assessment requires candidates to set aside a substantial portion of their day for a single test. Most online tests are also poorly optimized. If a candidate loses their internet connection or accidentally refreshes a page, they are often forced to restart the assessment from the very beginning. Those who actually complete the assessment are left wondering why they were asked the same question, slightly rephrased, over and over again.

Despite these negatives, the number of employers using assessments as a pre-hire screening tool is increasing: according to Talent Board’s 2016 Candidate Experience Research Report, over 75% of companies report using assessments to make screening decisions.

But did you know that most “modern” online assessments are actually over 50 years old? Legacy Applicant Tracking Systems are often ridiculed for being "outdated" - and these are (at most) 15-20 years old. Why do we give a pass to one of the most important indicators of job success? To understand why we’re still using tests from the 1970s (and how to improve them), we need to understand the history of assessments.

From Natural Selection to Job Selection

Charles Darwin and Alfred Wallace's discoveries in the natural world snowballed into other disciplines, psychology included.

If nature could select for the best traits, why couldn't psychologists do the same for employers?

With this more functional orientation came the study of statistical techniques that are still in use today. Of particular note is the correlation coefficient, which in this context measures how closely test performance tracks actual job performance.
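To make the idea concrete, here is a minimal sketch of the Pearson correlation coefficient applied to test scores and job performance. The function name `pearson_r` and all of the numbers are hypothetical and chosen only for illustration; they do not come from any real assessment.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations from the means, and of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: assessment scores and later supervisor ratings for six hires.
test_scores = [62, 70, 75, 81, 88, 93]
job_ratings = [2.9, 3.1, 3.6, 3.4, 4.2, 4.5]

r = pearson_r(test_scores, job_ratings)
print(f"r = {r:.2f}")
```

A value near 1 means higher test scores go hand in hand with higher job performance, a value near 0 means the test says little about performance, and a negative value means the relationship runs the other way.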

A Science in its Infancy: The Science of Selection

At the turn of the 20th century, psychology with a focus on industrial application took its place as a distinct field of study. With it came a shift in emphasis: rather than measuring (irrelevant) performance attributes like handwriting, industrial-organizational (I-O) psychologists measured cognitive attributes like personality and intelligence.

World War I

World War I brought the “science of selection” into the mainstream.

Robert Yerkes, president of the American Psychological Association (APA), famously administered group intelligence tests to army recruits. These tests were labeled Army Alpha and Army Beta: Alpha for literate personnel, Beta for illiterate.

While most members of the Army’s chain of command were not impressed with the results, Robert Yerkes and other practitioners labelled them a huge success.

The perceived success of these group intelligence tests took hold in the mainstream and set off an assessment boom.

Boom to Bust: Snake Oil Assessments

The widely touted success of the Army Alpha and Beta tests brought a huge number of assessment providers into the private sector.

Unregulated and uncontrolled, most of the assessments offered by these new providers used remarkably unscientific methods for determining top performers. Palm readings, head size, and handwriting skill tests all found mainstream use in the private sector. Unsurprisingly, these snake oil assessments rarely offered a good return on investment. To salvage the reputation of science-based assessments, the APA publicly conducted studies debunking the usefulness of palm reading and other esoteric tests for making performance predictions.

These APA-led studies also established two assessment evaluation metrics that formed the cornerstone of the job assessment: reliability and validity.

Reliability evaluates the consistency of the test. If a single individual takes the same assessment multiple times, they should receive a similar score each time.
Validity evaluates how relevant a test is to what it is meant to measure or predict. For example, a typewriting test is highly relevant when predicting the performance of a stenographer.
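One common way to put numbers on these two metrics is to use correlations, sketched below under stated assumptions. Reliability is estimated as the correlation between two attempts at the same test, and validity as the correlation between a test score and later job performance. The helper `correlation` and every value in the example are hypothetical, and this is only one of several ways these metrics can be estimated.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five candidates.
first_attempt = [71, 64, 88, 79, 92]
second_attempt = [73, 61, 90, 77, 94]      # same test, taken again later
typing_wpm = [55, 48, 72, 60, 80]          # typewriting test result
steno_rating = [3.2, 2.8, 4.1, 3.5, 4.6]   # later stenographer performance rating

# Reliability: do repeat attempts produce similar scores?
reliability = correlation(first_attempt, second_attempt)
# Validity: does the test score track the job performance it should predict?
validity = correlation(typing_wpm, steno_rating)

print(f"test-retest reliability: {reliability:.2f}")
print(f"criterion validity:      {validity:.2f}")
```

Note that a test can be reliable without being valid: a head-size measurement would give nearly the same number every time, yet tell you nothing about job performance.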

Establishing reliability and validity in an assessment required a huge dataset. 100 multiple-choice questions were often the minimum required to create an accurate picture of a test-taker’s cognition or personality.

While not as convenient as palm reading or measuring head size, these lengthy questionnaires actually predicted future performance.

The "science of prediction" established during this time is a large reason most assessments are decades old: the statistics, science, and "rules" of prediction haven't changed - so why should the assessment?

Unfortunately, these empirical studies were too little, too late. The reputation of pre-hire assessments had already been tarnished by the inconsistency of their unregulated counterparts.

Coupled with rampant unemployment during the Great Depression, pre-hire tests would not regain widespread use until after World War II.

World War II

World War II brought assessments back into the public eye:

The Wonderlic (1936), the first short-form cognitive abilities test, was created by E.F. Wonderlic and gained fame helping the Navy select candidates for pilot training and navigation.
The Cardall Test of Practical Judgment (1942), the first situational judgment test, was created by Alfred Cardall to measure "problem-solving ability in everyday life."
The Briggs Myers Type Handbook (1944), the precursor to the Myers-Briggs test, was created by mother-daughter duo Katharine Briggs and Isabel Myers to evaluate personality.

The renewed focus on personality and cognitive testing, coupled with the covert operations needs of the Office of Strategic Services (OSS, the precursor to the CIA), led to the development of the physical assessment center.

The tests conducted in OSS-run assessment centers ranged from mundane paper tests to more extravagant tests of skill. A notable situational test saw candidates supervise and instruct two uncooperative privates as they built a miniature house. At the close of the war, most assessment centers remained open, shifting their focus to industrial applications.

Post WWII: Re-Entry to the Mainstream

Following World War II, assessment centers and multiple-choice paper assessments found widespread use in the private sector. These assessments followed the precedent set by their WWII-era predecessors, and focused on measuring personality and cognitive traits as predictors of job performance.

It is estimated that by the mid-1950s, 40-75% of companies used paper tests to evaluate job candidates.

The Civil Rights Act

With the passage of the Civil Rights Act of 1964, the stage was set for a new focus in the assessment space: the removal of bias from the selection process.

On closer scrutiny, it became apparent that some of the most popular pre-hire assessments unfairly discriminated based on race and gender. In the case Griggs vs. Duke Power (1971), the Supreme Court established specific rules for job selection tests:

Selection tests must be job related and based on the qualifications of the specific job.
If adverse impact occurs (i.e., if protected groups score lower on the test), the defendant must show that the test is job related.
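As a rough illustration of how adverse impact might be checked in practice, the sketch below compares the pass rate of each group against the group with the highest pass rate. The 0.8 cutoff is an assumption borrowed from the "four-fifths" rule of thumb in the later federal Uniform Guidelines on Employee Selection Procedures; it is not part of the Griggs ruling described above. The function names, group labels, and counts are all hypothetical.

```python
def selection_rate(passed, applicants):
    """Share of applicants in a group who passed the test."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return passed / applicants

def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has a non-zero selection rate")
    return {group: rate / highest for group, rate in rates.items()}

# Hypothetical counts: (candidates who passed, candidates who took the test).
results = {"group_a": (48, 80), "group_b": (24, 60)}

rates = {g: selection_rate(p, n) for g, (p, n) in results.items()}
ratios = impact_ratios(rates)

# Assumption: flag any group whose ratio falls below four-fifths (0.8).
THRESHOLD = 0.8
flagged = [g for g, ratio in ratios.items() if ratio < THRESHOLD]

for group, ratio in ratios.items():
    print(f"{group}: selection rate {rates[group]:.0%}, impact ratio {ratio:.2f}")
print("possible adverse impact:", flagged)
```

A flag here is only a prompt for closer review. Under the rules above, the employer would then need to show that the test is job related.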

Prior to the Civil Rights Act and Griggs vs. Duke Power, it was common for organizations to administer generic cognitive tests when making selection decisions. As these were not job-specific (and possessed a tendency to adversely impact certain groups), they were replaced by assessments that more specifically measured job skills.

Paper assessments developed during this time are still prevalent today. All that has changed from the 1970s to the present is the method of delivery.

1970s - 1990s: From Paper to Phone

The period between the 1970s and the 1990s saw many assessment centers go paperless, transitioning from paper to phone assessments.

1996 - 2010: Into the Internet

With the creation of the World Wide Web, the stage was set for a new era of internet-distributed assessments: in 1996, Nathan Mondragon built the first assessment ever delivered online for Coopers & Lybrand.

Since candidates could complete them from any internet-connected device, online assessments eliminated the need to visit a local assessment center or telephone into a call center. Like their paper and telephone precursors, these assessments make use of large question sets to establish predictive validity. Unlike their paper and telephone predecessors, they still find wide use as a pre-employment screening tool.

If It Ain't Broke - Why Change?

Assessment science has been through a lot. Having undergone decades of testing, validation, and legal affirmation, the science is incredibly well established.

Your "modern" assessment is probably 50 years old because it still works - technically. Paper assessments created following the Civil Rights Act are still reliable, valid, and predictive: the science and statistics have not changed.

What has changed are the expectations of the candidate.

Today's Candidates Have Options - And They Don't Prioritize 300+ Questions

You're no longer competing with the local manufacturing plant for a small and hyper-local talent pool. Today's candidate has access to every relevant job opportunity in the world (quite literally). Asking them to set aside two or three hours that could be spent interviewing or applying for other jobs just isn't realistic.

Put a different way: lengthy talent assessments disproportionately favor candidates who have no other options (or those who try to cheat). High-quality candidates with multiple opportunities either:

Won't bother attempting (or completing) it.
Will prioritize other job hunting activities until they forget about it.

The standard online assessment screens out both the worst and best candidates: the worst, because the assessment works; the best, because they don't bother completing it.

A Needed Transformation

Fortunately, we're starting to see a shift in the pre-hire assessment marketplace, and AI is making waves in recruiting. Recent innovations in data science and computing have opened up the traditional 100+ question assessment to transformation.

We're seeing that the 50 years of scientific rigor and validation of traditional assessments can be applied to non-traditional data, like recorded interviews and games.

In our newest eBook, we examine why four category-leading organizations in four diverse industries are leveraging these new methods to transform their assessment.

With recent innovations in artificial intelligence (AI), you don't need to risk the best candidates dropping out of your hiring funnel.

See the four biggest reasons you should rethink your assessment strategy.