Assessments and Big Data: Can We Predict Better Hires?

May 18th, 2017
Jon-Mark Sabel
Assessments

The amount of data in the world has increased exponentially. It is estimated that 90% of the world’s data has been created in the last two years. Every online interaction, every smartphone input, every consumer purchase - all contribute to datasets of unprecedented size, or “Big Data.”

Each year, the Society for Industrial and Organizational Psychology (SIOP) holds a conference where the latest trends and advancements in employment assessments are presented and discussed. In the panel: “Making Better Business Decisions: Risk and Reward in Big Data,” industrial-organizational (I-O) psychologists on the bleeding edge of assessment science discussed the implications of these massive new datasets for predicting job performance.

These were the panelists:

  • Richard Guzzo, Consultant at Mercer
  • Nathan Mondragon, Chief I-O Psychologist at HireVue
  • Charles MacLane, retired, formerly of the U.S. Office of Personnel Management (OPM)
  • Richard Tonowski, representing the U.S. Equal Employment Opportunity Commission (EEOC)

For I-O psychology and pre-employment assessments, Big Data holds huge promise.

Big Potential from Big Data

Consider the classic pre-hire assessment: 200+ questions that, to the test taker, seem repetitive and redundant. To the assessment creator, though, those 200+ responses are the bare minimum required to make an accurate prediction about future performance. Each new data point decreases the likelihood of a mistake and increases the predictive power of the assessment.

According to Richard Guzzo, the panelist from Mercer, this scarcity of usable data is one of the “classical shackles” on solving problems and making changes. The predictive modeling approach is there - it has been for decades. But if a model does not have enough information to consider, it cannot be predictive.

In other words, larger datasets mean greater predictive accuracy. You can see how the millions of data points gathered from new sources and collection methods could be immensely valuable for predictive analysis.

Finance, marketing, and other industries are already leveraging these huge datasets to make more accurate predictions. So with data collection at historically unprecedented levels, is it time to change the pre-employment assessment game?
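To make that claim a bit more concrete, here is a minimal simulation sketch (not from the panel, and with an assumed “true” validity of .35) showing why sample size matters: with only a handful of candidates, the estimated validity coefficient bounces around; with thousands, it settles near its true value.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_R = 0.35  # assumed "true" validity of the assessment (illustrative)

def simulated_validity(n, reps=1000):
    """Estimate the validity coefficient from n candidates, repeated reps times."""
    cov = [[1.0, TRUE_R], [TRUE_R, 1.0]]
    estimates = []
    for _ in range(reps):
        # Draw assessment scores and job performance with a known correlation
        scores, performance = rng.multivariate_normal([0, 0], cov, size=n).T
        estimates.append(np.corrcoef(scores, performance)[0, 1])
    return np.mean(estimates), np.std(estimates)

for n in (50, 500, 5000):
    mean_r, spread = simulated_validity(n)
    print(f"n={n:5d}: mean estimated r = {mean_r:.3f}, spread (SD) = {spread:.3f}")
```

The spread of the estimate shrinks roughly with the square root of the sample size, which is the statistical reason larger datasets buy more predictive confidence.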

Gathering the Data

As mentioned above, traditional data collection relied on the written responses of job candidates. Now, with the increased ability to make sense of “big” data, there are three primary data collection methods being used by assessment innovators:

  1. Social & Web Scraping. Assessment data is gathered from candidates’ social media pages, online profiles (LinkedIn, Facebook, etc.), and other online interactions.
  2. Game data (Gamification). Assessment data is gathered from candidates’ input in games designed to elicit responses that indicate cognitive and other job-specific skills.
  3. Video Interviewing. Assessment data is gathered from video interviews: non-verbal communication, word choice, intonation, etc. Altogether, each interview provides over 25,000 data points.

Each of these methods relies on newer analysis techniques, such as machine learning, to make sense of what would otherwise be jumbled and unstructured data.
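As a rough illustration of what “making sense of jumbled data” can look like, here is a toy sketch (not HireVue’s method; the vocabulary and transcript are invented) that turns a free-form interview answer into a fixed-length numeric feature vector a downstream model could consume.

```python
from collections import Counter
import re

def transcript_features(transcript, vocabulary):
    """Convert free-form interview text into a fixed-length numeric feature vector.

    Counts how often each vocabulary word appears, then normalizes by transcript
    length so long and short answers are comparable.
    """
    words = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[term] / total for term in vocabulary]

# Hypothetical vocabulary a job analysis might surface for a support role
VOCAB = ["customer", "resolved", "team", "deadline", "learned"]

answer = "I resolved the customer issue before the deadline and learned a lot from it."
print(dict(zip(VOCAB, transcript_features(answer, VOCAB))))
```

Real systems use far richer features (intonation, word embeddings, game telemetry), but the principle is the same: unstructured input becomes structured numbers that standard predictive statistics can handle.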

Analyzing the Data

While the content of the data and the way it is extracted have changed, the I-O approach has not. The statistical tests used to crunch the data and produce a predictive output are the same ones used when building traditional assessment questionnaires.

Nathan Mondragon clarified: “There are definitely different names, and different methods of extracting the data, but not different statistics.”

A Bit of Background: Validity and R-Values

Validity, put simply, is the applicability of an assessment to actual job performance. There are three main types of validity:

  1. Criterion-related validity: indicates the correlation between test performance and job performance. A test with high criterion-related validity will accurately identify top performers.
  2. Content-related validity: indicates the relevance of test data to actions performed on the job. A test with high content-related validity will accurately measure specific job skills.
  3. Construct-related validity: indicates the test measures the characteristic it claims to measure. A test with high construct-related validity will accurately identify and measure certain abstract traits, like aptitudes.

There is often overlap between these three types of validity in a single assessment, but it is not guaranteed. An r-value (a correlation coefficient) expresses how valid the test is: the higher the value, the stronger the relationship between test scores and the outcome being predicted. Here’s a table showing what you can expect from tests at each given r-value:

[Table: what to expect from assessments at each r-value range]

For example, an assessment with a criterion-related validity of .35 will consistently identify top performers. R-values above .4 are largely unheard of in employment testing.
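For readers who have never computed one, a criterion-related validity coefficient is simply the Pearson correlation between assessment scores and a later measure of job performance. The sketch below uses made-up numbers (and far too few of them) just to show the calculation.

```python
import numpy as np

# Hypothetical data: assessment scores at hire and later job performance ratings
assessment_scores = np.array([62, 74, 81, 55, 90, 68, 77, 85, 59, 71])
performance_ratings = np.array([3.1, 3.5, 4.2, 3.0, 4.0, 3.2, 3.4, 4.4, 2.8, 3.9])

# Criterion-related validity is the Pearson correlation between the two
r = np.corrcoef(assessment_scores, performance_ratings)[0, 1]
print(f"criterion-related validity (r) = {r:.2f}")

# Note: toy numbers like these correlate far more strongly than real assessments,
# where validities above .4 are rare.
```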

So What’s the Value Added by Big Data? And Can It Be Used in All Situations?

According to Richard Guzzo, the value of Big Data lies in the fact that it can be used in all situations. “We’re not just dealing with a candidate’s responses to a written questionnaire anymore,” Guzzo explained. “The huge variety of data lets us apply I-O techniques to a range of issues at a rapid pace.”

Nathan Mondragon pointed to similar implications: “Big Data can make the assessment cheaper, better, and a better candidate experience,” he said. Reflecting on the 25,000 data points collected from a video interview, Mondragon asserted that we can do away with traditional tests - lowering costs and concerns while creating a better experience for the candidate.

“With our validated, I-O designed video interview questions, we get a .3 - .4 (r-value),” he explained. “When the video interview is designed correctly and considered correctly with I-O techniques, there are good reasons to reconsider long-standing methods.”

“There are definitely improvements to using biggish data,” Charles MacLane elaborated. “We might have been missing a critical piece of data in consideration or excluding a group based on a piece of data not being considered because of volume limitations.”

The ability to crunch larger datasets can make pre-employment assessments more agile, predictive, and convenient.

But what about the legal side of things? Richard Tonowski, the panelist representing the EEOC, reflected on the potential for adverse impact when considering new sources of data.

The EEOC Perspective

According to Richard Tonowski, the EEOC does not yet have an official position on Big Data.

“At an open commission meeting last year, my perception was that the commission was open but suspicious,” Tonowski explained. “We have to get straight what we are talking about - what does a Big Data-driven assessment provide an alternative to?”

Potential Legal Issues

He questioned why an employer would perform a huge study using new methods if it could get an r-value of .3 with a traditional “off-the-shelf” assessment. As companies continue to gather more and more data, he sees problems arising when they attempt to put together “Big Data” assessments themselves:

“In order to defend a selection system you need to have good construct, content, and criterion - in many new cases this might be missing,” Tonowski elaborated. “If you create an adverse impact when using a different methodology, it doesn’t matter what interesting interaction effects you have. The courts will jump on that.”

Nathan Mondragon agreed. “Like with any traditional assessment, if you have bad predictors or criteria, you still have bad validity. You still need to get good, job-related data to get good validity.”

For example, video interview data built on standard interview questions gets decent results (around a .2 r-value). But when specific questions are created from KSAs (knowledge, skills, and abilities) and KPIs, validity climbs to a .4 - without adverse impact.

“If we draw upon standard interview questions, we get decent results from video interview data. But when we look at KSAs and KPIs and create questions based on good job analysis and good job design, we’ll go from a .2 to a .4 based on the better questions.” - Nathan Mondragon, HireVue

"What Nathan just said (regarding the use of good, job-related data) we would consider a best practice," Tonowski responded.

Essentially, if a new assessment is not fully vetted and discriminates against certain demographics, it doesn’t matter how well it predicts job performance. But with the proper application of I-O techniques, this shouldn't be an issue.
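For context on how “discriminates against certain demographics” is usually operationalized, the sketch below applies the EEOC’s four-fifths (80%) rule to hypothetical applicant-flow numbers: if any group’s selection rate falls below 80% of the highest group’s rate, the procedure is flagged for potential adverse impact. The groups and counts are invented for illustration.

```python
def four_fifths_check(selections):
    """Flag potential adverse impact using the four-fifths (80%) rule.

    `selections` maps each group to (number_selected, number_of_applicants).
    A group whose selection rate is below 80% of the highest group's rate is
    flagged as showing potential adverse impact.
    """
    rates = {group: hired / applied for group, (hired, applied) in selections.items()}
    highest = max(rates.values())
    return {
        group: {
            "selection_rate": round(rate, 3),
            "impact_ratio": round(rate / highest, 3),
            "flagged": rate / highest < 0.8,
        }
        for group, rate in rates.items()
    }

# Hypothetical applicant flow: (selected, total applicants) per group
data = {"Group A": (48, 100), "Group B": (30, 100)}
for group, result in four_fifths_check(data).items():
    print(group, result)
```

In this made-up example, Group B’s selection rate is 62.5% of Group A’s, well under the 80% threshold, so the procedure would warrant closer review.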

What about social & web scraping?

In regard to social scraping, Tonowski’s advice was to “tread lightly.” In some situations, like screening for suitability or cultural fit, it might not be an EEO issue. But if you’re using it to assess qualifications, you might be on shaky legal ground.

And if it is used as a mass screen-out up front, as in the case of criminal conviction history, it becomes a problem because it can cause adverse impact.

Big Data: Risk, Reward, or Both?

The predictive power of massive datasets holds huge promise for a field that traditionally required lengthy, time-consuming questionnaires to make predictions.

But unlike the finance and marketing industries, talent acquisition has a duty to ensure that the results of its analyses have no adverse impact on protected groups. Understanding why and how an algorithm arrives at its conclusions is necessary for making changes that eliminate adverse impact.

So might there be risk to using “Big Data” when making screening decisions? Perhaps. But the consensus seems to be that, with the proper application of I-O techniques, Big Data can be leveraged for greater predictive power with none of the risk.

Learn how Big Data and AI can make the assessment a great experience for the candidate.


See why five category-leading companies are innovating away from their traditional assessment in our newest eBook: Stop Sacrificing Candidate Experience for Assessments.

The above is coverage of the SIOP 2017 panel: “Making Better Business Decisions: Risk and Reward in Big Data.” Due to the absence of recording equipment, quotes attributed to individuals in the panel are not exact.