
The most effective hiring methods according to 100 years of studies

Written by
Ben Schwencke

For 100 years, personnel psychologists have been running studies to find out which selection methods actually predict who will perform well on the job. The findings are consistent but often ignored in favour of familiar methods like CV sifting and gut-feel.

We will present what the evidence shows and break it down into simple terms.

First, it's important to understand how researchers measure whether a hiring method works

The key metric in selection research is predictive validity. This is a correlation coefficient (r) that measures how reliably a hiring method predicts future job performance and is displayed as a decimal figure from 0 to 1 (e.g. r = .38).

A score of 0 means the method has no correlation with job performance at all, while a score of 1 would mean perfect prediction (think "crystal ball" level insight).

In practice, r = .50 or above is considered excellent, and most selection methods fall below that level.

Researchers measure validity by administering a selection method to candidates, hiring some of them, and then tracking how those hired candidates actually perform on the job. The limitation is that you can only measure the people who got hired. If a company consistently hires higher-scoring candidates, you're only seeing part of the picture, which makes the true relationship between test scores and job performance harder to detect.
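In its simplest form, predictive validity is just the Pearson correlation between candidates' selection scores and their later performance ratings. The sketch below shows the calculation; the data is invented purely for illustration and is not from any study:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: selection test scores and later performance ratings
test_scores = [1, 2, 3, 4, 5]
performance = [1, 3, 2, 5, 4]
print(round(pearson_r(test_scores, performance), 2))  # 0.8
```

In real validity studies the samples are far larger and the performance criterion is usually a supervisor rating, but the underlying statistic is the same.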

Researchers correct for this using statistical adjustments for "restriction of range" and "measurement error". The adjusted figures are called operational validities, and those are what you'll see in the tables below. They represent what the validity would look like across the full applicant pool, not just the ones who made it through.
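As a sketch of what a range-restriction correction does, here is the standard Thorndike Case II formula for direct range restriction. The actual adjustments in the meta-analyses are more involved, and the figures below are illustrative, not taken from the studies:

```python
from math import sqrt

def correct_range_restriction(r_restricted, u):
    """Thorndike Case II correction for direct range restriction.

    r_restricted: correlation observed among hired candidates only.
    u: ratio of the full applicant pool's score SD to the hired group's SD
       (u > 1 when hiring has narrowed the range).
    """
    return (r_restricted * u) / sqrt(1 + r_restricted**2 * (u**2 - 1))

# Illustrative figures: an observed r of .30 among hires, where the
# applicant pool's score SD is 1.5x that of the hired group
print(round(correct_range_restriction(0.30, 1.5), 2))  # 0.43
```

The corrected value is always at least as large as the observed one, which is why operational validities run higher than the raw correlations seen in individual studies.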

The two biggest studies cover 100 years' worth of selection data

The most cited paper in the field of personnel selection (and frequently quoted by us) is Schmidt and Hunter's 1998 meta-analysis. It synthesised 85 years' worth of selection research across 19 methods, and was then updated by Schmidt, Oh and Shaffer in 2016 to cover 100 years of data across 31 methods. Both studies came to broadly similar conclusions about the effectiveness of different hiring methods.

The table below shows zero-order operational validity (meaning the effectiveness of each method on its own) for predicting job performance, where a higher number means stronger predictive power.

| Selection method | Schmidt & Hunter (1998) | Schmidt, Oh & Shaffer (2016) |
| --- | --- | --- |
| General mental ability (GMA) | .51 | .65 |
| Work sample tests | .54 | .33 |
| Integrity tests | .41 | .46 |
| Structured interviews | .51 | .58 |
| Peer ratings | .49 | .49 |
| Job knowledge tests | .48 | .48 |
| Job tryout procedure (internships) | .44 | .44 |
| Unstructured interviews* | .38 | .58 |
| Biographical data | .35 | .35 |
| Assessment centres | .37 | .36 |
| Conscientiousness tests | .31 | .22 |
| Reference checks | .26 | .26 |
| Job experience (years) | .18 | .16 |

*The 2016 figure for unstructured interviews is often contested. See the structured interviews section below for more info.

The biggest shift between studies is GMA, which rose from .51 to .65. This is due to the 2016 research using an improved statistical method that corrected more accurately for the narrow-population problem described earlier, revealing that the 1998 figure had been underestimating the true relationship between GMA and job performance. The same improvement also raised the structured interview figure from .51 to .58.

Work sample tests moved in the opposite direction, falling from .54 to .33. The 2016 authors attribute this to a larger pool of primary studies including more service-sector roles, where hands-on simulations are harder to design and less predictive than in manual skilled trades (Roth, Bobko & McFarland, 2005). So the predictiveness of work sample tests varies depending on the sector and type of work.

What Schmidt, Oh, and Shaffer concluded from their 2016 paper

It's worth reading the direct findings from the Schmidt, Oh, and Shaffer 2016 paper itself as it summarises things well:

"Employers must make hiring decisions; they have no choice about that. But they can choose which methods to use in making those decisions. The research evidence... shows that different methods and combinations of methods have very different validities for predicting future job performance. Some, such as person-job fit, person-organisation fit, and amount of education, have low validity... others, such as GMA tests and integrity tests, have high validity.

Of the combinations examined, two stand out as being both practical and high in composite validity: the combination of a GMA test and an integrity test (composite validity of .78); and the combination of a GMA test and a structured interview (composite validity of .76).

Both of these combinations can be used with applicants with no previous experience on the job (entry level applicants), as well as with experienced applicants. Both combinations predict performance in job training programs quite well (.78 and .72, respectively), as well as performance on the job. And both combinations are less expensive to use than many other combinations. Hence, both are excellent choices.

The validity of the personnel measure (or combination of measures) used in hiring is directly proportional to the practical value of the method—whether measured in dollar value of increased output or percentage increase in output. In economic terms, the gains from increasing the validity of hiring methods can amount over time to literally millions of dollars. However, this can be viewed from the opposite point of view: By using selection methods with low validity, an organization can lose millions of dollars in reduced production, reducing revenue and profits.”

What makes this conclusion particularly striking is that it mirrors almost word-for-word what Schmidt and Hunter wrote in 1998. Nearly two decades of additional research, improved statistical methods and 12 more selection methods in the dataset did not change the direction of the findings. The numbers got stronger, but the same two combinations sat at the top. That kind of consistency across 100 years of research is about as close to a settled answer as personnel psychology gets.

General mental ability combined with either an integrity test or structured interview produce the highest validity

General mental ability

General mental ability (GMA) sits at the top of both studies. GMA is what cognitive ability assessments measure: the capacity to learn quickly, reason through problems, and acquire job knowledge. It predicts performance across virtually every role and seniority level as people with higher cognitive ability acquire job knowledge faster and apply it more effectively (Schmidt & Hunter, 2004). Unlike job experience, which plateaus after around five years, cognitive ability continues to predict performance throughout a person's career.

Cognitive ability assessments are also the lowest-cost method per insight, as a test can be administered to every applicant online in minutes.

Pairing a cognitive ability assessment with an integrity test reaches a validity of .78, and combined with a structured interview reaches .76. These are the two strongest two-method combinations in the research, by a considerable margin.
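For intuition on where composite figures like these come from, the validity of two methods used together follows the standard multiple-correlation formula, which depends on each method's own validity and the correlation between the two methods. The intercorrelation below is an illustrative assumption (integrity tests correlate only weakly with GMA), not a figure from the paper:

```python
from math import sqrt

def composite_validity(r1, r2, r12):
    """Multiple correlation R of two predictors with the criterion.

    r1, r2: each predictor's validity for job performance.
    r12: correlation between the two predictors themselves.
    """
    return sqrt((r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2))

# 2016 operational validities: GMA .65, integrity tests .46.
# Assuming a near-zero predictor intercorrelation (illustrative):
print(round(composite_validity(0.65, 0.46, 0.0), 2))  # 0.8
```

With a near-zero intercorrelation the result lands close to the reported .78; the less two methods overlap, the more each adds to the combination.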

Integrity tests

You might be wondering what an "integrity test" even is. Originally developed to identify candidates likely to engage in dishonest and counterproductive workplace behaviours, such as theft, fraud, or absenteeism, integrity tests have been shown to predict overall job performance reliably (Ones, Viswesvaran & Schmidt, 1993). They come in two main forms: standalone integrity tests (direct questions about honesty) and personality-based tests (covertly assessing traits related to rule-following).

It's worth noting that standalone integrity tests are rarely used, as candidates can often see exactly what's being measured, which can limit the value of the data. What the research actually shows is that integrity tests derive much of their predictive power from the personality traits they tap into. Schmidt, Oh and Shaffer (2016) note that integrity tests measure, in part, conscientiousness, agreeableness, and emotional stability. In practice, this means that well-designed personality questionnaires covering these traits are the more common and more useful tool.

Pro Tip

A good personality assessment that includes questions across conscientiousness, agreeableness, emotional stability, and integrity provides a more complete behavioural profile of a candidate rather than a narrow analysis of their ethics in isolation.

Structured interviews

Interviews are the staple of almost every hiring process, and TA teams' favourite method thanks to the interpersonal skills they reveal. The research supports them as a strongly predictive tool.

They're usable at any seniority level, with any type of candidate, and produce comparable data across applicants. The 2016 data puts their combined validity with a cognitive ability assessment at .76, making this pairing highly practical for most organisations; it is also the recommended combination in our optimal early-careers hiring strategy.

Note on unstructured interviews: the 2016 paper reported their validity rising sharply to .58, matching structured interviews. However, commentators have found that this figure overcorrects for range restriction. The weight of evidence still firmly favours structured interviews; for more information, read our article on structured vs unstructured interviews.

Several commonly used hiring methods scored low on validity, and here's why

The bottom half of the table features methods that hiring teams are reluctant to let go of. They persist because they feel familiar, inexpensive or intuitively meaningful, with teams either not knowing the predictive data behind them or choosing to ignore it.

Years of job experience (r = .16)

Experience tells you how long someone has been exposed to a job, not how well they performed in it. Research shows performance gains from experience plateau after roughly five years, after which additional time in role produces little increase in job knowledge or output.

For experienced hires, job knowledge tests (r = .48) measure the actual outcome of experience far more accurately than years in role.

Assessment centres (r = .36)

When used on their own, assessment centres have a respectable validity of .36. The problem is that when combined with a GMA test, their incremental validity is just .01: adding an assessment centre to a process that already includes a cognitive ability test raises composite validity by only around 2%. The reason is that assessment centres typically include a measure of GMA themselves, so they add very little beyond what the cognitive ability assessment has already told you.

This matters because assessment centres come with a huge cost. That cost is often accepted because they look like they add a lot to the hiring process, but you're largely capturing the same data a cognitive ability test would give you, at a far higher price.
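To see why the increment is so small, the standard multiple-correlation formula for two predictors shows what happens when they are highly correlated with each other. The .70 intercorrelation below is an illustrative assumption reflecting that assessment centres partly measure GMA, not a figure from the studies:

```python
from math import sqrt

def composite_validity(r1, r2, r12):
    """Multiple correlation R of two predictors with the criterion."""
    return sqrt((r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2))

r_gma = 0.65        # GMA validity (2016 figure)
r_ac = 0.36         # assessment centre validity (2016 figure)
r_intercorr = 0.70  # assumed GMA/assessment-centre correlation (illustrative)

combined = composite_validity(r_gma, r_ac, r_intercorr)
print(round(combined - r_gma, 2))  # incremental validity of 0.01
```

Under these assumptions the combination barely outperforms GMA alone: the second method is mostly re-measuring the first.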

Reference checks (r = .26)

This figure for reference checks is likely optimistic, as the studies underpinning it were conducted before more recent legal changes took effect.

Employers must now ensure references are not misleading, which often leads to bare references covering only dates and job titles to avoid liability for defamation, negligence, or discrimination under the Equality Act 2010.

A reference that only confirms someone worked somewhere between two dates tells you almost nothing about how they performed.

So should I just use the hiring methods with the highest validity?

It's a reasonable question, but picking the top three or four methods from the table and simply using one after the other isn't the right approach.

Each method performs best in a specific position within the hiring process, and not all of them suit every organisation's context, volume, or resource constraints.

  1. Work sample tests can only be used with candidates who already know the job, which immediately limits their use to experienced hires.
  2. Job tryout procedures, which involve hiring people on a probationary basis and observing performance, have reasonable validity but are expensive to implement and wildly impractical at scale.
  3. Peer ratings, despite a validity of .49, can only be used for internal decisions like promotions, not external hiring.
  4. Structured interviews are highly valid but require human time. At high applicant volumes, running structured interviews before shortlisting simply isn't feasible, so they have to be implemented later in the process when numbers are low.
  5. Cognitive ability assessments are the exception. They combine high validity with near-unlimited scalability. Every applicant can be evaluated against the same benchmark, quickly and consistently, without committing too much of your time.

This makes cognitive ability tests the natural anchor for the early stage of any high-volume process. Your large number of applicants can be screened down to a manageable shortlist for you to interview.

Building a strong hiring process means using each method where it fits best. Aligned with the researchers' advice, that means cognitive ability (and personality) assessments early, followed by structured interviews. That sequencing, supported by 100 years of research, produces a validity of over .76, increasing the chance of more successful hires.

Conclusion and next steps

A century of research, across tens of thousands of studies and millions of employees, arrives at consistent conclusions. Cognitive ability is the strongest single predictor of job performance. Personality assessments and structured interviews are its most effective complements. Years of experience and CV credentials are weak predictors that organisations lean on more from habit than from evidence.

We offer both cognitive ability testing and personality assessments, allowing you to adopt the strongest suggested combination into your hiring process. At Test Partnership we've helped countless hiring teams adopt this approach, and what tends to change isn't just the quality of who they're hiring. It's the confidence they have in those decisions, and how much faster they can move through the process when they're not second-guessing every shortlist.

If you're interested in what ability testing could look like for your team specifically, here's how our assessments work, or get in touch and we can talk through what would fit.

Primary author

Ben Schwencke

Chief psychologist at Test Partnership. MSc in Organisational Psychology with over ten years' experience in psychometric testing.