
Reliability and Validity

Reliability and validity are critical in ensuring results are dependable and measure what they are supposed to. Reliability refers to the consistency, precision, and accuracy of a measurement instrument or research study, whereas validity refers to whether a tool measures what it is intended to measure and whether the results are meaningful and applicable to the concept being studied.

Lead consultant, Ben Schwencke, explains reliability and how it differs from validity.

In the context of psychometric testing, reliability and validity are related, but are ultimately separate constructs. Put simply, reliability relates to the precision, accuracy, and replicability of psychometric test scores.

Validity, however, answers the question “does this assessment actually measure the construct it claims to?”. As a result, reliability is required for validity, but not necessarily the other way around. For example, if a student completes a psychometric assessment 10 times and gets the exact same score each time, the assessment can be said to show “reliability”.

However, more investigation is required to determine whether the scores themselves are meaningful and whether they measure the psychological construct the assessment purports to measure.

Reliability:

From a classical test theory perspective, there are two primary forms of reliability: test-retest reliability and internal consistency.

Test-retest reliability involves giving the assessment to a group of participants two or more times and evaluating the differences between each attempt. If the scores differ significantly between attempts, the test lacks reliability. If the test scores are broadly similar (not necessarily identical), then the test can be said to be reliable.

Internal consistency relates to the relationships between the individual items in the test and the overall score. With internally consistent tests, high scores on each individual item are associated with a higher score on the assessment overall, and low scores on individual items are associated with a lower overall score. This suggests that each question is measuring the same psychological construct, which in turn suggests that the assessment is reliable.
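To make these two forms of reliability concrete, here is a minimal sketch of how they are commonly estimated, using Python and NumPy. The data, sample sizes, and variable names below are invented purely for illustration and are not taken from any real assessment.

```python
import numpy as np

# Test-retest reliability: correlate scores from two administrations.
# Each array holds the same participants' total scores on attempt 1 and attempt 2.
attempt_1 = np.array([42, 55, 61, 48, 70, 53, 66, 59])
attempt_2 = np.array([44, 53, 63, 47, 68, 55, 64, 61])
test_retest_r = np.corrcoef(attempt_1, attempt_2)[0, 1]
print(f"Test-retest correlation: {test_retest_r:.2f}")

# Internal consistency: Cronbach's alpha over item-level responses.
# Rows are participants, columns are individual test items (invented data).
items = np.array([
    [4, 5, 4, 3, 5],
    [2, 3, 2, 3, 2],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 2, 3],
    [4, 4, 5, 4, 4],
    [1, 2, 2, 1, 2],
])

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
```

High values of both coefficients (conventionally around 0.7 or above) are taken as evidence of reliability, though the exact threshold depends on the purpose of the assessment.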


Validity:

Validity relates to whether or not a psychometric assessment measures its intended psychological construct. Although validity requires reliability, as unreliable tests cannot measure anything at all, reliability does not guarantee validity. There are various forms of validity, which include:

  • Face Validity: Whether or not an assessment appears to measure the intended psychological construct.
  • Content Validity: Whether an assessment measures all aspects of a particular psychological construct.
  • Convergent Validity: Whether scores on an assessment correlate positively with another similar assessment designed to measure that same construct.
  • Divergent Validity: Whether scores on an assessment correlate negatively, or not at all, with another assessment designed to measure an unrelated construct.
  • Criterion-related Validity: Whether an assessment is able to predict real-world outcomes which are hypothesised to be associated with that specific construct, e.g. job performance, training performance, employee retention (a correlational sketch of these checks follows this list).
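Convergent, divergent, and criterion-related validity evidence is typically summarised as correlations between the assessment and other measures. The sketch below uses invented data and hypothetical variable names to show the general pattern one would look for: a strong positive correlation with a similar measure, a near-zero correlation with an unrelated measure, and a moderate positive correlation with a real-world criterion such as job performance.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200  # invented sample size

new_test = rng.normal(size=n)                                      # assessment being validated
similar_test = new_test * 0.8 + rng.normal(scale=0.6, size=n)      # established measure of the same construct
unrelated_test = rng.normal(size=n)                                 # measure of an unrelated construct
job_performance = new_test * 0.5 + rng.normal(scale=0.9, size=n)    # real-world criterion

def r(x, y):
    """Pearson correlation between two score arrays."""
    return np.corrcoef(x, y)[0, 1]

print(f"Convergent validity (similar test):     r = {r(new_test, similar_test):.2f}")     # expect strongly positive
print(f"Divergent validity (unrelated test):    r = {r(new_test, unrelated_test):.2f}")   # expect near zero
print(f"Criterion-related validity (job perf.): r = {r(new_test, job_performance):.2f}")  # expect moderately positive
```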

To show that a test is “valid”, multiple forms of validation are required, especially for newer assessments and less established psychological constructs. As part of the R&D process for psychometric assessments, psychometricians conduct many studies investigating the reliability and validity of the assessment, presenting the findings in a technical manual or academic journal article.