What is Classical Test Theory?

Lead consultant at Test Partnership, Ben Schwencke, explains classical test theory.

Classical Test Theory (CTT) is the theoretical framework that underpins conventional psychometric testing. The broad objective of CTT is to make psychometric test scores as reliable, precise, and accurate as possible by minimising error. CTT is best summarised by the following formula:

Observed score (X) = True Score (T) + Error (E)

For example, if a candidate completes a numerical reasoning test and scores 16 / 20, their “Observed score” is 16. However, no psychometric assessment is 100% reliable: error always influences the result, so this candidate’s observed score will usually differ from their “True score”. The true score is the candidate’s actual level of numerical reasoning, which cannot be observed directly under CTT. The size of the difference between the observed score and the true score depends on the level of error associated with the assessment, with unreliable assessments showing greater levels of error.
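
To make the formula concrete, the following is a minimal Python sketch that simulates repeated sittings of the same test under X = T + E. It is an illustration only, and it rests on assumptions that are not part of the original explanation: the error is drawn from a normal distribution with mean zero, the true score of 16 is treated as known (in practice it is not), scores are not clamped to the 0-20 range, and the function names and error sizes are hypothetical.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_sittings, seed=0):
    """Simulate observed scores as X = T + E.

    Assumption: E is normally distributed with mean 0 and standard
    deviation error_sd. Scores are not rounded or clamped to 0-20.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_sittings)]


def mean_absolute_gap(true_score, observed_scores):
    """Average size of the difference between observed and true score."""
    gaps = [abs(x - true_score) for x in observed_scores]
    return statistics.fmean(gaps)


# Hypothetical candidate with a true score of 16 out of 20.
reliable = simulate_observed_scores(16, error_sd=0.5, n_sittings=1000)
unreliable = simulate_observed_scores(16, error_sd=3.0, n_sittings=1000)

print("Reliable test, average gap:  ", mean_absolute_gap(16, reliable))
print("Unreliable test, average gap:", mean_absolute_gap(16, unreliable))
```

In this sketch the test with the larger error component produces observed scores that sit further from the true score on average, which is the relationship described above.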

The goal of CTT-based assessments, therefore, is to minimise the error component, ensuring maximum congruence between the observed score and the true score.

Under CTT, error is estimated using reliability coefficients, particularly test-retest reliability and internal consistency. The most commonly used estimate of internal consistency is the “Cronbach’s Alpha” statistic, which typically ranges from 0 to 1, with values of .7 or above generally taken to indicate sufficient reliability. Higher reliability generally indicates less error, and thus greater congruence between the true score and the observed score. Low reliability, however, indicates more error, meaning the observed score is likely to differ substantially from the true score, which undermines the validity of the results.
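
As a rough illustration of how internal consistency turns into an estimate of error, the sketch below computes Cronbach’s Alpha from a small table of item scores using the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), and then converts it into a standard error of measurement, SEM = SD × √(1 - reliability). The response data is made up for the example, the function names are hypothetical, and the use of sample variance (dividing by n - 1) is an assumption.

```python
import math
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per candidate
    and one column per question."""
    k = len(item_scores[0])  # number of questions
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*item_scores))
    # Assumption: sample variance (n - 1 denominator) throughout.
    item_variances = [statistics.variance(col) for col in item_columns]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(score_sd, reliability):
    """Typical size of the error component, in score points."""
    return score_sd * math.sqrt(1 - reliability)


# Made-up data: 5 candidates, 4 questions (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
total_sd = statistics.stdev([sum(row) for row in responses])
sem = standard_error_of_measurement(total_sd, alpha)

print("Cronbach's alpha:", round(alpha, 3))
print("Standard error of measurement:", round(sem, 3))
```

For this toy data set alpha works out at about .8, above the .7 threshold mentioned above, and the standard error of measurement is roughly 0.7 score points. A real analysis would need far more candidates and questions than this.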

Increasingly, CTT is being replaced by the more complex Item Response Theory (IRT), also known as modern psychometric test theory. Although CTT works well when assessments use a fixed set of questions, it is very limited when creating item-banked assessments. Because CTT posits that only two factors influence a person’s observed score, i.e. their true score and error, it cannot account for differences in question difficulty, item discrimination, and guessing, all of which require parametrisation in item-banked assessments. To account for this additional complexity, IRT builds these parameters into its model of how candidates respond to each question, freeing assessments from requiring fixed forms.
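
One widely used way to parametrise difficulty, discrimination, and guessing is the three-parameter logistic (3PL) model. The explanation above does not say which IRT model a given assessment uses, so the sketch below should be read as an example of the general idea rather than a description of any particular test. The function name, parameter names, and item values are hypothetical.

```python
import math


def prob_correct_3pl(ability, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct answer under the 3PL model.

    ability        - the candidate's level on the trait being measured
    difficulty     - ability level at which the item becomes 'winnable'
    discrimination - how sharply the item separates high from low ability
    guessing       - chance of answering correctly with no knowledge
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# Hypothetical item bank: the same candidate (ability 0.0) sees two
# four-option multiple-choice questions of different difficulty.
easy_item = prob_correct_3pl(0.0, difficulty=-1.0, discrimination=1.2, guessing=0.25)
hard_item = prob_correct_3pl(0.0, difficulty=1.0, discrimination=1.2, guessing=0.25)

print("Chance of answering the easy question correctly:", round(easy_item, 3))
print("Chance of answering the hard question correctly:", round(hard_item, 3))
```

Because each question carries its own parameters, two candidates can be compared on the same ability scale even if they answered different questions, which is what makes item-banked assessments workable.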