Latent variable models formalize the idea that the constructs psychologists study — intelligence, anxiety, attitudes, personality traits — are not directly observable but must be inferred from observed indicators such as test items, questionnaire responses, or behavioral measures. By explicitly modeling the relationship between latent constructs and their indicators, these models separate true construct variance from measurement error and enable the estimation of structural relationships among constructs that are free from attenuation due to unreliability.
The General Model
Measurement model: x = Λ_x ξ + δ (exogenous indicators)
y = Λ_y η + ε (endogenous indicators)
Structural model: η = Bη + Γξ + ζ
where ξ = exogenous latent variables, η = endogenous latent variables
B = matrix of effects among endogenous variables
Γ = matrix of effects from exogenous to endogenous variables
ζ = structural disturbances
The measurement model specifies how latent variables relate to their observed indicators through factor loadings (Λ) and measurement errors (δ, ε). The structural model specifies causal or predictive relationships among the latent variables themselves. This two-part structure is the defining feature of structural equation modeling: it allows researchers to test substantive theories about relationships between constructs while simultaneously accounting for the imperfection of their measures.
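To make the two-part structure concrete, the following sketch computes the covariance matrix that the general model implies for the stacked indicators (y, x). It assumes standard LISREL-style notation that the text does not spell out: Φ for the covariance matrix of ξ, Ψ for the covariance matrix of ζ, and Θ_ε and Θ_δ for the measurement error covariance matrices. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def implied_covariance(lam_y, lam_x, B, Gamma, Phi, Psi, theta_eps, theta_delta):
    """Model-implied covariance matrix of the stacked indicators (y, x)."""
    m = B.shape[0]
    # Reduced form of the structural model: eta = A (Gamma xi + zeta)
    A = np.linalg.inv(np.eye(m) - B)
    cov_eta = A @ (Gamma @ Phi @ Gamma.T + Psi) @ A.T
    cov_eta_xi = A @ Gamma @ Phi
    # Measurement model maps latent covariances onto the indicators
    s_yy = lam_y @ cov_eta @ lam_y.T + theta_eps
    s_xx = lam_x @ Phi @ lam_x.T + theta_delta
    s_yx = lam_y @ cov_eta_xi @ lam_x.T
    return np.block([[s_yy, s_yx], [s_yx.T, s_xx]])

# Hypothetical example: one exogenous and one endogenous latent variable,
# each measured by three indicators (first loading fixed to 1.0).
lam_x = np.array([[1.0], [0.8], [0.7]])
lam_y = np.array([[1.0], [0.9], [0.6]])
B = np.zeros((1, 1))                 # no effects among endogenous variables
Gamma = np.array([[0.5]])            # effect of xi on eta
Phi = np.array([[1.0]])              # variance of xi (assumed symbol)
Psi = np.array([[0.75]])             # disturbance variance (assumed symbol)
theta_delta = np.diag([0.3, 0.4, 0.5])
theta_eps = np.diag([0.3, 0.4, 0.5])

sigma = implied_covariance(lam_y, lam_x, B, Gamma, Phi, Psi, theta_eps, theta_delta)
print(np.round(sigma, 3))
```

Estimation then amounts to choosing the free entries of these matrices so that the implied matrix comes as close as possible to the observed covariance matrix.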
Identification and Estimation
Model identification, ensuring that parameters can be uniquely estimated from the observed covariance matrix, is a fundamental concern. Each latent variable must be given a scale, typically by fixing one loading to 1.0 or by fixing the latent variance to 1.0. The number of free parameters must not exceed the number of unique variances and covariances among the observed variables, p(p + 1)/2 for p indicators; this counting rule is necessary but not sufficient for identification. Maximum likelihood estimation chooses the parameter values that minimize a discrepancy function between the observed and model-implied covariance matrices, and it provides standard errors and a chi-square test of exact fit.
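The counting condition and the maximum likelihood discrepancy can be sketched in a few lines. This is an illustration under stated assumptions, not a full estimator: the function names are hypothetical, the discrepancy is the usual normal-theory form F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, and no optimization over parameters is performed.

```python
import numpy as np

def degrees_of_freedom(n_indicators, n_free_params):
    """Known covariance elements minus free parameters (counting rule)."""
    n_known = n_indicators * (n_indicators + 1) // 2
    return n_known - n_free_params

def ml_discrepancy(S, Sigma):
    """Normal-theory ML discrepancy between observed S and implied Sigma."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

# One-factor model with the first loading fixed to 1.0:
# free parameters = (p - 1) loadings + p error variances + 1 factor variance.
df3 = degrees_of_freedom(3, 2 + 3 + 1)   # three indicators: 6 known, 6 free
df4 = degrees_of_freedom(4, 3 + 4 + 1)   # four indicators: 10 known, 8 free

# Hypothetical observed matrix and a deliberately misspecified implied matrix.
S = np.array([[1.0, 0.5], [0.5, 1.0]])
f_exact = ml_discrepancy(S, S)           # implied matrix reproduces S
f_misfit = ml_discrepancy(S, np.eye(2))  # implied matrix ignores the covariance
print(df3, df4, f_exact, f_misfit)
```

With three indicators the one-factor model has zero degrees of freedom, so it reproduces the data exactly and cannot be tested; with four indicators two degrees of freedom remain. The chi-square statistic is commonly computed as the minimized discrepancy multiplied by N − 1 (some programs use N).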
Anderson and Gerbing (1988) advocated a two-step approach: first evaluate the measurement model (CFA) to ensure that latent variables are well-measured, then add structural paths and evaluate the full model. This prevents poor measurement from masquerading as structural misfit or, conversely, structural misfit from being absorbed into measurement parameters. The quality of measurement — high loadings, low cross-loadings, adequate reliability — is a prerequisite for meaningful structural inferences.
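As one way of checking "adequate reliability" at the measurement step, the sketch below computes a composite reliability coefficient from CFA estimates. It assumes a single factor with uncorrelated measurement errors, and it uses the common formula (Σλ)²φ / ((Σλ)²φ + Σθ); the function name and the loadings are hypothetical, and other reliability indices could be used instead.

```python
import numpy as np

def composite_reliability(loadings, error_variances, factor_variance=1.0):
    """Share of sum-score variance due to the factor (uncorrelated errors assumed)."""
    true_part = np.sum(loadings) ** 2 * factor_variance
    return true_part / (true_part + np.sum(error_variances))

# Hypothetical standardized loadings; error variance is 1 - loading squared.
loadings = np.array([0.8, 0.7, 0.6])
error_variances = 1.0 - loadings ** 2
omega = composite_reliability(loadings, error_variances)
print(round(omega, 3))
```

A low value at this stage signals a measurement problem that should be addressed before any structural paths are interpreted.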
Types of Latent Variable Models
The latent variable framework encompasses a wide family of models. Confirmatory factor analysis is a pure measurement model with no structural paths. Path models with latent variables add structural relationships. Growth curve models treat intercepts and slopes of longitudinal trajectories as latent variables. Mixture models introduce categorical latent variables representing unobserved subpopulations. Item response theory can be viewed as a latent variable model with categorical indicators (binary in the simplest case, ordinal in polytomous models). This unifying perspective reveals deep connections among methods that were historically developed independently.
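The growth curve case shows how a seemingly different method fits the same framework: the loading matrix is fixed rather than estimated. The sketch below assumes a linear growth model with four equally spaced occasions; the function name, the time scores, and the latent means are hypothetical.

```python
import numpy as np

def linear_growth_loadings(time_scores):
    """Fixed loading matrix: a column of ones (intercept) and the time scores (slope)."""
    t = np.asarray(time_scores, dtype=float)
    return np.column_stack([np.ones_like(t), t])

lam = linear_growth_loadings([0, 1, 2, 3])

# Hypothetical latent means: average intercept 10, average slope 2 per occasion.
alpha = np.array([10.0, 2.0])
implied_means = lam @ alpha
print(lam)
print(implied_means)
```

The repeated measures play the role of indicators, and individual differences in the intercept and slope factors carry the variation in trajectories.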
The power of latent variable models lies in their ability to separate signal from noise. By using multiple indicators, the model isolates the common variance that defines the construct from the unique variance (measurement error plus item-specific variance) that obscures it. Structural relationships estimated between latent variables are disattenuated — they represent the relationships between the constructs themselves, not between their imperfect measures. This correction for measurement error is one of the most important advantages of SEM over regression with observed variables, and it is the mathematical realization of the psychometric principle that constructs, not their indicators, are the objects of scientific interest.
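The attenuation argument can be checked with population quantities, without simulating data. The sketch below assumes two standardized constructs correlated at 0.6, each measured by three indicators with loadings of 0.7 and uncorrelated errors; all values and variable names are hypothetical. It compares the correlation between unit-weighted sum scores with the correlation between the constructs.

```python
import numpy as np

# Loadings: indicators 1-3 measure the first construct, 4-6 the second.
lam = np.zeros((6, 2))
lam[:3, 0] = 0.7
lam[3:, 1] = 0.7
phi = np.array([[1.0, 0.6], [0.6, 1.0]])        # latent correlation of 0.6
theta = np.diag(np.full(6, 1.0 - 0.7 ** 2))     # error variances
sigma = lam @ phi @ lam.T + theta               # implied indicator covariances

# Unit-weighted sum scores for each construct.
w_x = np.array([1, 1, 1, 0, 0, 0], dtype=float)
w_y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
var_x = w_x @ sigma @ w_x
var_y = w_y @ sigma @ w_y
observed_r = (w_x @ sigma @ w_y) / np.sqrt(var_x * var_y)

# Reliability of each sum score: factor-related variance over total variance.
rel_x = (w_x @ lam[:, 0]) ** 2 * phi[0, 0] / var_x
rel_y = (w_y @ lam[:, 1]) ** 2 * phi[1, 1] / var_y
disattenuated_r = observed_r / np.sqrt(rel_x * rel_y)
print(round(observed_r, 3), round(disattenuated_r, 3))
```

Under these assumptions the sum scores correlate at about 0.45 even though the constructs correlate at 0.6, and dividing by the square root of the product of the reliabilities recovers the latent value. A latent variable model builds this correction into the estimation itself.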