Multi-group SEM extends structural equation modeling to test whether model parameters are equivalent across two or more groups defined by characteristics such as gender, ethnicity, age, culture, or experimental condition. This is essential for establishing that a psychological construct has the same meaning and structure across populations. Without such evidence, group comparisons on latent variables are potentially meaningless — observed differences might reflect true construct differences or artifacts of differential measurement.
The Testing Hierarchy
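1. Configural: same pattern of fixed and free loadings in each group (same items load on the same factors)
2. Metric (weak): Λ_group1 = Λ_group2 (equal factor loadings)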
3. Scalar (strong): Λ and τ equal (equal loadings and intercepts)
4. Strict: Λ, τ, and Θ equal (equal loadings, intercepts, and residuals)
Testing proceeds hierarchically, with each level imposing additional equality constraints. Configural invariance requires only that the same items load on the same factors in each group — the baseline model. Metric invariance constrains factor loadings to be equal, ensuring that a one-unit change in the latent variable produces the same change in the indicator across groups. Scalar invariance additionally constrains item intercepts, ensuring that group differences in observed means reflect genuine differences in latent means. Strict invariance further constrains residual variances, ensuring equal measurement precision.
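To make the nesting concrete, the sketch below counts the degrees of freedom at each level for a simple one-factor model. It rests on illustrative assumptions that the text does not state: one factor, p indicators, G groups, a mean structure, marker-variable identification (first loading fixed to 1), factor variances free in every group, and latent means freed in groups 2 through G only once intercepts are constrained equal. An equivalent identification (e.g., fixing factor variances instead) should yield the same df. Under these assumptions, metric adds (p − 1)(G − 1) df, scalar adds another (p − 1)(G − 1) (p(G − 1) equal intercepts minus G − 1 freed latent means), and strict adds p(G − 1). The function names are hypothetical.

```python
# Degrees of freedom for a one-factor model across the invariance hierarchy.
# Illustrative assumptions (not from the text): p indicators, G groups,
# mean structure included, first loading fixed to 1 (marker variable),
# factor variances free in every group.

LEVELS = ("configural", "metric", "scalar", "strict")


def moments(p: int) -> int:
    """Observed statistics per group: p(p+1)/2 (co)variances plus p means."""
    return p * (p + 1) // 2 + p


def free_params(level: str, p: int, groups: int) -> int:
    """Free parameters summed over all groups at a given invariance level."""
    loadings = p - 1  # first loading fixed to 1 in every group
    intercepts = p
    residuals = p
    factor_var = 1
    if level == "configural":
        # Every parameter is group-specific; latent means fixed to 0.
        return groups * (loadings + intercepts + residuals + factor_var)
    if level == "metric":
        # Loadings shared; intercepts still free, so latent means stay at 0.
        return loadings + groups * (intercepts + residuals + factor_var)
    if level == "scalar":
        # Loadings and intercepts shared; latent means freed in groups 2..G.
        return loadings + intercepts + groups * (residuals + factor_var) + (groups - 1)
    if level == "strict":
        # Residual variances shared as well.
        return loadings + intercepts + residuals + groups * factor_var + (groups - 1)
    raise ValueError(f"unknown level: {level}")


def model_df(level: str, p: int, groups: int) -> int:
    """Model df = total observed statistics minus free parameters."""
    return groups * moments(p) - free_params(level, p, groups)


if __name__ == "__main__":
    # Example: 4 indicators, 2 groups; print df and the step-wise Δdf.
    previous = None
    for level in LEVELS:
        df = model_df(level, p=4, groups=2)
        delta = "" if previous is None else f" (Δdf = {df - previous})"
        print(f"{level:<10} df = {df}{delta}")
        previous = df
```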
Testing Procedures
Each level is tested against the previous one using the chi-square difference test: Δχ² = χ²_constrained − χ²_less constrained, evaluated with Δdf = df_constrained − df_less constrained degrees of freedom. A significant Δχ² indicates that the additional constraints worsen fit, suggesting non-invariance. However, with large samples, Δχ² may be significant even for trivial non-invariance. Alternative criteria therefore use changes in approximate fit indices: following Chen (2007), a decrease in CFI of 0.010 or more (ΔCFI ≤ −0.010), supplemented by an increase in RMSEA of 0.015 or more (ΔRMSEA ≥ 0.015), signals non-invariance, whereas smaller changes suggest that the constraints are tenable.
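A minimal, dependency-free sketch of both decision rules follows. The function names (`chi2_sf`, `chi2_difference`, `approx_fit_tenable`) are hypothetical and the fit values in the example are invented for illustration. The sketch treats constraints as tenable only when CFI drops by less than 0.010 and RMSEA rises by less than 0.015, which is one reading of Chen's criteria. It also assumes ordinary maximum-likelihood chi-squares; with robust estimators (e.g., MLR or WLSMV) the plain difference is not chi-square distributed and a scaled difference test is needed instead.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer k > 0 df."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi2_difference(chi2_constrained, df_constrained, chi2_free, df_free):
    """Δχ², Δdf and p-value for two nested ML models."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


def approx_fit_tenable(cfi_constrained, cfi_free, rmsea_constrained, rmsea_free,
                       cfi_cut=0.010, rmsea_cut=0.015):
    """True when CFI drops by less than cfi_cut and RMSEA rises by less than rmsea_cut."""
    d_cfi = cfi_constrained - cfi_free
    d_rmsea = rmsea_constrained - rmsea_free
    return d_cfi > -cfi_cut and d_rmsea < rmsea_cut


if __name__ == "__main__":
    # Hypothetical metric (constrained) vs. configural (free) fits.
    d_chi2, d_df, p = chi2_difference(110.0, 10, 100.0, 7)
    print(f"Δχ² = {d_chi2:.2f}, Δdf = {d_df}, p = {p:.4f}")
    print("approximate-fit criteria met:", approx_fit_tenable(0.952, 0.958, 0.050, 0.048))
```

Keeping the two rules in separate functions mirrors the text: with large samples a researcher may report the Δχ² result but base the decision on the approximate-fit changes.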
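Test: Δχ² = χ²_metric − χ²_configural
If Δχ² is not significant → metric invariance holds
Then test: Δχ² = χ²_scalar − χ²_metric
If Δχ² is not significant → scalar invariance holds
Then test: Δχ² = χ²_strict − χ²_scalar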
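The sequential logic can be sketched as a loop that walks up the hierarchy and stops at the first significant Δχ². This is an illustration under stated assumptions: `highest_invariance` is a hypothetical helper, the models are assumed to be fitted by ordinary ML, α = 0.05, the configural model is assumed to fit acceptably on its own, and the fit values in the example are invented. The level where the loop stops is where partial invariance would be examined.

```python
import math

LEVELS = ("configural", "metric", "scalar", "strict")


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail chi-square probability for integer k > 0 df (stdlib only)."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return total


def highest_invariance(fits: dict, alpha: float = 0.05) -> str:
    """fits maps level -> (chi2, df); return the most constrained level retained."""
    retained = "configural"  # assumed to fit acceptably as the baseline
    for less, more in zip(LEVELS, LEVELS[1:]):
        if more not in fits:
            break
        d_chi2 = fits[more][0] - fits[less][0]
        d_df = fits[more][1] - fits[less][1]
        if d_df <= 0:
            raise ValueError(f"{more} must have more df than {less}")
        if chi2_sf(d_chi2, d_df) < alpha:
            break  # constraints worsen fit: stop and consider partial invariance
        retained = more
    return retained


if __name__ == "__main__":
    # Invented fit results for a two-group, four-indicator model.
    fits = {
        "configural": (12.3, 4),
        "metric": (15.1, 7),
        "scalar": (31.8, 10),
        "strict": (35.0, 14),
    }
    print("highest level retained:", highest_invariance(fits))
```

Stopping at the first rejection reflects the nesting: a higher level is only tested against a lower level that has itself been retained.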
When full invariance at a given level is rejected, researchers may test partial invariance — freeing the constraints for specific non-invariant items while maintaining them for the rest. Byrne, Shavelson, and Muthén (1989) argued that meaningful group comparisons are possible with partial scalar invariance, provided that at least two indicators per factor remain invariant (to identify the model). Partial invariance acknowledges that some items may function differently across groups while still allowing latent mean comparisons based on the invariant items.
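The two-indicator rule of thumb can be expressed as a simple bookkeeping check. The sketch below is hypothetical: `partial_invariance_ok` and the item names are invented, and it only counts how many items per factor keep their equality constraints after some have been freed (apply it separately to freed loadings for partial metric invariance and to freed intercepts for partial scalar invariance). It does not decide which items to free; in practice that choice is usually guided by modification indices or tests of individual constraints.

```python
def invariant_counts(factor_items: dict, freed_items: set) -> dict:
    """Number of items per factor whose equality constraints are still imposed."""
    return {
        factor: sum(1 for item in items if item not in freed_items)
        for factor, items in factor_items.items()
    }


def partial_invariance_ok(factor_items: dict, freed_items: set,
                          min_invariant: int = 2) -> dict:
    """Per factor: does it keep at least min_invariant constrained items?"""
    counts = invariant_counts(factor_items, freed_items)
    return {factor: n >= min_invariant for factor, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical two-factor model with three freed intercepts.
    factors = {"F1": ["x1", "x2", "x3", "x4"], "F2": ["y1", "y2", "y3"]}
    freed = {"x2", "y1", "y3"}
    print(invariant_counts(factors, freed))
    print(partial_invariance_ok(factors, freed))
```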
Structural Invariance
Once measurement invariance is established, researchers can test structural invariance: whether the relationships among latent variables (structural paths, factor variances, factor covariances, latent means) are equivalent across groups. Comparing structural paths and factor covariances requires at least metric invariance, because only then are the latent variables expressed in the same units across groups. Equal structural paths would indicate that the theoretical relationships hold equally in both groups; unequal paths indicate moderation, meaning the structural relationship differs across groups. Latent mean differences can only be meaningfully interpreted when at least (full or partial) scalar invariance holds.
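In the multi-group framework, equality of a single structural path is usually tested by constraining it equal across groups and computing a Δχ² with 1 df. As a lighter illustration, the sketch below swaps in a Wald-type z test on the unconstrained estimates from two independent groups, which under ML is asymptotically comparable (z² plays the role of the 1-df statistic). `path_difference_z` and the example estimates are hypothetical; the comparison should use unstandardized coefficients, since standardized ones also depend on group-specific variances.

```python
import math


def path_difference_z(b1: float, se1: float, b2: float, se2: float):
    """Wald z and two-sided p for the difference of one path across two independent groups."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return z, p


if __name__ == "__main__":
    # Invented unstandardized estimates of the same path in two groups.
    z, p = path_difference_z(0.45, 0.08, 0.20, 0.09)
    print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

A significant difference here would point to the moderation described above; the constrained-model Δχ² remains the more direct test within a single multi-group fit.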
Multi-group SEM is a standard approach for establishing measurement equivalence in cross-cultural research, developmental psychology, and any context where group comparisons are central. It provides a principled framework for determining whether observed group differences reflect genuine construct differences or measurement artifacts. Related approaches include multiple indicators multiple causes (MIMIC) models, which include group membership as a covariate within a single-group model, and alignment methods, which handle many groups simultaneously without requiring exact invariance constraints.