Mathematical Psychology

Maximum Entropy Models

The maximum entropy principle selects the probability distribution with the greatest entropy subject to known constraints, providing an objective, least-biased inference framework with applications from neural modeling to psychophysics.

p*(x) = arg max H(p) subject to E[fₖ(x)] = Fₖ

The maximum entropy (MaxEnt) principle, formalized by Edwin T. Jaynes (1957) and rooted in the work of Boltzmann and Gibbs in statistical mechanics, states that when making inferences from incomplete information, one should choose the probability distribution that maximizes Shannon entropy subject to the constraints imposed by known data. This principle produces the least biased estimate consistent with the available evidence — it avoids introducing assumptions not warranted by the data.

Formal Statement

Maximum Entropy Principle

Maximize: H(p) = −Σ p(x) · log p(x)

Subject to:
Σ p(x) = 1 (normalization)
Σ p(x) · fₖ(x) = Fₖ for k = 1, ..., K (moment constraints)

Solution: p*(x) = (1/Z) · exp(−Σ λₖ · fₖ(x))
Z = Σ exp(−Σ λₖ · fₖ(x)) (partition function)

The solution is always an exponential family distribution, with the Lagrange multipliers λₖ determined by the constraints. If the only constraint is the mean of a nonnegative variable, the MaxEnt distribution is the exponential; if both mean and variance are constrained on the real line, it is the Gaussian. This provides a principled justification for many commonly used distributions: they are the least-biased distributions consistent with specific moment constraints.
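As a concrete illustration, the following sketch (using NumPy; the discrete support and target mean are arbitrary choices for illustration) solves a MaxEnt problem with a single mean constraint. Because the model mean is strictly decreasing in the Lagrange multiplier λ, λ can be found by simple bisection:

```python
import numpy as np

def maxent_mean(support, target_mean, lam_lo=-10.0, lam_hi=10.0, tol=1e-10):
    """Discrete MaxEnt distribution p(x) ∝ exp(−λx) whose mean equals target_mean.

    Solves for the Lagrange multiplier λ by bisection, exploiting the fact
    that the model mean is monotonically decreasing in λ.
    """
    x = np.asarray(support, dtype=float)

    def mean_at(lam):
        w = np.exp(-lam * (x - x.mean()))  # shift exponent for numerical stability
        p = w / w.sum()
        return (p * x).sum()

    lo, hi = lam_lo, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_at(mid) > target_mean:  # mean too large → need larger λ
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = np.exp(-lam * (x - x.mean()))
    return lam, w / w.sum()

# Constrain the mean over support {0, ..., 9} to be 3
lam, p = maxent_mean(np.arange(10), target_mean=3.0)
```

The resulting distribution has the form p(x) ∝ exp(−λx), i.e. successive probability ratios are constant — the discrete (geometric) analogue of the exponential distribution that the continuous theory predicts.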

Jaynes' Interpretation

Jaynes argued that the maximum entropy principle extends the principle of insufficient reason to situations with partial information. When we know nothing about a distribution, maximum entropy yields the uniform distribution — the classical principle of indifference. When we know some moments, it yields the distribution that is maximally noncommittal about everything else. This interpretation frames probability and statistical inference as extensions of logic, not as statements about physical frequencies.

MaxEnt in Neural Population Modeling

Schneidman, Berry, Segev, and Bialek (2006) applied maximum entropy models to neural population activity, constraining only the mean firing rates and pairwise correlations of neurons. These pairwise MaxEnt models — mathematically equivalent to Ising models from statistical physics — captured over 90% of the multi-neuron correlation structure in retinal ganglion cell populations. This finding suggests that higher-order interactions contribute relatively little beyond what is predicted by pairwise statistics, a result with profound implications for understanding neural coding.
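A minimal sketch of fitting such a pairwise model, assuming synthetic binary spike data (not real recordings) and using exact enumeration of states — feasible only for small populations. Because the model is in the exponential family, gradient ascent on the log-likelihood reduces to matching the empirical moments:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 3  # population size; all 2**N binary states enumerated exactly
states = np.array(list(itertools.product([0, 1], repeat=N)), dtype=float)

# Hypothetical spike raster: rows = time bins, columns = neurons
data = rng.integers(0, 2, size=(500, N)).astype(float)
emp_mean = data.mean(axis=0)            # mean firing rates
emp_corr = (data.T @ data) / len(data)  # pairwise co-activation frequencies

h = np.zeros(N)        # per-neuron biases
J = np.zeros((N, N))   # pairwise couplings (upper triangle used)
iu = np.triu_indices(N, k=1)

for _ in range(5000):
    # Boltzmann distribution p(x) ∝ exp(h·x + Σ_{i<j} J_ij x_i x_j)
    E = states @ h + np.einsum('si,ij,sj->s', states, np.triu(J, k=1), states)
    p = np.exp(E - E.max())
    p /= p.sum()
    mod_mean = p @ states
    mod_corr = states.T @ (states * p[:, None])
    # Log-likelihood gradient = empirical moments minus model moments
    h += 0.1 * (emp_mean - mod_mean)
    J[iu] += 0.1 * (emp_corr[iu] - mod_corr[iu])
```

At convergence the model reproduces the constrained statistics exactly; for realistic population sizes the partition function cannot be enumerated, and fitting relies on Monte Carlo or mean-field approximations instead.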

Applications in Psychology

In psychophysics, maximum entropy models provide principled prior distributions when applying Bayesian analysis to perception experiments. In cognitive modeling, MaxEnt is used to construct baseline "null" models: the maximum entropy distribution consistent with observed marginal statistics provides a benchmark against which structured models can be compared. The degree to which a cognitive model improves upon the MaxEnt baseline quantifies the explanatory value of the model's structural assumptions.
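The baseline idea can be sketched with made-up counts for a hypothetical 2×2 response table: the MaxEnt distribution consistent with the observed marginals is simply the independence model, and the data's KL divergence from it quantifies structure beyond the marginals:

```python
import numpy as np

# Hypothetical joint counts of two binary response variables
counts = np.array([[30., 10.],
                   [5., 55.]])  # rows: A = 0/1, columns: B = 0/1
joint = counts / counts.sum()

# MaxEnt distribution matching only the marginals = independence model
pA = joint.sum(axis=1)
pB = joint.sum(axis=0)
maxent = np.outer(pA, pB)

# KL divergence from the MaxEnt baseline (in nats)
kl = np.sum(joint * np.log(joint / maxent))
```

A structured cognitive model earns its keep only to the extent that it accounts for this divergence beyond the marginal-matching baseline.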

Maximum entropy models also connect to information geometry and to the exponential family of distributions widely used in generalized linear models. In computational psychiatry, MaxEnt models have been applied to characterize the statistical structure of behavioral sequences, with departures from maximum entropy interpreted as signatures of cognitive biases or pathological states.

References

  1. Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620–630. doi:10.1103/PhysRev.106.620
  2. Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press. doi:10.1017/CBO9780511790423
  3. Schneidman, E., Berry, M. J., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087), 1007–1012. doi:10.1038/nature04701
  4. Berger, A. L., Della Pietra, V. J., & Della Pietra, S. A. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39–71.
