
Harmony Theory

Smolensky's Harmony Theory defines a harmony function over network states and uses stochastic processing to find configurations that maximize this measure of goodness-of-fit, connecting connectionist computation to statistical mechanics and Boltzmann machines.

H = Σᵢ<ⱼ wᵢⱼ · aᵢ · aⱼ + Σᵢ bᵢ · aᵢ

Harmony Theory, introduced by Paul Smolensky (1986) in the PDP volumes, provides a mathematical framework for understanding computation in stochastic connectionist networks. The central idea is that a network's processing can be understood as maximizing a harmony function — a scalar measure of how well the network's current state fits the statistical regularities of its environment. This framework connects connectionist computation to the Boltzmann distribution of statistical mechanics and to probabilistic inference, establishing deep links between neural networks and statistical learning theory.

The Harmony Function

Harmony function: H(a) = Σᵢ<ⱼ wᵢⱼ · aᵢ · aⱼ + Σᵢ bᵢ · aᵢ

Boltzmann distribution: P(a) = (1/Z) · e^(H(a)/T)
Z = Σₐ e^(H(a)/T) (partition function)
T = computational temperature (controls stochasticity)

The harmony function H(a) for a state vector a is the sum of all pairwise products of connected unit activations weighted by their connection strengths, plus bias terms. Higher harmony corresponds to a better "fit" between the network state and the encoded knowledge (stored in the weights). The relationship between harmony and the energy function used in Boltzmann machines (Hinton & Sejnowski, 1986) is E = −H: maximizing harmony is equivalent to minimizing energy. At thermal equilibrium, the probability that the network is in a particular state follows the Boltzmann distribution, with higher-harmony states being exponentially more probable.
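For a network small enough to enumerate, both the harmony function and the resulting Boltzmann distribution can be computed exactly. The sketch below uses a made-up three-unit network (the weights and biases are illustrative toy values, not from Smolensky's examples):

```python
import itertools
import math

def harmony(a, W, b):
    """H(a) = sum_{i<j} w_ij * a_i * a_j + sum_i b_i * a_i."""
    n = len(a)
    pairwise = sum(W[i][j] * a[i] * a[j] for i in range(n) for j in range(i + 1, n))
    bias = sum(b[i] * a[i] for i in range(n))
    return pairwise + bias

def boltzmann_distribution(W, b, T=1.0):
    """Exact P(a) = exp(H(a)/T) / Z, enumerating all 2^n binary states."""
    n = len(b)
    states = list(itertools.product([0, 1], repeat=n))
    unnorm = [math.exp(harmony(a, W, b) / T) for a in states]
    Z = sum(unnorm)                          # partition function
    return states, [w / Z for w in unnorm]

# Toy network: units 0 and 1 excite each other; unit 2 inhibits both.
W = [[0.0, 2.0, -1.0],
     [2.0, 0.0, -1.0],
     [-1.0, -1.0, 0.0]]
b = [0.5, 0.5, 0.5]

states, probs = boltzmann_distribution(W, b, T=1.0)
best = max(states, key=lambda a: harmony(a, W, b))
# best is (1, 1, 0) with H = 3.0: the maximum-harmony state is also
# the most probable state under the Boltzmann distribution.
```

Enumeration is only feasible for tiny networks (Z sums over 2ⁿ states); this is exactly why the stochastic sampling described next is needed in practice.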

Connection to Boltzmann Machines

The Boltzmann machine (Ackley, Hinton, & Sejnowski, 1985; Hinton & Sejnowski, 1986) implements Harmony Theory using stochastic binary units that turn on or off probabilistically as a function of their net input and the temperature parameter T. At high temperature, units flip nearly at random (exploration); at low temperature, the network settles into high-harmony states (exploitation). The learning algorithm adjusts weights to increase the harmony of observed data patterns and decrease the harmony of patterns the network generates spontaneously, implementing a form of maximum-likelihood estimation.
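The settling process can be sketched with asynchronous Gibbs updates, where each unit turns on with probability σ(netᵢ/T), combined with a simple annealing schedule that lowers T. The network and the geometric schedule below are illustrative assumptions, not taken from the original papers:

```python
import math
import random

rng = random.Random(0)   # fixed seed for reproducibility

def net_input(a, W, b, i):
    """Harmony gain from turning unit i on: net_i = sum_j w_ij * a_j + b_i."""
    return sum(W[i][j] * a[j] for j in range(len(a))) + b[i]

def gibbs_sweep(a, W, b, T):
    """One asynchronous sweep: unit i turns on with P = sigmoid(net_i / T)."""
    for i in range(len(a)):
        p_on = 1.0 / (1.0 + math.exp(-net_input(a, W, b, i) / T))
        a[i] = 1 if rng.random() < p_on else 0
    return a

def anneal(W, b, schedule):
    """Lower T gradually: random exploration first, then settle into a harmony maximum."""
    a = [rng.randint(0, 1) for _ in b]
    for T in schedule:
        a = gibbs_sweep(a, W, b, T)
    return a

# Same toy network as above: units 0 and 1 excite each other; unit 2 inhibits both.
W = [[0.0, 2.0, -1.0],
     [2.0, 0.0, -1.0],
     [-1.0, -1.0, 0.0]]
b = [0.5, 0.5, 0.5]
schedule = [5.0 * (0.05 / 5.0) ** (k / 39) for k in range(40)]   # geometric, 5.0 -> 0.05
state = anneal(W, b, schedule)
```

At the final low temperatures the updates become nearly deterministic, so the network ends in a state where no single flip can raise harmony, i.e. a local harmony maximum.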

From Harmony to Integrated Connectionist/Symbolic Models

Smolensky later developed Harmony Theory into Optimality Theory in linguistics (Prince & Smolensky, 1993/2004), where linguistic forms are those that maximize harmony subject to a ranked set of violable constraints. This application shows how a connectionist optimization framework can be used to formalize the structured representations of linguistic theory. The Harmonic Grammar variant replaces strict ranking with weighted constraints, maintaining a closer connection to the original neural network formulation and allowing gradient, probabilistic variation.
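The Harmonic Grammar idea can be sketched in a few lines: each candidate form's harmony is the negative weighted sum of its constraint violations, and the grammar selects the harmony-maximizing candidate. The constraint names below follow common Optimality Theory usage, but the weights and violation counts are invented purely for illustration:

```python
# Hypothetical constraint weights (higher weight = more important constraint).
weights = {"ONSET": 3.0, "NOCODA": 1.0, "MAX": 2.0}

# Candidate form -> number of violations of each constraint (made-up counts).
candidates = {
    "pa": {"ONSET": 0, "NOCODA": 0, "MAX": 1},
    "ap": {"ONSET": 1, "NOCODA": 1, "MAX": 0},
    "a":  {"ONSET": 1, "NOCODA": 0, "MAX": 1},
}

def harmony(violations):
    """Harmonic Grammar: harmony is the negative weighted sum of violations."""
    return -sum(weights[c] * n for c, n in violations.items())

winner = max(candidates, key=lambda form: harmony(candidates[form]))
# "pa" wins with harmony -2.0, beating "ap" (-4.0) and "a" (-5.0).
```

Unlike strict constraint ranking, weighted constraints allow "ganging-up" effects, where several low-weight violations jointly outweigh a single high-weight one, which is the gradient behavior the passage above describes.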

Harmony Theory's significance for mathematical psychology lies in its demonstration that connectionist computation can be understood in precise mathematical terms borrowed from statistical physics. The harmony function provides a global objective that the network optimizes, the Boltzmann distribution provides a principled link to probability theory, and the learning algorithm implements a form of statistical estimation. This framework influenced subsequent developments including Restricted Boltzmann Machines, deep belief networks, and modern energy-based models, while also providing a theoretical bridge between subsymbolic connectionist processing and higher-level cognitive phenomena.


References

  1. Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations (pp. 194–281). MIT Press. doi:10.7551/mitpress/5236.001.0001
  2. Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations (pp. 282–317). MIT Press. doi:10.7551/mitpress/5236.001.0001
  3. Prince, A., & Smolensky, P. (2004). Optimality Theory: Constraint interaction in generative grammar. Blackwell. doi:10.1002/9780470759400
  4. Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147–169. doi:10.1207/s15516709cog0901_7
