Mathematical Psychology
Backpropagation

The backpropagation algorithm uses the chain rule of calculus to compute the gradient of the error function with respect to every weight in a multilayer network, enabling gradient descent learning of hidden representations.

Δwⱼᵢ = −η · ∂E/∂wⱼᵢ = η · δⱼ · xᵢ
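The update can be derived in one line from the chain rule. The sketch below uses the notation of this article (netⱼ = Σᵢ wⱼᵢ xᵢ is unit j's summed input, and δⱼ is defined as −∂E/∂netⱼ):

```latex
% Chain rule behind the backpropagation weight update, with
% net_j = \sum_i w_{ji} x_i and E = \tfrac{1}{2}\sum_k (t_k - o_k)^2.
\frac{\partial E}{\partial w_{ji}}
  = \frac{\partial E}{\partial \mathrm{net}_j}
    \cdot \frac{\partial \mathrm{net}_j}{\partial w_{ji}}
  = -\delta_j \, x_i,
\qquad\text{so}\qquad
\Delta w_{ji} = -\eta \, \frac{\partial E}{\partial w_{ji}} = \eta \, \delta_j \, x_i .
```

Note the two minus signs cancel: descending the gradient of a squared-error function produces an update in the *positive* direction of δⱼ · xᵢ.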

Backpropagation — short for "backward propagation of errors" — is the algorithm that made multilayer neural networks practical. Although the mathematical idea of computing gradients by the chain rule was known earlier (Werbos, 1974), it was Rumelhart, Hinton, and Williams (1986) who demonstrated its power for training networks with hidden layers, publishing their landmark paper in Nature. The algorithm computes how much each weight in the network contributed to the overall error, then adjusts every weight simultaneously in the direction that reduces that error.

The Algorithm

Backpropagation Weight Update

Error function: E = ½ Σₖ (tₖ − oₖ)²
Output layer delta: δₖ = (tₖ − oₖ) · f′(netₖ)
Hidden layer delta: δⱼ = f′(netⱼ) · Σₖ wₖⱼ · δₖ

Weight update: Δwⱼᵢ = η · δⱼ · xᵢ
η = learning rate

The computation proceeds in two phases. In the forward pass, input is propagated through the network layer by layer to produce an output. In the backward pass, the error at the output is propagated backward through the network, computing the "delta" (local error signal) for each unit. Each delta is the product of the derivative of the activation function at that unit and the weighted sum of deltas from the layer above. The weight update for any connection is then simply the product of the learning rate, the sending unit's activation, and the receiving unit's delta.
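The two phases can be sketched in a few lines of NumPy for a toy one-hidden-layer network with sigmoid units. The layer sizes, random seed, input pattern, and learning rate here are illustrative assumptions, not values from the original literature:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy 2-2-1 network (sizes chosen purely for illustration).
W1 = rng.normal(scale=0.5, size=(2, 2))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(1, 2))   # hidden -> output weights
x = np.array([1.0, 0.0])                  # input pattern
t = np.array([1.0])                       # target
eta = 0.5                                 # learning rate

# Forward pass: propagate activity layer by layer.
net_h = W1 @ x
h = sigmoid(net_h)
net_o = W2 @ h
o = sigmoid(net_o)

E_before = 0.5 * float(np.sum((t - o) ** 2))  # error before the update

# Backward pass: compute deltas from the output layer down.
# Output delta: (t_k - o_k) * f'(net_k); for the sigmoid, f'(net) = o(1 - o).
delta_o = (t - o) * o * (1 - o)
# Hidden delta: f'(net_j) * sum_k w_kj * delta_k.
delta_h = h * (1 - h) * (W2.T @ delta_o)

# Weight update: learning rate * receiving unit's delta * sending activation.
W2 += eta * np.outer(delta_o, h)
W1 += eta * np.outer(delta_h, x)
```

A single pass over one pattern like this is "online" (per-pattern) learning; summing the updates over all patterns before applying them gives batch gradient descent on E.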

Learning Dynamics and Challenges

Backpropagation performs gradient descent on the error surface — a high-dimensional landscape whose shape depends on the training data, the architecture, and the activation functions. The learning trajectory can be affected by local minima (though in practice these are rarely a serious problem for large networks), saddle points, and flat plateaus where learning stalls. Practical enhancements include momentum (adding a fraction of the previous weight change to the current update), adaptive learning rates, and weight decay (a regularization term that penalizes large weights).
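Momentum and weight decay both enter as small modifications of the basic update rule. A minimal sketch, in which the parameter names (eta for the learning rate, alpha for the momentum coefficient, lam for the decay strength) and their values are illustrative assumptions:

```python
import numpy as np

def update(w, grad, velocity, eta=0.1, alpha=0.9, lam=1e-4):
    """One gradient-descent step with momentum and weight decay.

    Momentum adds a fraction (alpha) of the previous weight change to the
    current one; weight decay adds lam * w to the gradient, penalizing
    large weights.
    """
    velocity = alpha * velocity - eta * (grad + lam * w)
    return w + velocity, velocity

# Usage on a simple quadratic error surface E(w) = 0.5 * w**2, so grad = w.
w = np.array([2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = update(w, w, v)
```

On a surface like this, momentum lets the weight "coast" through flat plateaus and damps oscillation across steep ravines, which is why it often speeds convergence in practice.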

Biological Plausibility Debate

Whether the brain implements anything like backpropagation has been debated since the algorithm was introduced. Critics note that biological neurons do not have access to the symmetric weight matrices required for the backward pass, that learning in the brain appears more local, and that error signals would need to be propagated backward through many synapses. However, recent proposals such as predictive coding, feedback alignment, and equilibrium propagation suggest that biologically plausible mechanisms can approximate the gradient computations of backpropagation, keeping the debate active.
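Feedback alignment illustrates how one of these proposals works: the backward pass reuses a *fixed random* feedback matrix in place of the transposed forward weights, sidestepping the "weight transport" problem. The sketch below is a minimal illustration under assumed sizes, seed, target, and learning rate, not a reconstruction of any published simulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(3, 2))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(1, 3))   # hidden -> output weights
B = rng.normal(scale=0.5, size=(3, 1))    # fixed random feedback weights

x = np.array([1.0, 0.5])
t = np.array([0.8])
eta = 0.5

for _ in range(200):
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    delta_o = (t - o) * o * (1 - o)
    # Feedback alignment: propagate the error through the fixed random
    # matrix B instead of W2.T, so no neuron needs access to the forward
    # weights of the layer above.
    delta_h = h * (1 - h) * (B @ delta_o)
    W2 += eta * np.outer(delta_o, h)
    W1 += eta * np.outer(delta_h, x)
```

Training still succeeds because the forward weights gradually come to align with the random feedback weights, so the approximate error signal points in a useful direction.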

In cognitive modeling, backpropagation's significance extends beyond its role as a training algorithm. The internal representations that emerge in hidden layers after learning have been used to explain phenomena in language acquisition (past-tense learning), reading (mapping orthography to phonology), and semantic cognition (learning the structure of conceptual knowledge). The representations are not hand-coded but emerge from the statistics of the training environment, providing a compelling account of how structured knowledge could arise from experience.

References

  1. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. doi:10.1038/323533a0
  2. Werbos, P. J. (1994). The roots of backpropagation: From ordered derivatives to neural networks and political forecasting. Wiley.
  3. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. doi:10.1038/nature14539
  4. Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J., & Hinton, G. (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 21(6), 335–346. doi:10.1038/s41583-020-0277-3