Mathematical Psychology

Temporal Difference Learning

Temporal difference learning updates value predictions based on the difference between successive predictions, providing a computational account of reward-driven learning.

δₜ = rₜ + γV(sₜ₊₁) − V(sₜ)

Temporal difference (TD) learning, developed by Richard Sutton in 1988, is a reinforcement learning algorithm that updates predictions about future rewards based on the discrepancy between consecutive predictions. TD learning bridges the Rescorla-Wagner model from animal learning theory and dynamic programming from optimal control theory.

TD Learning Update

δₜ = rₜ + γ · V(sₜ₊₁) − V(sₜ)
V(sₜ) ← V(sₜ) + α · δₜ

δₜ = TD error (reward prediction error)
γ = discount factor (0 to 1)
α = learning rate
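The two-line update rule can be sketched in a few lines of code. This is a minimal illustration, assuming a dictionary-based value table with unvisited states defaulting to zero; the function and variable names are ours, not from the article.

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to V in place; return the TD error delta."""
    # δₜ = rₜ + γ·V(sₜ₊₁) − V(sₜ)
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    # V(sₜ) ← V(sₜ) + α·δₜ
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

V = {}
delta = td_update(V, s=0, r=1.0, s_next=1)
# With values initialized to zero, δ = 1.0 and V(0) becomes α·δ = 0.1
```

Note that a single update moves V(s) only a fraction α of the way toward the new target, which is why repeated experience is needed for values to converge.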

Connection to Dopamine

In a landmark discovery, Schultz, Dayan, and Montague (1997) showed that the firing patterns of midbrain dopamine neurons closely match the TD prediction error signal. Dopamine neurons fire when rewards are unexpected (positive δ), pause when expected rewards are omitted (negative δ), and show no response to fully predicted rewards (δ = 0). This correspondence has become one of the most successful examples of a computational model directly predicting neural activity.
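The three firing patterns correspond directly to the sign of the TD error. The following sketch uses illustrative numbers (reward magnitude 1, γ = 1, and hand-set value estimates) to show how each scenario produces the expected sign of δ:

```python
def td_error(r, v_next, v_current, gamma=1.0):
    # δ = r + γ·V(s') − V(s)
    return r + gamma * v_next - v_current

# Unexpected reward: nothing was predicted, reward arrives -> positive δ (burst)
unexpected = td_error(r=1.0, v_next=0.0, v_current=0.0)

# Omitted reward: reward was fully predicted but withheld -> negative δ (pause)
omitted = td_error(r=0.0, v_next=0.0, v_current=1.0)

# Fully predicted reward: prediction matches outcome -> δ = 0 (no response)
predicted = td_error(r=1.0, v_next=0.0, v_current=1.0)
```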

Relationship to Rescorla-Wagner

The Rescorla-Wagner model can be seen as a special case of TD learning where there is only one time step between CS and US. TD learning generalizes this by allowing prediction errors to propagate backwards through multiple time steps, explaining phenomena like second-order conditioning and the timing of conditioned responses that the Rescorla-Wagner model cannot address.
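The backward propagation of prediction errors can be seen in a small simulation. In this sketch (the chain layout and parameters are illustrative), reward arrives on entering the final state of a three-state chain; over repeated episodes, value spreads from the state adjacent to reward back to the earlier state, which a one-step Rescorla-Wagner update cannot produce:

```python
alpha, gamma = 0.5, 0.9
V = [0.0, 0.0, 0.0]  # states 0 -> 1 -> 2; reward 1 on entering terminal state 2

for episode in range(50):
    for s in (0, 1):
        r = 1.0 if s + 1 == 2 else 0.0
        v_next = V[s + 1] if s + 1 < 2 else 0.0  # terminal state has value 0
        delta = r + gamma * v_next - V[s]        # δ = r + γV(s') − V(s)
        V[s] += alpha * delta

# V[1] approaches 1 (immediate reward); V[0] approaches γ·V[1] ≈ 0.9,
# even though state 0 is never directly paired with reward
```

State 0 gains value only through the TD error generated when entering the already-valued state 1, the same mechanism the text invokes for second-order conditioning.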

Interactive Calculator

Each row records a state transition: state (integer), reward (numeric), next_state (integer). The calculator applies temporal-difference learning: δ = r + γV(s') − V(s). Parameters: α=0.1 (learning rate), γ=0.9 (discount factor).
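The calculator's per-record procedure can be sketched as follows, assuming each record is a (state, reward, next_state) tuple as described above; the function name is ours:

```python
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor from the text

def run_records(records):
    """Apply the TD update to each (state, reward, next_state) record in order."""
    V = {}       # value table; unvisited states implicitly start at 0
    errors = []  # TD error δ for each record
    for s, r, s_next in records:
        delta = r + GAMMA * V.get(s_next, 0.0) - V.get(s, 0.0)
        V[s] = V.get(s, 0.0) + ALPHA * delta
        errors.append(delta)
    return V, errors

V, errors = run_records([(0, 0.0, 1), (1, 1.0, 2), (0, 0.0, 1)])
```

Because records are processed sequentially, the third record's error already reflects the value that the second record assigned to state 1.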


References

  1. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44. https://doi.org/10.1007/BF00115009
  2. Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. https://doi.org/10.1126/science.275.5306.1593
  3. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.
  4. Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53(3), 139–154. https://doi.org/10.1016/j.jmp.2008.12.005
