The reward prediction error (RPE), conventionally written δ, is the discrepancy between the reward actually received and the reward that was expected. This signal is central to both the Rescorla-Wagner model and temporal difference (TD) learning, and the discovery that midbrain dopamine neurons encode it is one of the most celebrated findings in computational neuroscience.
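To make the definition concrete, the following is a minimal sketch of a Rescorla-Wagner style update for a single cue, in which the RPE is simply the received reward minus the current expectation. The function name `rescorla_wagner`, the learning rate `alpha`, and the reward sequence are illustrative assumptions, not values from any particular study.

```python
def rescorla_wagner(rewards, alpha=0.2, v0=0.0):
    """Track expected reward V across trials for a single cue.

    rewards : sequence of rewards received on successive trials
    alpha   : learning rate (illustrative value, an assumption)
    v0      : initial expectation
    Returns (values, deltas): the expectation after each trial and
    the reward prediction error computed on each trial.
    """
    v = v0
    values, deltas = [], []
    for r in rewards:
        delta = r - v        # RPE: received reward minus expected reward
        v += alpha * delta   # move the expectation toward the outcome
        values.append(v)
        deltas.append(delta)
    return values, deltas


# 50 rewarded trials followed by one trial on which the reward is omitted
values, deltas = rescorla_wagner([1.0] * 50 + [0.0])
```

In this sketch the error starts at 1 on the first rewarded trial, shrinks toward 0 as the expectation approaches the delivered reward, and becomes strongly negative on the final trial, when the now-expected reward is withheld.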
Neural Evidence
Unexpected reward: δ > 0 → dopamine burst (phasic increase)
Expected reward: δ = 0 → no change (baseline firing)
Omitted expected reward: δ < 0 → dopamine pause (phasic decrease)
With learning, dopamine response transfers from reward to reward-predicting cue
Schultz, Dayan, and Montague (1997) showed that the firing patterns of dopamine neurons in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) match the temporal difference prediction error signal with remarkable precision. This correspondence between a computational quantity (RPE) and a neural signal (phasic dopamine) remains one of the strongest bridges between computational models and neurobiology.
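The qualitative pattern described above can be reproduced with a small TD(0) simulation. The sketch below is a simplified illustration, not the model code from Schultz, Dayan, and Montague (1997). It assumes a tapped-delay-line representation of the cue (one weight per time step since cue onset), an unpredictable cue onset (the value before the cue is fixed at 0), and illustrative values for the trial length, cue and reward times, learning rate, and discount factor. The function name `simulate_td` is hypothetical.

```python
def simulate_td(n_trials=500, n_steps=20, t_cue=5, t_reward=15,
                alpha=0.1, gamma=1.0, omit_last=False):
    """Simulate TD(0) learning over repeated cue-reward trials.

    Returns a list of per-trial lists; history[trial][t] is the
    prediction error registered on arriving at time step t.
    All parameter values are illustrative assumptions.
    """
    # One weight per time step since cue onset (tapped delay line).
    w = [0.0] * (n_steps - t_cue)

    def value(t):
        # Before the cue there is no stimulus, so the prediction is 0.
        return w[t - t_cue] if t_cue <= t < n_steps else 0.0

    history = []
    for trial in range(n_trials):
        omit = omit_last and trial == n_trials - 1
        deltas = [0.0] * n_steps
        for t in range(1, n_steps):
            r = 1.0 if (t == t_reward and not omit) else 0.0
            # TD error on arriving at step t:
            # delta = r_t + gamma * V(t) - V(t - 1)
            delta = r + gamma * value(t) - value(t - 1)
            deltas[t] = delta
            # Only post-cue steps carry a learnable prediction.
            if t - 1 >= t_cue:
                w[t - 1 - t_cue] += alpha * delta
        history.append(deltas)
    return history


history = simulate_td()
first, last = history[0], history[-1]
# first[15] is the error at reward time on the first trial;
# last[5] is the error at cue onset after learning.
omission = simulate_td(omit_last=True)[-1]
```

Under these assumptions, the error on the first trial appears at the time of reward and is absent at the cue. After training it appears at cue onset and vanishes at the time of reward. If the reward is then omitted, the error at the expected reward time becomes negative, mirroring the dopamine pause.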
Implications for Psychology
The RPE framework has been applied to understanding addiction (drugs of abuse hijack the RPE signal), depression (reduced dopaminergic RPE signals), and decision-making deficits in Parkinson's disease and schizophrenia. Individual differences in RPE signaling have been linked to trait impulsivity, reward sensitivity, and vulnerability to substance use disorders.