Model-Based vs. Model-Free

Daw, Niv, and Dayan (2005) formalized the psychological distinction between habitual and goal-directed behavior as a computational distinction between model-free and model-based reinforcement learning. Model-free learning caches action values through direct experience (like Q-learning), while model-based learning maintains an internal model of the environment and plans by simulating outcomes.

Two Systems

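Model-free: Q(s,a) ← Q(s,a) + α·δ (cached values, fast, inflexible)
Model-based: Q(s,a) = Σ_s' T(s'|s,a)·[R(s') + γ·max_a' Q(s',a')] (planned, slow, flexible)

Here α is the learning rate, γ the discount factor, T the transition model, and R the reward function. δ is the reward prediction error; for Q-learning, δ = r + γ·max_a' Q(s',a') − Q(s,a).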

Model-free ≈ habitual (dorsolateral striatum)
Model-based ≈ goal-directed (prefrontal cortex, caudate)
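
The contrast between the two update rules can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from Daw, Niv, and Dayan (2005): the four-state toy world, the function names, and the parameter values are all assumptions chosen for the example. It caches values by repeated experience, then removes the reward from one outcome to show that the cached values stay put while replanning from the model adapts at once.

```python
import numpy as np

def model_free_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) <- Q(s,a) + alpha * delta."""
    delta = r + gamma * Q[s_next].max() - Q[s, a]  # temporal-difference error
    Q[s, a] += alpha * delta
    return delta

def model_based_values(T, R, gamma=0.9, n_sweeps=200):
    """Plan with a model: Q(s,a) = sum_s' T(s'|s,a) * [R(s') + gamma * max_a' Q(s',a')].

    T[s, a, s2] is the transition probability and R[s2] the reward on entering s2.
    """
    Q = np.zeros(T.shape[:2])
    for _ in range(n_sweeps):
        Q = T @ (R + gamma * Q.max(axis=1))
    return Q

# Hypothetical toy world: state 0 is the start, states 1 and 2 are outcomes
# that lead to an absorbing, unrewarded end state 3.
T = np.zeros((4, 2, 4))
T[0, 0, 1] = 1.0   # action 0 from the start reaches outcome state 1
T[0, 1, 2] = 1.0   # action 1 from the start reaches outcome state 2
T[1:, :, 3] = 1.0  # every other state-action pair ends the episode
R = np.array([0.0, 1.0, 0.5, 0.0])

Q_cached = np.zeros((4, 2))
for _ in range(500):                       # learn from repeated experience
    for a, s_next in ((0, 1), (1, 2)):
        model_free_update(Q_cached, 0, a, R[s_next], s_next)

R_devalued = R.copy()
R_devalued[1] = 0.0                        # outcome 1 loses its value
print("cached   :", Q_cached[0])           # still prefers action 0
print("replanned:", model_based_values(T, R_devalued)[0])  # now prefers action 1
```

The cached values would only catch up after further experience with the devalued outcome, which is the sense in which model-free control is fast but inflexible.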

The Two-Step Task

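Daw et al. (2011) designed the two-step task to dissociate the two systems behaviorally. Each first-stage choice leads probabilistically to one of two second-stage states: usually the one it is commonly associated with, occasionally the other (a rare transition). A model-free agent repeats a first-stage action that was followed by reward, regardless of whether the transition was common or rare. A model-based agent takes the transition structure into account: after a reward that followed a rare transition, it is more likely to switch to the other first-stage action, because that action is the one that most often leads to the rewarded state. Most people show a mixture of both strategies, with the relative contribution of model-based planning varying with cognitive load, stress, and individual differences.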
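
The behavioral signatures can be reproduced with a small simulation. The sketch below is a simplified version of the task, not the design or the model fitted by Daw et al. (2011): each second-stage state pays out directly (there is no second-stage choice), and the transition probability of 0.7, the drifting reward probabilities, the learning rate, and the softmax temperature are all assumed values. It reports how often each agent repeats its previous first-stage choice, split by whether that trial was rewarded and whether its transition was common.

```python
import numpy as np

def simulate(agent, n_trials=20000, alpha=0.5, beta=5.0, p_common=0.7, seed=0):
    """Simulate a simplified two-step task; return stay probabilities.

    agent: "mf" (model-free) or "mb" (model-based).
    Returns a dict keyed by (rewarded, common) on the previous trial.
    """
    rng = np.random.default_rng(seed)
    # T[a, s2]: probability that first-stage action a leads to second-stage state s2.
    T = np.array([[p_common, 1 - p_common],
                  [1 - p_common, p_common]])
    p_reward = rng.uniform(0.25, 0.75, size=2)  # reward probability per second-stage state
    q_mf = np.zeros(2)   # cached first-stage action values (model-free)
    v2 = np.zeros(2)     # learned second-stage state values (used for planning)
    history = []         # (action, rewarded, common) per trial

    for _ in range(n_trials):
        # The model-based agent recomputes action values from T on every trial.
        q = q_mf if agent == "mf" else T @ v2
        p_choose_1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))  # two-action softmax
        a = int(rng.random() < p_choose_1)
        common = bool(rng.random() < p_common)
        s2 = a if common else 1 - a
        r = float(rng.random() < p_reward[s2])

        q_mf[a] += alpha * (r - q_mf[a])   # credit the chosen action directly
        v2[s2] += alpha * (r - v2[s2])     # credit the state where the outcome occurred
        history.append((a, bool(r), common))

        # Slowly drifting reward probabilities keep learning ongoing.
        p_reward = np.clip(p_reward + rng.normal(0.0, 0.025, size=2), 0.25, 0.75)

    stays = {(rew, com): [] for rew in (True, False) for com in (True, False)}
    for (a_prev, rew, com), (a_next, _, _) in zip(history, history[1:]):
        stays[(rew, com)].append(a_next == a_prev)
    return {key: float(np.mean(val)) for key, val in stays.items()}

def reward_effect(p):
    """Main effect of reward on staying (the model-free signature)."""
    return (p[(True, True)] + p[(True, False)]) - (p[(False, True)] + p[(False, False)])

def interaction(p):
    """Reward x transition interaction (the model-based signature)."""
    return (p[(True, True)] - p[(True, False)]) - (p[(False, True)] - p[(False, False)])

mf = simulate("mf")
mb = simulate("mb")
print("model-free :", mf, "reward effect:", reward_effect(mf))
print("model-based:", mb, "interaction:", interaction(mb))
```

In this sketch the model-free agent stays more after reward whatever the transition type, while the model-based agent's tendency to stay depends on the combination of reward and transition. A mixture of the two, as seen in most people, would show both effects at once.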

The model-based/model-free framework has been applied to understanding compulsive behavior in OCD (excessive habitual control), addiction (shift from goal-directed to habitual drug seeking), and the development of cognitive control across childhood and adolescence.
