Daw, Niv, and Dayan (2005) formalized the psychological distinction between habitual and goal-directed behavior as a computational distinction between model-free and model-based reinforcement learning. Model-free learning caches action values through direct experience (like Q-learning), while model-based learning maintains an internal model of the environment and plans by simulating outcomes.
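A minimal sketch of model-free caching, assuming a tabular Q-learning update with a hypothetical learning rate alpha and discount gamma. The function name q_learning_update and the toy state and action names are illustrative, not taken from the cited work. The point is that the cached value only moves when the transition is actually experienced, one sampled update at a time.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One model-free (Q-learning) step: nudge the cached Q(s, a) toward the
    sampled target r + gamma * max_a' Q(s', a'). Returns the TD error."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


# Hypothetical toy experience: taking "right" in "start" yields reward 1 and ends in "end".
Q = defaultdict(float)  # unvisited state-action pairs default to 0.0
actions = ["left", "right"]
for _ in range(50):
    q_learning_update(Q, "start", "right", 1.0, "end", actions)
```

After 50 identical experiences the cached value is 1 − 0.9^50 (about 0.995), while the never-tried "left" action stays at 0. If the reward at "end" later changed, this cached value would stay stale until the action was experienced again.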
Two Systems
Model-free: Q(s,a) ← Q(s,a) + α·[r + γ·max_a' Q(s',a') − Q(s,a)] (cached, fast, inflexible)
Model-based: Q(s,a) = Σ_s' T(s'|s,a)·[R(s') + γ·max_a' Q(s',a')] (planned, slow, flexible)
Model-free ≈ habitual (dorsolateral striatum)
Model-based ≈ goal-directed (prefrontal cortex, caudate)
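The model-based equation above can be computed directly when T and R are known. The sketch below applies it as repeated synchronous Bellman backups; the function name plan, the fixed number of sweeps, and the toy environment are assumptions for illustration. It also shows the flexibility label: when the rewards are revalued, the planner's preferences change immediately without any new experience.

```python
def plan(T, R, actions, gamma=0.9, n_sweeps=50):
    """Model-based values from a known model via repeated Bellman backups:
    Q(s,a) = sum over s' of T(s'|s,a) * [R(s') + gamma * max_a' Q(s',a')].

    T: dict mapping (s, a) -> {s': probability}
    R: dict mapping s' -> reward received on entering s'
    actions: dict mapping s -> list of actions ([] marks a terminal state)
    """
    Q = {(s, a): 0.0 for s in actions for a in actions[s]}

    def value(s):
        # max_a' Q(s', a'); terminal states contribute 0.
        return max((Q[(s, a)] for a in actions[s]), default=0.0)

    for _ in range(n_sweeps):
        # Synchronous sweep: every backup reads the previous sweep's Q.
        Q = {
            (s, a): sum(p * (R[s2] + gamma * value(s2)) for s2, p in T[(s, a)].items())
            for (s, a) in Q
        }
    return Q


# Hypothetical toy model: two first-stage actions, two terminal outcomes.
T = {("start", "A"): {"X": 0.7, "Y": 0.3}, ("start", "B"): {"X": 0.3, "Y": 0.7}}
actions = {"start": ["A", "B"], "X": [], "Y": []}

Q_before = plan(T, {"start": 0.0, "X": 1.0, "Y": 0.0}, actions)
# Revaluation: swap which outcome is rewarded and simply re-plan.
Q_after = plan(T, {"start": 0.0, "X": 0.0, "Y": 1.0}, actions)
```

Before revaluation the planner values A at 0.7 and B at 0.3; after the rewards are swapped it values A at 0.3 and B at 0.7. The price of this flexibility is recomputing over the whole model whenever something changes.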
The Two-Step Task
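Daw et al. (2011) designed the two-step task to dissociate the two systems behaviorally. Each first-stage choice leads probabilistically to one of two second-stage states: in the original design, each first-stage action leads to "its" second-stage state 70% of the time (a common transition) and to the other state 30% of the time (a rare transition). Model-free agents tend to repeat a first-stage action that was followed by reward, regardless of whether the transition was common or rare. Model-based agents take the transition structure into account: a reward that followed a rare transition makes them more likely to switch, because the other first-stage action is the one that commonly leads to the rewarded state. Most people show a mixture of both strategies, with the relative contribution of model-based planning varying with cognitive load, stress, and individual differences.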
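To see why the task separates the two strategies, here is a deliberately simplified one-trial sketch. Assumptions: 0.7/0.3 transitions, a hypothetical learning rate ALPHA = 0.5, neutral starting values of 0.5, and a single value per second-stage state (the real task has two actions in each second-stage state, and the models fitted to behavior add further parameters such as a model-based weighting and a softmax choice rule). The function stays_after_one_trial is illustrative, not from the paper.

```python
# Simplified two-step task: first-stage action -> {second-stage state: probability}.
T = {"A": {"X": 0.7, "Y": 0.3}, "B": {"X": 0.3, "Y": 0.7}}
ALPHA = 0.5  # hypothetical learning rate


def stays_after_one_trial(state2, reward):
    """Return (model_free_stays, model_based_stays) after choosing 'A' once,
    landing in state2 and receiving reward, starting from neutral values of 0.5.
    'Stays' means the agent now prefers 'A' over 'B'."""
    # Model-free: credit the first-stage action directly with the outcome.
    q_mf = {"A": 0.5, "B": 0.5}
    q_mf["A"] += ALPHA * (reward - q_mf["A"])

    # Model-based: learn the second-stage value, then plan through T.
    q2 = {"X": 0.5, "Y": 0.5}
    q2[state2] += ALPHA * (reward - q2[state2])
    q_mb = {a: sum(p * q2[s] for s, p in T[a].items()) for a in T}

    return q_mf["A"] > q_mf["B"], q_mb["A"] > q_mb["B"]


for state2, label in [("X", "common"), ("Y", "rare")]:
    for reward in (1, 0):
        mf, mb = stays_after_one_trial(state2, reward)
        print(f"reward={reward}, {label}: model-free stays={mf}, model-based stays={mb}")
```

The model-free agent stays after every reward and switches after every non-reward (a main effect of reward), while the model-based agent stays after rewarded-common and unrewarded-rare trials and switches after rewarded-rare and unrewarded-common ones (a reward × transition interaction). In the real task, a mixture of the two shows up as both effects at once.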
The model-based/model-free framework has been applied to understanding compulsive behavior in OCD (excessive habitual control), addiction (shift from goal-directed to habitual drug seeking), and the development of cognitive control across childhood and adolescence.