Mathematical Psychology

Distributed Representations

In distributed representations, each concept is encoded as a pattern of activation across many processing units, and each unit participates in representing many concepts; this scheme supports generalization, captures similarity structure, and makes efficient use of representational resources.

Distributed representations are the representational foundation of connectionist and PDP models. In a distributed scheme, a concept is not associated with a single dedicated unit (as in a localist representation) but is encoded as a pattern of activation across an entire population of units. Conversely, each unit contributes to the representation of many different concepts. This idea, formalized by Hinton (1986) and Hinton, McClelland, and Rumelhart (1986), has profound consequences for how similarity, generalization, and memory operate in neural networks.

Properties of Distributed Representations

Distributed vs. Localist Coding

Localist: concept A → unit 7 active (all others silent)
Distributed: concept A → [0.3, 0.9, 0.1, 0.7, 0.2, ...]

Similarity: sim(A, B) = cos(θ) = (a · b) / (‖a‖ · ‖b‖)
Superposition: multiple concepts can be partially active simultaneously
Capacity: N binary units can represent up to 2ᴺ distinct patterns
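The contrast between the two coding schemes, and the cosine similarity measure above, can be sketched in a few lines of numpy. The activation values here are hypothetical illustrations, not data from any model:

```python
import numpy as np

# Localist coding: one dedicated unit per concept, no overlap.
localist_dog = np.array([0, 0, 0, 0, 0, 0, 0, 1.0])  # "unit 7" active
localist_cat = np.array([0, 0, 1.0, 0, 0, 0, 0, 0])

# Distributed coding: every concept is a pattern over all units.
dist_dog = np.array([0.3, 0.9, 0.1, 0.7, 0.2, 0.8, 0.1, 0.6])
dist_cat = np.array([0.4, 0.8, 0.2, 0.6, 0.1, 0.9, 0.2, 0.5])

def cosine(a, b):
    """sim(A, B) = cos(theta) = (a . b) / (||a|| * ||b||)"""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(localist_dog, localist_cat))  # 0.0: distinct units share nothing
print(cosine(dist_dog, dist_cat))          # near 1: overlapping patterns
```

In the localist scheme the two concepts are exactly as dissimilar as any other pair; in the distributed scheme the overlap of the two patterns itself carries the similarity information.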

Similarity structure: In distributed representations, similar concepts have overlapping patterns of activation. The degree of overlap — measured by cosine similarity, dot product, or Euclidean distance — directly encodes the similarity structure of the domain. A network that represents "dog" and "cat" with overlapping patterns will automatically generalize what it has learned about dogs to cats, because the shared activation components produce shared behavior. This contrasts with localist representations, where similarity must be explicitly encoded through connection weights.
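The automatic-generalization point can be made concrete with a toy linear readout. In this sketch the patterns are invented for illustration, and the readout is simply scaled so the trained item yields a response of 1.0:

```python
import numpy as np

# Hypothetical distributed patterns; overlap encodes similarity.
dog   = np.array([0.3, 0.9, 0.1, 0.7, 0.2, 0.8])
cat   = np.array([0.4, 0.8, 0.2, 0.6, 0.1, 0.9])
chair = np.array([0.9, 0.1, 0.8, 0.1, 0.9, 0.1])

# A linear readout "trained" only on dog, normalized so w . dog == 1.0.
w = dog / (dog @ dog)

print(w @ dog)    # 1.0: the trained item
print(w @ cat)    # high: shared activation components transfer the response
print(w @ chair)  # low: little overlap, little transfer
```

Nothing about cats was ever trained; the transfer falls out of the shared components, which is exactly why a localist scheme (no shared components) gets no such generalization for free.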

Superposition and Generalization

Superposition: Multiple items can be partially active simultaneously in a distributed system, enabling blending and composition. This property supports content-addressable memory (presenting a partial pattern can retrieve the full pattern), prototype extraction (the average of several exemplar patterns approximates a prototype), and analogy (similar items produce similar outputs). Superposition also raises the problem of crosstalk: when too many patterns are stored in the same set of weights, they interfere with one another, degrading recall. In sequential learning, the extreme form of this problem is catastrophic interference, in which new learning overwrites old patterns.
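Content-addressable retrieval from superimposed memories can be illustrated with a minimal Hopfield-style autoassociator. The two stored patterns below are chosen to be orthogonal; the Hebbian outer-product rule superimposes both in a single weight matrix:

```python
import numpy as np

# Two orthogonal bipolar patterns stored by Hebbian superposition.
p1 = np.array([ 1,  1,  1,  1, -1, -1, -1, -1])
p2 = np.array([ 1, -1,  1, -1,  1, -1,  1, -1])   # p1 . p2 == 0

# Both memories share one weight matrix (outer-product learning rule).
W = np.outer(p1, p1) + np.outer(p2, p2)
np.fill_diagonal(W, 0)         # no self-connections

# Content-addressable retrieval: cue with a corrupted copy of p1.
cue = p1.copy()
cue[-1] *= -1                  # flip one unit
recalled = np.sign(W @ cue)    # one synchronous update

print(np.array_equal(recalled, p1))  # True: the full pattern is restored
```

With only two well-separated patterns the partial cue is cleaned up in a single step; packing many correlated patterns into the same weights is exactly where the crosstalk described above begins to degrade recall.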

Microfeatures and Learned Representations

Hinton (1986) proposed that the features over which distributed representations are defined need not be hand-crafted but can be learned from data — he called these "microfeatures." When a multilayer network is trained with backpropagation, the hidden units develop distributed representations in which each unit responds to a meaningful microfeature of the input. These learned features often correspond to psychologically interpretable dimensions, as demonstrated in models of semantic cognition where hidden units learn to represent features like "has legs," "is animate," or "is edible" without being explicitly told about these categories.
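A minimal backpropagation sketch shows the setup: one-hot item inputs map through a small hidden layer to attribute outputs. The items, attributes, layer sizes, and learning rate are all illustrative choices, not taken from Hinton's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy semantic task (hypothetical data): one-hot items -> attribute vectors.
# Attribute columns: [has_legs, is_animate, is_edible]
items = np.eye(4)                  # dog, cat, apple, bread
attrs = np.array([[1, 1, 0],       # dog
                  [1, 1, 0],       # cat
                  [0, 0, 1],       # apple
                  [0, 0, 1]])      # bread

W1 = rng.normal(0, 0.5, (4, 3))    # item -> hidden (3 learned microfeatures)
W2 = rng.normal(0, 0.5, (3, 3))    # hidden -> attributes
sigmoid = lambda x: 1 / (1 + np.exp(-x))

def forward(x):
    h = sigmoid(x @ W1)
    return h, sigmoid(h @ W2)

_, out0 = forward(items)
loss0 = np.mean((out0 - attrs) ** 2)

for _ in range(2000):              # plain gradient descent on squared error
    h, out = forward(items)
    d_out = (out - attrs) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * items.T @ d_h

h, out = forward(items)
print(np.mean((out - attrs) ** 2) < loss0)  # True: error falls with training
```

Because dog and cat share a target attribute vector, their hidden-layer patterns typically come to overlap more than those of dissimilar items: the hidden units have been pushed toward microfeatures that capture the structure of the domain.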

The theoretical capacity of distributed representations is exponential in the number of units: with N binary units, up to 2ᴺ distinct patterns are possible. However, the practical capacity is limited by the need to keep patterns sufficiently distinct to avoid interference. The tradeoff between capacity and fidelity is governed by the sparseness of the representation — how many units are active for each pattern. Sparse distributed representations, where only a small fraction of units are active at any time, combine high capacity with low interference and are widely used in models of hippocampal memory and cortical coding.
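The capacity-versus-interference tradeoff can be seen directly by comparing the average overlap between random dense and random sparse patterns. The pattern counts and activity levels below are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_patterns = 1000, 50

def random_patterns(p_active):
    """Binary patterns with roughly a fraction p_active of units on."""
    return (rng.random((n_patterns, N)) < p_active).astype(float)

def mean_overlap(P):
    """Average pairwise dot product: a simple proxy for crosstalk."""
    G = P @ P.T
    off_diag = G[~np.eye(n_patterns, dtype=bool)]
    return off_diag.mean()

dense  = random_patterns(0.5)    # half the units active per pattern
sparse = random_patterns(0.05)   # 5% of units active per pattern

print(mean_overlap(dense))   # about N/4 = 250 shared active units per pair
print(mean_overlap(sparse))  # about N * 0.0025 = 2.5: far less interference
```

The expected overlap scales with the square of the activity level, which is why sparse codes can store many patterns with little mutual interference — the property exploited in models of hippocampal pattern separation.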

References

  1. Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations (pp. 77–109). MIT Press. doi:10.7551/mitpress/5236.001.0001
  2. Hinton, G. E. (1986). Learning distributed representations of concepts. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1–12.
  3. Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed processing approach. MIT Press. doi:10.7551/mitpress/6161.001.0001
  4. Plate, T. A. (2003). Holographic reduced representations: Distributed representation for cognitive structures. CSLI Publications.
