PERSON PERCEPTION AND IMPRESSION FORMATION
1. Making Sense of People: Coherence Mechanisms
Paul Thagard and Ziva Kunda
2. On the Dynamic Construction of Meaning: An Interactive Activation and Competition Model of Social Perception
Stephen J. Read and Lynn C. Miller
STEREOTYPING AND SOCIAL CATEGORIZATION
3. The Dynamics of Group Impression Formation: The Tensor Product Model of Exemplar-Based Social Category Learning
Yoshihisa Kashima, Jodie Woolcock, and Deborah King
4. Person Perception and Stereotyping: Simulation Using Distributed Representations in a Recurrent Connectionist Network
Eliot R. Smith and Jamie DeCoster
CAUSAL REASONING
5. A Connectionist Approach to Causal Attribution
Frank Van Overwalle and Dirk Van Rooy
PERSONALITY AND BEHAVIOR
6. Personality as a Stable Cognitive-Affective Activation Network: Characteristic Patterns of Behavior Variation Emerge from a Stable Personality Structure
Yuichi Shoda and Walter Mischel
ATTITUDES AND BELIEFS
7. The Consonance Model of Dissonance Reduction
Thomas R. Shultz and Mark R. Lepper
8. Toward an Integration of the Social and the Scientific: Observing, Modeling, and Promoting the Explanatory Coherence of Reasoning
Michael Ranney and Patricia Schank
SOCIAL INFLUENCE AND GROUP INTERACTION
9. Toward Computational Social Psychology: Cellular Automata and Neural Network Models of Interpersonal Dynamics
Andrzej Nowak and Robin R. Vallacher
10. Attitudes, Beliefs and Other Minds: Shared Representations in Self-Organizing Systems
Richard Eiser, Mark J. A. Claessen, and Jonathan J. Loose
Preface
Stephen J. Read and Lynn Carol Miller
Neural network models, also called connectionist or parallel distributed processing models, seem to
represent a major paradigm shift in cognitive psychology, cognitive science, and artificial
intelligence. Such models move us away from the idea of mind as computer and instead promise
brain-style models of the mind, admitting the possibility that models of high-level cognitive
processing can be built from simple neuron-like units. That is, we can build computational models of
the mind composed of units functionally similar to the physical units that compose a real brain. This
approach has led to some fundamental new insights about the way the mind might work and the way it
might interact with the environment.
Surprisingly, given the importance
of these models, until recently social psychologists had
paid little attention to them. Yet, these models directly address
several fundamental characteristics
of social perception and social interaction: the simultaneous
integration of multiple pieces of
information and the quite short time frame within which such
integration occurs. Any mundane act
of social perception (and any resulting behavior) results from the
simultaneous integration of
multiple pieces of information, such that the meaning of each piece of information influences and
constrains the meaning of every other piece. Thus, social perception can be viewed as the solution
of a set of simultaneous, mutually interacting constraints.
Moreover, this integration
typically takes place in a very short time frame, much shorter than
would be possible for any kind
of reasonable serial integration process. Thus, much of social
perception must occur in parallel.
Both of these are central characteristics of neural network models
(Rumelhart & McClelland,
1986).
Social psychologists' lack of involvement with these models is surprising for another reason. As
Read, Vanman, and Miller (1997) have recently shown, there are a number of important parallels
between characteristics of these models and the Gestalt principles that formed the theoretical
foundation of much of modern social psychology (Asch, 1946; Festinger, 1950, 1957; Heider, 1958;
Lewin, 1935, 1947a, 1947b).
However, there has been a recent
surge of interest in the application of these kinds of
models to social phenomena. This book brings most of this work
together in one place. Doing so
allows the reader to appreciate the breadth of these approaches, as
well as the theoretical
commonality of many of these models. Each of the chapters provides
an explicit connectionist
model of a central problem in social psychology. Because most of the
authors either use a standard
architecture, can provide a computer program for their model, or use
a publicly available system
for modeling, the interested reader, with a little work, should be
able to implement their own
variation of a model.
The authors in this volume address
a number of central issues in social psychology and
show how these kinds of models provide insight into a number of
classic issues. Moreover, many of the chapters suggest that this approach contains the seeds of a
theoretical integration that the field has long lacked.
Smith and DeCoster, and Kashima,
Woolcock, and King outline models of the learning
and application of social categories and stereotypes. Kunda and
Thagard, Read and Miller, and
Van Overwalle and Van Rooy
describe models of causal reasoning, social explanation and person
perception. Shoda and Mischel present a model of personality and
social behavior. Shultz and
Lepper show how a neural network model can capture many of the
classic dissonance phenomena,
while Ranney and Schank grapple with belief change and the coherence
of large scale belief
systems. Finally, Nowak and Vallacher, and Eiser, Claessen, and
Loose show that these are not
just models of individual cognition, but that they can also capture
important aspects of social
influence and group interaction.
Connectionist models
In the following we present a very brief overview of connectionist models. We briefly considered
including a more extensive tutorial. However, there are a number of good introductions, some aimed
at cognitive psychologists and two recent ones aimed specifically at social psychologists. Thus, it
seemed pointless to repeat what has already been said, in much more detail, elsewhere.
There is probably still no better
introduction to neural network models and their
psychological implications than the two edited volumes by Rumelhart
and McClelland and the PDP
research group (1986; McClelland & Rumelhart, 1986). Other good
resources for the social
psychologist are Anderson's (1995) recent textbook and Bechtel and
Abrahamsen's (1991) book,
which was written as a companion to the PDP volumes. And recently
Smith (1996) and Read,
Vanman, and Miller (1997) have specifically focused on the
implications of these kinds of models
for the kinds of problems with which social psychologists are
concerned. Moreover, Read,
Vanman, and Miller (1997) also extensively discuss the numerous
parallels between key aspects of
neural network models and the Gestalt psychological principles that
formed the theoretical
foundations of much of modern social psychology.
Connectionist modeling (e.g., Hertz,
Krogh, & Palmer, 1991; McClelland & Rumelhart,
1986; Rumelhart & McClelland, 1986) treats the processing involved
in perceptual and cognitive
tasks in terms of the passage of activation, in parallel, among
simple, neuron-like units. The most
important components of these models are: (1) simple processing
units or nodes, which sum the
incoming activation, following a specified equation, and then send
the resulting activation to the
nodes to which they are connected, (2) equations that determine the
activation of each node at each
point in time, based on the incoming activation from other nodes,
previous activation, and the
decay rate, (3) weighted connections between the nodes, where the
weights affect how activation is
spread, and (4) a learning rule which specifies how the weights
change in response to experience
(Bechtel & Abrahamsen, 1991). Processing in a connectionist model
proceeds solely by the
spread of activation among nodes, where the pattern of connections
affects how activation spreads.
There is no higher order executive or control process. Moreover,
knowledge in a connectionist
model is represented entirely in the pattern of weights among nodes.
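The node-level mechanics just listed (summing weighted input, updating activation with decay, keeping it within bounds) can be sketched in a few lines of Python. This is a minimal illustration, not the equations of any chapter in this volume: the update rule (keep a fraction of the old activation, add the net input, clip to a range) and the parameter names `decay`, `a_min`, and `a_max` are assumptions chosen for simplicity.

```python
def net_input(weights, activations, external=0.0):
    """Sum the activation arriving at a node, each sender scaled by its connection weight."""
    return sum(w * a for w, a in zip(weights, activations)) + external


def update(activation, net, decay=0.1, a_min=-1.0, a_max=1.0):
    """One time step of an assumed, simplified rule: the old activation decays,
    the net input is added, and the result is clipped to the node's allowed range."""
    new = (1.0 - decay) * activation + net
    return max(a_min, min(a_max, new))


if __name__ == "__main__":
    # Three sending nodes with hypothetical activations and connection weights.
    senders = [1.0, 0.5, -1.0]
    weights = [0.2, 0.4, 0.1]
    net = net_input(weights, senders)  # 0.2 + 0.2 - 0.1
    a = 0.0
    for step in range(5):
        a = update(a, net)
        print(step, round(a, 4))
```

Note that nothing outside the nodes decides what happens next: the activation trajectory is fully determined by the weights, the current activations, and the update rule.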
Although there are a number of
differences among potential neural network models, here we
focus on two important differences. One is whether there are
feedback relations among the nodes.
In feed forward networks, units have unidirectional connections,
with no feedback relations. The
network is organized in layers, with inputs fed into the input layer
and outputs generated at the top
layer as a result of a single forward sweep of activation. The
simplest such network has two
layers, an input and an output layer, although more complicated
networks may have intervening or
"hidden" layers (so-called because they have no direct connections
to the environment).
Networks with hidden layers, such as the well-known backpropagation
network, have greater
computational power. A prototypical example of a feed forward
network is the pattern associator,
in which the system learns an arbitrary association between an input
represented as a pattern of
activation on the input layer and a pattern represented on the
output layer. Such networks can learn
to categorize objects or assign names to objects.
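A pattern associator of the kind just described can be sketched as follows. The text does not specify a learning rule; this sketch assumes linear output units trained with the delta rule (Hebbian learning is another common choice). The input patterns and target codes are hypothetical and were chosen to be orthogonal so that training converges cleanly.

```python
def recall(w, x):
    """Single forward sweep: each output unit sums its weighted inputs (linear units)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def train_pattern_associator(pairs, lr=0.1, epochs=50):
    """Two-layer feedforward pattern associator trained with the delta rule.
    pairs: list of (input_pattern, target_pattern) tuples of equal-length lists."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    w = [[0.0] * n_in for _ in range(n_out)]  # w[o][i]: input unit i -> output unit o
    for _ in range(epochs):
        for x, t in pairs:
            y = recall(w, x)
            for o in range(n_out):
                err = t[o] - y[o]  # target minus actual output
                for i in range(n_in):
                    w[o][i] += lr * err * x[i]
    return w


if __name__ == "__main__":
    # Hypothetical input patterns (e.g., features of two objects) and name-like output codes.
    pairs = [([1, 1, -1, -1], [1, -1]),
             ([1, -1, 1, -1], [-1, 1])]
    w = train_pattern_associator(pairs)
    for x, t in pairs:
        print(t, [round(v, 3) for v in recall(w, x)])
```

Because activation passes forward exactly once, recall is a single matrix-vector product; all of the learning lives in `w`.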
By contrast, in interactive or
feedback networks, at least some connections are bi-directional,
resulting in feedback relations, and processing occurs dynamically
across a large number of cycles.
Nodes in these networks have a
minimum and maximum possible activation (typically ranging
from 0 to 1, or from -1 to 1). The activation of the nodes is
updated many times as the activation of
the units moves towards asymptote, and as the system works toward
settling into a solution to a
particular input. In contrast, in feed forward networks, activation is updated only once for each
input.
Because of the feedback relations,
interactive or feedback networks are dynamic systems
whose behavior evolves over time. As a result they have interesting
and useful properties that are
not characteristic of feed forward networks. One of the most useful
properties of such networks is
that they function as parallel constraint satisfaction systems,
acting to satisfy multiple simultaneous
constraints among elements in a network. Most of the networks in
this book are feedback
networks and the constraint satisfaction abilities of the networks
are central aspects of the models.
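To make parallel constraint satisfaction concrete, here is a minimal localist sketch. Two hypothetical hypothesis nodes (for example, two alternative construals of an act) each receive some external evidence and inhibit one another through a symmetric negative weight. Activation is updated for all nodes at once, cycle after cycle, until the network settles. The leaky update rule, step size, and activation range [0, 1] are assumptions for illustration, not the equations of any chapter in this volume.

```python
def settle(weights, external, step=0.2, a_min=0.0, a_max=1.0, tol=1e-6, max_cycles=1000):
    """Update all nodes in parallel until activations stop changing.
    Each node moves a fraction `step` of the way toward its net input (external
    evidence plus weighted activation from the other nodes), clipped to [a_min, a_max].
    Returns the final activations and the number of cycles used."""
    n = len(external)
    a = [0.0] * n
    for cycle in range(1, max_cycles + 1):
        net = [external[i] + sum(weights[i][j] * a[j] for j in range(n)) for i in range(n)]
        new = [min(a_max, max(a_min, a[i] + step * (net[i] - a[i]))) for i in range(n)]
        change = max(abs(x - y) for x, y in zip(new, a))
        a = new
        if change < tol:
            return a, cycle
    return a, max_cycles


if __name__ == "__main__":
    # Nodes 0 and 1 are competing hypotheses; the evidence slightly favours node 0.
    weights = [[0.0, -1.0],
               [-1.0, 0.0]]
    activations, cycles = settle(weights, external=[0.6, 0.4])
    print([round(v, 3) for v in activations], "after", cycles, "cycles")
```

The small initial advantage of node 0 is amplified over many cycles by the mutual inhibition until node 1 is driven to its floor, so the network resolves the competing constraints into a single coherent interpretation.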
A second important difference among
models is whether concepts have a distributed or a
localist representation. In a localist representation, a concept or
perhaps an entire proposition is
represented by a single node. In contrast, in a distributed
representation a concept is represented by
a pattern of activation over a number of nodes. Although some
researchers see distributed
representations as a defining characteristic of connectionism, we take the view of many
researchers that the choice of representation should depend on the question being asked.
Each of these types of representation has its strengths and weaknesses. We see three major
advantages to a distributed representation. First, such a
representation does seem more in
line with the attempt to model the mind using neuron like units and
does seem to fit our intuition
that the representation of a concept should be in terms of the
action of large clusters of neurons,
rather than an individual neuron. Second, a distributed
representation has the property of graceful
degradation. That is, loss of a small number of neurons has little
if any impact on the
representational ability of the model. In contrast, in a localist
model loss of a single neuron leads
to the loss of the corresponding concept. Third, during learning a
distributed representation
implicitly calculates the degree of similarity among inputs. That
is, if the activation vectors
representing different inputs are sufficiently similar, they will
tend to receive a common
representation in the network. This underlies the ability of such
models to learn prototypes from
related exemplars. In contrast, a localist model has no such
ability.
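The contrast between the two kinds of representation can be illustrated numerically. In this sketch the concept names and vectors are hypothetical: two overlapping ±1 codes stand for related concepts and a third for an unrelated one. Overlap between distributed codes yields graded similarity, which is what allows learning to generalize across related exemplars. The snippet only shows the representational side, not learning itself. Silencing one unit barely changes the similarity structure. In a localist code, by contrast, distinct concepts have zero similarity, and silencing a unit erases its concept.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has no active units."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def lesion(vec, unit):
    """Simulate the loss of one unit by silencing it."""
    return [0 if i == unit else x for i, x in enumerate(vec)]


# Hypothetical distributed codes: "nurse" and "doctor" overlap heavily, "mechanic" does not.
nurse = [1, 1, 1, 1, -1, -1, 1, -1]
doctor = [1, 1, 1, 1, -1, -1, 1, 1]
mechanic = [-1, -1, 1, -1, 1, 1, -1, -1]

# Localist codes: one unit per concept, so every pair of distinct concepts is unrelated.
nurse_loc, doctor_loc = [1, 0, 0], [0, 1, 0]

if __name__ == "__main__":
    print("distributed nurse~doctor:  ", round(cosine(nurse, doctor), 3))
    print("distributed nurse~mechanic:", round(cosine(nurse, mechanic), 3))
    print("localist nurse~doctor:     ", round(cosine(nurse_loc, doctor_loc), 3))
    # Lose unit 0 everywhere: the similarity structure survives in the distributed code...
    print("after lesion nurse~doctor: ", round(cosine(lesion(nurse, 0), lesion(doctor, 0)), 3))
    # ...but in the localist code the lesioned concept is simply gone.
    print("localist nurse after lesion:", lesion(nurse_loc, 0))
```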
However, localist models have their
own strengths, which are the flip side of some of the
weaknesses of distributed models. First, localist models are often
much more interpretable, as
each concept corresponds to a single node. In contrast, in a
distributed representation, because
each concept is represented by a pattern of activation over a large
number of nodes, it can often be
quite difficult to interpret the behavior of such models. Second,
localist models are often much
more computationally tractable. Consider a simple model with 20 concepts. In a localist model, this
will take only 20 units and a 20 × 20 weight matrix. In contrast, assume we had a distributed
representation in which each concept was represented by its own set of 20 units. In this model we
need 400 units and a 400 × 400 weight matrix. The distributed model thus has 160,000 weights rather
than 400, or 400 times as many. And the problem only gets more serious as the model gets bigger.
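The arithmetic behind this comparison is easy to check. The sketch assumes, as the figures above imply, a fully connected network (an n × n weight matrix) in which every concept has its own non-overlapping block of units. The function name is hypothetical.

```python
def weight_count(n_concepts, units_per_concept):
    """Units and weights for a fully connected network in which every concept
    gets its own, non-overlapping block of units."""
    units = n_concepts * units_per_concept
    return units, units * units


if __name__ == "__main__":
    for n in (20, 100):
        loc_units, loc_weights = weight_count(n, 1)
        dist_units, dist_weights = weight_count(n, 20)
        print(f"{n} concepts: localist {loc_units} units / {loc_weights} weights; "
              f"distributed {dist_units} units / {dist_weights} weights; "
              f"ratio {dist_weights // loc_weights}")
```

Under these assumptions the ratio stays at 20² = 400 regardless of the number of concepts. What grows is the absolute gap: 159,600 extra weights for 20 concepts, but 3,990,000 for 100.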
One other consideration bears on whether one should use a localist or a distributed representation.
Assume that one is developing a model of high-level cognition, such as a model of analogical
reasoning, explanation, or cognitive consistency. In these kinds of models, one is typically
interested in relationships among concepts, such as causal or implicational relationships. And
frequently the key theoretical mechanism is the parallel satisfaction of mutual constraints among
concepts. What matters are the relations among concepts rather than the representation of individual
concepts. In such cases, it seems likely that the pattern of activation of an ensemble of neurons can
be treated as if it were a single node, with little or no loss of theoretical power. In that case a
distributed representation would have no advantages and many costs.
Thus, one's choice of
representation, we argue, should be a function of one's question. If
graceful degradation is important or if one is looking at questions
of concept learning or
categorization, where sensitivity to similarity is central, then a
distributed representation would
seem essential. However, in cases where the special strengths of distributed representations are
unnecessary, the relative conceptual and computational simplicity of localist models would seem
more desirable.
The various chapters in this book illustrate some of the conditions under which each kind of
representation would seem most appropriate. For instance, several authors, such as Smith and
DeCoster, and Kashima, Woolcock, and King, are explicitly interested in models of category
learning. Similarly, Read and Miller are interested in the learning of the components of trait
concepts. Here distributed representations would seem critical. However, other chapters, such as
Shultz and Lepper's chapter on dissonance, Shoda and Mischel's model of
personality-behavior relationships
or Kunda and Thagard's chapter on the role of coherence are
primarily interested in the
implications of processing in recurrent networks, specifically the
fact that such networks function
as systems for the parallel satisfaction of multiple simultaneous
constraints. In these chapters,
distributed representations would have provided no additional
insights and would have
tremendously complicated the models.
Overview of the book
We considered two possible ways
to conceptually group the current chapters: (1) in terms of
the underlying neural network architecture that is used, or (2) in
terms of the specific topic being
investigated. Our ultimate choice was the latter, on the assumption
that most readers would be
primarily interested in the specific topic and how the different
investigators approached it.
However, in the following descriptions of the chapters we have
briefly noted the kind of model
that was used. It is an interesting side note that 8 of the 10 chapters use a recurrent or feedback
architecture, while only three use a feedforward architecture (Eiser, Claessen, and Loose explore
both kinds of architectures). An overview of each of the 10 chapters follows.
Thagard and Kunda argue that coherence mechanisms play a central role in three different processes
by which we make sense of other people's behavior: how we (1) integrate a number of concepts, such as
traits, to form an impression of another person, (2) arrive at an attribution or explanation of
someone's behavior, and (3) use analogies to familiar others to make sense of someone's behavior. As
anyone familiar with their work might expect, Thagard and Kunda argue that coherence mechanisms can
be treated as constraint satisfaction problems that can be captured by recurrent or feedback
connectionist networks.
They then review their work in each
of these three areas. First, they describe their recent
model of impression formation (Kunda & Thagard, 1996) and how it can
capture such phenomena
as shifts in meaning of concepts during impression formation and the
development of new or
emergent concepts from combinations of other concepts. Second, they
discuss Thagard's (1989,
1992) model of explanatory coherence and its implications for the
understanding of social
explanation (see also Miller & Read, 1991; Read & Marcus-Newhall, 1993; Read & Miller, 1993).
Third, they describe Holyoak and Thagard's (1989, 1995) work
on constraint satisfaction
models of analogical reasoning and analog retrieval and they discuss
its possible application to a
number of phenomena in social perception, such as social comparison,
using the self as a model to
understand others, and using parents and friends to understand new
acquaintances. As part of
their discussion they demonstrate how each of these somewhat
different phenomena can be treated
in terms of the same underlying principle, as a coherence mechanism,
operationalized as a
constraint satisfaction process. In line with this conclusion, they
also discuss the likelihood that
these three types of coherence mechanisms are integrated when we
actually try to make sense of
behavior in social interaction. Finally, following a major focus in social cognition, they also
examine the extent to which each of these processes is automatic or controlled.
Read and Miller present an interactive activation and competition (IAC) model of social
perception, based on work by McClelland and Rumelhart (1981;
McClelland & Elman, 1986;
Rumelhart & McClelland, 1982) on word recognition and speech
perception. This model is a
feedback or recurrent network, with the nodes organized into
multiple layers, where each layer
does a different kind of processing and sends the results to higher
levels. One interesting aspect of
this kind of model is that not only do lower levels, such as feature
analysis, send activation to
higher levels, but higher
levels can also affect lower levels. For instance, a highly
activated trait
node can send activation back to the feature nodes that compose the
original behavior,
disambiguating unclear or ambiguous inputs.
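The top-down disambiguation just described can be illustrated with a toy recurrent network. This is a minimal sketch, not Read and Miller's model: it uses a simple leaky update instead of the full interactive activation equations, and the three nodes (two competing construals of an ambiguous behavior and one trait node with contextual support) and all weights are hypothetical. Bottom-up input supports both construals equally. The only asymmetry is a bidirectional excitatory link between the trait node and construal A.

```python
def settle(weights, external, step=0.2, tol=1e-6, max_cycles=1000):
    """Update all nodes in parallel until they stop changing; activations stay in [0, 1]."""
    n = len(external)
    a = [0.0] * n
    for _ in range(max_cycles):
        net = [external[i] + sum(weights[i][j] * a[j] for j in range(n)) for i in range(n)]
        new = [min(1.0, max(0.0, a[i] + step * (net[i] - a[i]))) for i in range(n)]
        done = max(abs(x - y) for x, y in zip(new, a)) < tol
        a = new
        if done:
            break
    return a


def make_weights(feedback):
    """Nodes: 0 = construal A, 1 = construal B, 2 = trait node (all hypothetical).
    The construals inhibit each other; the trait node and construal A excite each other."""
    return [[0.0, -1.0, feedback],
            [-1.0, 0.0, 0.0],
            [feedback, 0.0, 0.0]]


# Ambiguous input: equal bottom-up support for both construals, context support for the trait.
external = [0.3, 0.3, 0.5]

if __name__ == "__main__":
    with_link = settle(make_weights(0.5), external)
    without_link = settle(make_weights(0.0), external)
    print("with trait feedback:   ", [round(v, 3) for v in with_link])
    print("without trait feedback:", [round(v, 3) for v in without_link])
```

Without the link the two construals stay exactly tied. With it, activation flowing back down from the trait node tips the competition toward construal A, which in turn supports the trait node further.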
Read and Miller propose a four-level network, with each level sending activation to the level
above, and in turn receiving activation from the higher level. The
nodes in such a model can be
treated as hypotheses about the presence or absence of the
corresponding concept, with alternative
construals or hypotheses having inhibitory links and consistent or
supportive hypotheses having
excitatory links. The first level in their model is the Feature
level, composed of nodes sensitive to
the features of human beings, objects, and behavior. Activation from
this level then goes to an
Identification level, where the individual features are used to
identify social actors, objects, and
behaviors. Actors, objects, and behaviors identified at this level
are then assembled into a coherent
representation of the social action at the third level, the Story or
Scenario level.
A central aspect of Read and
Miller's model is the proposal that social concepts at this level
are represented in terms of plot units or frame-based structures,
with a case-role structure, where
each action centers around a verb or action unit that identifies the
various roles, such as actor,
patient, and instrument that participate in that action. For
instance, they argue that many traits are
composed of underlying story structures.
Finally, information from the Story
level is used to arrive at the meaning of the interaction at
the Conceptual or Meaning level. For example, the instantiated story
structure may be used to
access various trait characterizations for a social actor.
This model naturally implements
various principles of Explanatory Coherence (Thagard,
1989, 1992) that have been shown to play a central role in social
reasoning (Ranney & Schank,
this volume; Read & Marcus-Newhall, 1993; Read & Miller, 1993), as
well as capturing the
impact of a limited capacity working memory. Read and Miller also
discuss some of the
implications of such feedback or attractor models for both learning
of social concepts and the
combination of old concepts to form novel ones. They note that
during learning such models
perform a componential analysis of concepts. For example, readers
can learn subcomponents of
words or social perceivers can learn subcomponents of traits, such
as goals, plans, and beliefs. As
a result, such a model can capture the acquisition of primitive
concepts during learning. Moreover,
they discuss how such models can take advantage of such a
componential analysis to combine
previously learned concepts to form novel concepts. This focus on conceptual combination is shared
by Kunda and Thagard and by Smith and DeCoster.
Finally, Read and Miller apply their
model to two major topics in social perception. First,
they discuss how it provides an explicit process model of
spontaneous trait inferences, capturing
the inferential processing involved in going from the features of
the social interaction to the final
trait inference. Second, they show how their model can provide an
account of Trope's (1986) two-stage model of dispositional inference, and in particular how it can
capture the impact of higher
level concepts on the identification of social actions.
Kashima, Woolcock, and King
use an architecture that is fairly novel in this literature,
the tensor product model. However, the central issues they address,
the representation of social
categories and stereotypes, overlap with those of Smith and
DeCoster.
Kashima et al. note that since
little work has been done specifying the details of the
representation of social groups, they intend their model as a step
toward addressing that issue.
Further, they note that the little work that has been done has taken
two divergent paths: one looking
at how impressions of groups are formed, and the other at how
individuals are classified into social
groupings, that is, how social categories are represented. The aim
of their chapter is to present a
model that can explain the findings in both of these areas.
They first present a mechanism for
how memories are initially encoded and then examine
how those memories can be used for judgment and memory retrieval.
Their model uses a
distributed representation in which a given feature is represented
as a pattern of activation over a set
of nodes. One unique characteristic of the model is that it provides a mechanism for the
representation of attribute-value pairs (what Kashima et al. call aspects and features), such as
skin-color: black, or eye-color: blue. For example, assume we have an individual John with an
attribute, skin-color, that has the value black. Representing this notion of an attribute that applies
to an object and has a particular value is difficult in standard connectionist models that use
distributed representations. A typical connectionist model with a distributed representation would
directly associate the individual John with black skin, because the standard representation is a
two-dimensional weight matrix that gives the association between two vectors. There is no easy way
to represent the idea of an attribute that can take on multiple values. Thus, one could not easily ask
the model, "What is John's skin color?"
Let us see how the tensor product model handles this. Assume that we have two features, each
represented by a vector, a and b. Multiplying the two vectors together (taking the outer product)
gives a matrix, where the elements in the matrix represent the degree of association between each
element in a and each element in b. The tensor formulation is a generalization of this to the
association among n vectors. Thus, if we had a third vector c, we would multiply a, b, and c and end
up with a three-dimensional array that represents all the associations among all the elements in
each of the three vectors. In this representation, one vector can represent John, a second vector
can represent the attribute skin color, and the third vector can represent the value black. The
resulting three-dimensional array represents the association among John, skin color, and black. Once
one has this array, one can do the equivalent of asking for John's skin color by taking the
two-dimensional matrix representing the association between John and skin color and applying it to
the three-dimensional array, with appropriate mathematical manipulations, to retrieve the third
vector representing black.
In this model different memory
traces are superimposed on each other by simply adding
together the tensor products for different memories. Thus, one ends
up with one array in which
are superimposed a large number of memories.
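The binding, superposition, and retrieval steps just described can be sketched in plain Python. This illustrates the general tensor product technique, not Kashima et al.'s implementation. The four-element codes are hypothetical orthonormal vectors chosen so that retrieval is exact. With overlapping (non-orthogonal) codes, retrieval returns a noisy blend, and the closest stored value then has to be picked out, which is what `best_value` does here.

```python
from itertools import product

# Hypothetical orthonormal 4-element codes (rows of a scaled Hadamard matrix).
H = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]

people = {"John": H[0], "Mary": H[1]}
attributes = {"skin-color": H[0], "eye-color": H[1]}
values = {"black": H[0], "brown": H[1], "blue": H[2], "white": H[3]}


def tensor3(a, b, c):
    """Third-order tensor (outer) product: T[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[x * y * z for z in c] for y in b] for x in a]


def superimpose(t1, t2):
    """Superimpose two memory traces by adding them element by element."""
    return [[[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
            for m1, m2 in zip(t1, t2)]


def retrieve(memory, a, b):
    """Contract the memory with probes a and b (equivalent to applying the a-by-b
    outer-product matrix): out[k] = sum over i, j of memory[i][j][k] * a[i] * b[j]."""
    out = [0.0] * len(memory[0][0])
    for i, j in product(range(len(a)), range(len(b))):
        for k in range(len(out)):
            out[k] += memory[i][j][k] * a[i] * b[j]
    return out


def best_value(vec):
    """Name of the stored value vector with the largest dot product with vec."""
    return max(values, key=lambda name: sum(x * y for x, y in zip(values[name], vec)))


facts = [("John", "skin-color", "black"), ("John", "eye-color", "brown"),
         ("Mary", "skin-color", "white"), ("Mary", "eye-color", "blue")]

# All four memories are added into one three-dimensional array.
memory = tensor3(people[facts[0][0]], attributes[facts[0][1]], values[facts[0][2]])
for person, attr, val in facts[1:]:
    memory = superimpose(memory, tensor3(people[person], attributes[attr], values[val]))

if __name__ == "__main__":
    answer = retrieve(memory, people["John"], attributes["skin-color"])
    print(answer, "->", best_value(answer))
```

Each stored fact contributes to the retrieved vector in proportion to how well its person and attribute codes match the probes. With orthonormal codes, only the John/skin-color memory survives the contraction, even though all four memories share the same array.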
Kashima et al. then apply their
model to several phenomena. First, they demonstrate how
characteristics of the group can be used to retrieve a category or
group label. As is true of other
models, such as Smith and DeCoster's, provision of a partial pattern
of cues enables the retrieval
of the entire pattern, although the mechanism by which this happens is somewhat different from that
in Smith and DeCoster's model.
Second, they show how this model can
simulate the use of both exemplars and prototypes in
classification. As part of this demonstration, they show
analytically how the Tensor Product
model is consistent with various Context Model theories of
classification, first proposed by Medin
and Schaffer (1978) and extended by Nosofsky (1984, 1986). These are
exemplar based models
which argue that classification of items into a category is based on
similarity to exemplars that
make up the category. Further, they demonstrate that their model can
simulate results of
experiments by Smith and Zarate (1990) supporting a mixture model of
classification that seem to
show that subjects can use both prototypes and exemplars to classify
new items, depending upon
the experimental conditions.
Third, they show how this model can
simulate judgments or impressions of a group.
Essentially, they provide a vector representation of the high and
low end points of a judgment scale
and then calculate the similarity of that vector to the
representation of the group. In doing this, they
note that judgments of groups seem to fit a weighted averaging model
and they show how their
model can successfully simulate this. Fourth, they show how the Tensor Product Model can handle
Hamilton and Gifford's (1976) work on the distinctiveness-based illusory correlation phenomenon.
In concluding, they argue that their
model has the advantage of capturing both classification
and judgment in the same model, and that it is consistent with major models of classification, such
as Nosofsky's Generalized Context Model (GCM), and with major findings in judgment, such as weighted
averaging.
Smith and DeCoster apply a
recurrent connectionist network, specifically an
autoassociative network developed by McClelland and Rumelhart
(1986), to key findings in person
perception and stereotyping.
In an autoassociative model, each unit is linked to every other unit
and receives activation from all other units, as well as receiving
external input. They use a
distributed representation in which a pattern of activation across a
set of units represents a concept,
rather than having a single node correspond to a single concept.
Such a model can do pattern
learning, pattern completion of incomplete patterns, and memory
reconstruction or schematic
processing.
Learning in their model is
instantiated by the delta rule (Widrow & Hoff, 1960) which uses
the difference between the activation of nodes due to internal
inputs from the network and the
activation due to external inputs, to adjust the weights. The aim of
this procedure is to modify the
weights so that the activation of each node from all its internal
connections approximates the
activation of each node from external or stimulus input.
Essentially, the network is learning the
pattern of external inputs. One result of this is that the network
will learn to reinstantiate the
complete pattern from partial input.
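The learning and completion behavior just described can be sketched as follows. This is a simplified illustration under stated assumptions, not Smith and DeCoster's simulation. Units are linear with activations clipped to [-1, 1], there are no self-connections, and the six-feature pattern and learning rate are hypothetical. The activation dynamics of McClelland and Rumelhart's original autoassociator differ in detail.

```python
def train_autoassociator(patterns, lr=0.1, epochs=50):
    """Delta-rule learning in a fully recurrent (autoassociative) network.
    Each unit's internal input is the weighted sum of the other units' external
    activations; weights change in proportion to (external - internal) input."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]  # w[i][j]: unit j -> unit i, no self-connections
    for _ in range(epochs):
        for p in patterns:
            for i in range(n):
                internal = sum(w[i][j] * p[j] for j in range(n) if j != i)
                err = p[i] - internal
                for j in range(n):
                    if j != i:
                        w[i][j] += lr * err * p[j]
    return w


def complete(w, cue, cycles=20):
    """Pattern completion: keep the (partial) cue as external input and let
    activation circulate through the learned weights, clipping each unit to [-1, 1]."""
    n = len(cue)
    a = list(cue)
    for _ in range(cycles):
        a = [max(-1.0, min(1.0, cue[i] + sum(w[i][j] * a[j] for j in range(n) if j != i)))
             for i in range(n)]
    return a


if __name__ == "__main__":
    pattern = [1, -1, 1, 1, -1, -1]   # hypothetical feature pattern for one exemplar
    w = train_autoassociator([pattern])
    partial = [1, -1, 1, 0, 0, 0]     # only the first three features are given
    print([round(x, 3) for x in complete(w, partial)])
```

After training, the internal input to each unit reproduces its external input, so when only half of the pattern is supplied the missing features are filled in by activation from the features that are present.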
Smith and DeCoster show how their
model handles four phenomena. First, it can learn
characteristics of individual exemplars or cases and then retrieve
those characteristics from a partial
cue. Second, it can learn a group stereotype or category from
multiple exemplars and then given
partial cues it can retrieve or reconstruct the prototype or
stereotype. As Smith and DeCoster note, this demonstrates that a single mechanism and a single
representational format can account for these two seemingly different phenomena. This is in contrast to most
models in social cognition
that assume very different representational forms for exemplars and
prototypes. Third, the model
can learn multiple knowledge structures in the same network and then
create novel or emergent
structures by combining the existing structures to form a new
structure. This provides a
mechanism for the development of novel or emergent concepts. Classic
schema models seem to
lack a mechanism for combining old concepts to create novel ones.
(Also see Read & Miller, this
volume; Thagard & Kunda, this volume). Finally, they show that
several aspects of construct
accessibility can be captured by such a model, specifically
demonstrating that both recency and frequency of activation of a concept increase its impact on
future inferences. In addition, they show that spaced patterns have a greater impact than patterns
that are massed. They do this by demonstrating that a partial pattern does a better job of
reinstantiating a complete pattern when the original pattern has been recently and/or frequently
presented, or presented in a spaced fashion.
Smith and DeCoster note that they
are able to handle each of these with the same mechanism,
although typical work in social cognition proposes a separate model
for each. Following work by
Rumelhart, Smolensky, et al. (1986) they also observe that such a
model can produce what looks
like schemas and schematic processing despite the lack of any
schematic structures. (See also Read and Miller, this volume.)
Van Overwalle and Van Rooy investigate how a simple two-layer feedforward network using delta rule
learning, a pattern associator, can simulate several interesting findings from the literature on
causal learning. Their work extends earlier work by others, such as Gluck and Bower (1988a, 1988b)
and Shanks (1991, 1993), which has demonstrated that the classic Rescorla-Wagner model of animal
learning is formally identical to a two-layer feedforward network (lacking hidden units) that uses
delta rule learning to learn new associations.
They also compare this kind of model with statistical models, such as Cheng and Novick's (1990) probabilistic contrast model, and show that the connectionist model is sensitive to factors that the probabilistic contrast model is not. The basic difference between statistical models such as the probabilistic contrast model and the connectionist model is that the probabilistic contrast model is sensitive only to the relative frequencies with which different kinds of events are paired, whereas the connectionist model is also sensitive to the absolute frequency of presentation. For instance, according to the probabilistic contrast model, a case with one instance of the effect given the cause and no instance of the effect given the absence of the cause should be equivalent to a case with five instances of the effect given the cause and none given its absence, because in both cases the difference between the probabilities is 1.0. In contrast, the connectionist model is sensitive to the absolute frequency with which cause and effect are paired, and the authors provide evidence that humans show the same sensitivity.
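The contrast can be checked numerically. The sketch below computes Cheng and Novick's ΔP for one versus five pairs of trials (one cause-present trial with the effect, one cause-absent trial without it, per pair) and trains a delta-rule learner on the same trials. The function names and the learning rate of 0.2 are our assumptions, as is the background "context" cue present on every trial, a common modeling convention rather than a detail taken from the chapter.

```python
def delta_p(effect_with_cause, n_cause, effect_without_cause, n_no_cause):
    """Probabilistic contrast: P(effect | cause) - P(effect | no cause)."""
    return effect_with_cause / n_cause - effect_without_cause / n_no_cause


def delta_rule_cause_weight(n_pairs, lr=0.2):
    """Interleave cause+effect trials with cause-absent, no-effect trials.

    A background (context) cue is present on every trial; the cause cue is
    present only on cause trials. Returns the learned weight of the cause.
    """
    w_cause, w_context = 0.0, 0.0
    for _ in range(n_pairs):
        # Cause present, effect occurs (target 1): both present cues update.
        error = 1.0 - (w_cause + w_context)
        w_cause += lr * error
        w_context += lr * error
        # Cause absent, no effect (target 0): only the context cue updates.
        error = 0.0 - w_context
        w_context += lr * error
    return w_cause


for n in (1, 5):
    print(f"{n} pair(s): delta-P = {delta_p(n, n, 0, n):.2f}, "
          f"learned cause weight = {delta_rule_cause_weight(n):.3f}")
```

ΔP is 1.0 in both cases, but the learned weight for the cause is 0.2 after one pair of trials and roughly 0.53 after five, which is the kind of absolute-frequency sensitivity described above.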
In addition, following other work (e.g., Vallée-Tourangeau, Baker, and Mercier, 1994), they investigate the parallels between the associative-learning effects known as blocking and conditioned inhibition and the well-known attributional phenomena of discounting and augmenting. As part of this work they show that in humans the strength of discounting and augmenting is sensitive to the frequency of instances, which is consistent with the predictions of the associative model but not with the original version of the probabilistic contrast model.
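Blocking, the associative counterpart of discounting, falls out of the same delta-rule update. In this hypothetical sketch, cue A is first trained alone as a predictor of an effect and then appears together with cue B; a control condition sees only the A+B trials. The function name, trial counts, and learning rate are illustrative assumptions.

```python
def rescorla_wagner(trials, weights, lr=0.2):
    """Delta-rule (Rescorla-Wagner) updates; each trial is (cues, outcome)."""
    for cues, outcome in trials:
        prediction = sum(weights[c] for c in cues)
        error = outcome - prediction
        for c in cues:  # only cues present on the trial change
            weights[c] += lr * error
    return weights


# Blocking: A alone predicts the effect first, then A and B appear together.
blocked = rescorla_wagner([(("A",), 1.0)] * 20, {"A": 0.0, "B": 0.0})
blocked = rescorla_wagner([(("A", "B"), 1.0)] * 10, blocked)

# Control: A and B are only ever seen together.
control = rescorla_wagner([(("A", "B"), 1.0)] * 10, {"A": 0.0, "B": 0.0})

print(f"strength of B: blocked {blocked['B']:.3f}, control {control['B']:.3f}")
```

Because A already predicts the effect, the compound trials produce almost no error, so B gains almost no strength (about 0.006, versus about 0.5 in the control). In attribution terms, B's role as a cause is discounted when A is already a sufficient explanation.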
Finally, they examine the learning of multiple causes and test the ability of various connectionist models to simulate human responses. They compare the two-layer feedforward network with Pearce's (1994) configural cue model and with a standard three-layer backpropagation network with hidden units. Pearce's model was explicitly developed to handle configurations of cues by assigning a single node to each configuration, whereas the backpropagation network should, at least in theory, be able to learn hidden units that represent a configuration of cues. The authors find that Pearce's configural cue model does the best job of simulating results from human subjects.
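Why configurations matter can be seen with negative patterning (A alone produces the effect, B alone produces it, A and B together do not), which no two-layer elemental network can learn. The sketch below compares an elemental delta-rule network with one that has an extra input unit active only for the A+B compound. This extra unit is a simplified stand-in for a configural node and is our assumption; it is not Pearce's (1994) similarity-based generalization rule, and the epochs and learning rate are illustrative.

```python
def predict(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))


def train(patterns, n_inputs, epochs=2000, lr=0.05):
    """Delta-rule training of a single output unit (no hidden units)."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in patterns:
            error = target - predict(weights, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights


# Negative patterning: A -> effect, B -> effect, A+B -> no effect.
elemental = [((1, 0), 1.0), ((0, 1), 1.0), ((1, 1), 0.0)]
# Same trials plus a hypothetical configural input that is on only for A+B.
configural = [((1, 0, 0), 1.0), ((0, 1, 0), 1.0), ((1, 1, 1), 0.0)]

w_el = train(elemental, 2)
w_cf = train(configural, 3)
print("elemental  A:", round(predict(w_el, (1, 0)), 2), "AB:", round(predict(w_el, (1, 1)), 2))
print("configural A:", round(predict(w_cf, (1, 0, 0)), 2), "AB:", round(predict(w_cf, (1, 1, 1)), 2))
```

The elemental network settles on a compromise (about 0.31 for each cue alone and 0.62 for the pair), while the configural input lets the network learn weights close to +1, +1, and -2, solving the problem exactly.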
Shoda and Mischel use an autoassociative, recurrent network to tackle a recent controversy in personality: the apparent paradox between expectations of stable individual differences in patterns of personality and the relatively low cross-situational consistency in behavior actually obtained. Their answer to this paradox has been twofold. First, in an extensive body of research they and their colleagues have demonstrated that individuals are characterized by stable situation-behavior, if-then relationships. That is, while people may not show general cross-situational consistency in behavior, they do show characteristic responses to different situations. For example, two people may both be highly aggressive, but in response to different situations: one may be aggressive when dealing with those who try to dominate them, the other when someone is weaker than they are. Thus, we cannot ignore situations in conceptualizing personality but must deal with the individual's characteristic responses to situations.
Second, they have used an autoassociative, recurrent network to investigate whether stable patterns of relationships among the "cognitive-affective" units they postulate can produce variable patterns of behavior in response to differing situations. The kinds of units they use are encodings (categories), expectancies and beliefs, affective responses, goals and values, and competencies and self-regulatory plans. In their typical implementation, a set of feature detectors is activated by a situation, and activation from these feature detectors then flows to the cognitive-affective units. The pattern of activation across the cognitive-affective units then activates the behavior node. In their simulations each individual has a stable pattern of relationships among the various cognitive-affective units, although the pattern differs across individuals. Thus, one can view each individual as having a stable "personality."
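The flow from feature detectors through cognitive-affective units (CAUs) to a behavior node can be sketched as follows, using the two aggressive people from the example above. Everything here is hypothetical: the two features, the two CAUs, the weights, and the threshold are ours, and the recurrent links among CAUs in Shoda and Mischel's network are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical situation features: [other tries to dominate, other is weaker]
SITUATIONS = {
    "being dominated": [1.0, 0.0],
    "facing someone weaker": [0.0, 1.0],
}

# Feature detectors -> two CAUs:
# CAU 0 = "encodes situation as a threat", CAU 1 = "sees a chance to dominate".
FEATURES_TO_CAUS = [[1.0, 0.0], [0.0, 1.0]]

# Individual differences live in how the CAUs drive the behavior node.
PEOPLE = {
    "person A": [4.0, 0.0],  # aggression driven by perceived threat
    "person B": [0.0, 4.0],  # aggression driven by perceived opportunity
}
THRESHOLD = 2.0


def aggression(caus_to_behavior, features):
    """Feature detectors -> CAUs -> behavior node (0..1 activation)."""
    caus = [sum(w * f for w, f in zip(row, features)) for row in FEATURES_TO_CAUS]
    net = sum(w * a for w, a in zip(caus_to_behavior, caus))
    return sigmoid(net - THRESHOLD)


for name, weights in PEOPLE.items():
    profile = {s: round(aggression(weights, f), 3) for s, f in SITUATIONS.items()}
    print(name, profile)
```

Averaged across the two situations both people are equally aggressive, yet their if-then profiles are mirror images: a stable structure yields variable, situation-specific behavior.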
They demonstrate that each individual model shows a consistent pattern of relationships between situations and behavior, although the nature of the pattern differs across individuals. Thus, each individual has a characteristic set of stable, if-then situation-behavior relationships. Interestingly, however, these relationships are not completely stable: the impact of the same situation may differ depending on the recent activation history of the network, or what one may think of as the immediately preceding mental state of the individual. Finally, they provide a real-world example of the application of the model to health-protective behavior, specifically breast self-examination.
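The dependence on recent activation history can be illustrated with a single unit that retains part of its previous activation. This is a deliberately minimal, hypothetical stand-in for recurrent settling: the `persistence`, `gain`, and `threshold` values are our assumptions, and Shoda and Mischel's network dynamics are richer than this.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def respond(provocations, persistence=0.6, gain=4.0, threshold=2.0):
    """Present situations in order and return aggression for each one.

    The hypothetical 'threat' unit keeps a fraction (`persistence`) of its
    previous activation, so earlier situations color later responses.
    """
    threat = 0.0
    responses = []
    for provocation in provocations:
        threat = persistence * threat + provocation
        responses.append(sigmoid(gain * threat - threshold))
    return responses


after_calm = respond([0.0, 0.5])[-1]
after_provocation = respond([1.0, 0.5])[-1]
print(f"same situation after calm: {after_calm:.3f}, "
      f"after provocation: {after_provocation:.3f}")
```

The final situation is identical in both runs; only the preceding "mental state" differs, yet the response moves from 0.5 to about 0.92.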
Shultz and Lepper follow up on some of their earlier work published in Psychological Review and use a variant of a Hopfield-type network (one type of single-layer autoassociative or recurrent network) to successfully simulate the results of a number of different paradigms in the dissonance literature (e.g., insufficient justification via initiation, insufficient justification via prohibition, and free choice among alternatives). In some cases their simulation fits the data better than the original dissonance formulation does, and in one case it leads to a novel prediction that they have experimentally verified. Unfortunately, consistent with the fragility of the selective exposure findings, they were much less successful in capturing the results of that paradigm. As they note, their ability to simulate the results of most of the major paradigms suggests that such parallel constraint satisfaction models may provide the basis for theoretical unification within this field.
Several aspects of their model are particularly interesting. First, they use ideas derived from Hopfield's (1982, 1984) notion of the energy of a system to provide a quantitative measure of the overall consonance of a system of beliefs, as well as a measure of the contribution of each belief to that consonance. This was not possible in previous conceptualizations of dissonance. Second, they include the importance of each cognition as a parameter in their model. This allows one to explicitly simulate how dissonance reduction is affected by the importance and amount of support of individual cognitions. Third, they represent each cognition by two negatively linked nodes, each representing one pole of the cognition. Thus, an attitude toward an activity is represented by the summed activation of both a positive and a negative node. Although the negative link will tend to ensure that only one of the two nodes is active, in some cases both could be active simultaneously, indicating ambivalence.
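A consonance measure of this kind can be sketched as the sum of w_ij a_i a_j over distinct pairs of units, which is the negative of Hopfield's energy, so settling raises consonance. In the hypothetical network below, a counterattitudinal act is clamped on and the attitude is represented by a positive and a negative pole. The weights, starting activations, and the interactive-activation-style update rule are our assumptions; Shultz and Lepper's resistance and importance parameters are omitted.

```python
# Units: the (clamped) counterattitudinal act and the two poles of an attitude.
UNITS = ["act", "attitude_pos", "attitude_neg"]
WEIGHTS = {
    ("attitude_pos", "attitude_neg"): -0.5,  # the two poles inhibit each other
    ("act", "attitude_pos"): 0.5,            # the act fits a positive attitude
    ("act", "attitude_neg"): -0.5,           # ...and clashes with a negative one
}


def w(i, j):
    return WEIGHTS.get((i, j), WEIGHTS.get((j, i), 0.0))


def consonance(a):
    """Sum of w_ij * a_i * a_j over distinct pairs (higher = more consonant)."""
    return sum(w(i, j) * a[i] * a[j]
               for k, i in enumerate(UNITS) for j in UNITS[k + 1:])


def contribution(a, i):
    """Part of the consonance that involves unit i."""
    return sum(w(i, j) * a[i] * a[j] for j in UNITS if j != i)


def settle(a, clamped=("act",), steps=20):
    """Synchronous updates; activations stay within [0, 1]."""
    a = dict(a)
    for _ in range(steps):
        net = {i: sum(w(i, j) * a[j] for j in UNITS if j != i) for i in UNITS}
        for i in UNITS:
            if i in clamped:
                continue
            if net[i] >= 0:
                a[i] += net[i] * (1.0 - a[i])  # move toward ceiling of 1
            else:
                a[i] += net[i] * a[i]          # move toward floor of 0
    return a


START = {"act": 1.0, "attitude_pos": 0.2, "attitude_neg": 0.8}
END = settle(START)
print(f"consonance {consonance(START):.2f} -> {consonance(END):.2f}")
print("negative pole's contribution at start:", round(contribution(START, "attitude_neg"), 2))
```

The negative pole initially contributes most of the dissonance; as the network settles, the attitude shifts toward the positive pole and overall consonance rises, which is the model's analogue of attitude change after forced compliance.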
In addition to simulating the results of the major paradigms, they also examine how their model fares with other recent research. For instance, Cooper, Zanna, and Taves (1978) directly examined the impact of arousal on attitude change. They showed that when students write a counterattitudinal essay under high choice, the dissonance effect is greatest when they are given a stimulant and smallest when they are given a tranquilizer. Shultz and Lepper show that their model can simulate the impact of arousal, and they include an interesting speculation about the relationship between the role of activation in their model and the impact of stimulants and tranquilizers on cortical arousal. They also successfully address the role of the self-concept in dissonance, including Steele's (1988) work on self-affirmation processes.
As do several authors in this volume, they conclude by making a case for the theoretical unification that constraint satisfaction models can provide. Not only can these kinds of models handle the dissonance literature; they can also be applied in a variety of other domains. As they and others have noted, constraint satisfaction models have been employed in belief revision, explanation, comprehension, schema completion, analogical retrieval and mapping, content-addressable memory storage and retrieval, attitude change, impression formation, and cognitive balance.
Ranney and Schank try something a little different. Rather than focusing on using a particular kind of neural network to address a specific problem, they tackle some broad questions, drawing on their work on the importance of explanatory coherence in thinking. For example, they take the distinction often made between scientific and social thinking and ask how real it is. Their answer is: not very. Based on their work on both social reasoning and reasoning about physical systems, they argue that scientific and social thinking fundamentally rely on the same mechanisms; in particular, principles of explanatory coherence play a central role in both domains.
They also describe some of their work using their program Convince Me. This program, partially based on Thagard's model of explanatory coherence (1989, 1992), can be used to uncover people's reasoning about a variety of domains: their individual beliefs, the explanatory relations among them, and the coherence or consistency of the set of beliefs. Moreover, by giving subjects feedback on how consistent their beliefs are, it can also be used to encourage people to develop more coherent sets of beliefs. Relevant to the earlier point, in their work with this program there seems to be little difference in how people use it to address scientific and social problems. Finally, they address a very big question: How do we decide which social issues are the most significant or important? They use their work with Convince Me and ideas about coherence to explore the role of explanatory coherence in identifying which problems and issues are most socially significant.
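The kind of explanatory-coherence computation Convince Me builds on can be sketched as a small constraint satisfaction network in the style of Thagard's ECHO: explanations become excitatory links, contradictions inhibitory links, and evidence units receive support from a clamped special unit. The hypotheses, links, and parameter values below (excitation 0.04, inhibition -0.06, decay 0.05, data excitation 0.05) are our assumptions in the spirit of ECHO, not values taken from Convince Me.

```python
# Hypothetical network: H1 explains E1 and E2; H2 explains only E1;
# H1 and H2 contradict each other.
EXCIT, INHIB, DATA_EXCIT, DECAY = 0.04, -0.06, 0.05, 0.05

LINKS = {
    ("H1", "E1"): EXCIT, ("H1", "E2"): EXCIT, ("H2", "E1"): EXCIT,
    ("H1", "H2"): INHIB,
    ("EVIDENCE", "E1"): DATA_EXCIT, ("EVIDENCE", "E2"): DATA_EXCIT,
}
UNITS = ["EVIDENCE", "E1", "E2", "H1", "H2"]


def weight(i, j):
    """Links are symmetric."""
    return LINKS.get((i, j), LINKS.get((j, i), 0.0))


def settle(steps=200):
    a = {u: 0.01 for u in UNITS}
    a["EVIDENCE"] = 1.0  # special evidence unit clamped at maximum
    for _ in range(steps):
        net = {u: sum(weight(u, v) * a[v] for v in UNITS if v != u) for u in UNITS}
        new = {"EVIDENCE": 1.0}
        for u in UNITS[1:]:
            if net[u] > 0:   # push toward the maximum of +1
                value = a[u] * (1 - DECAY) + net[u] * (1.0 - a[u])
            else:            # push toward the minimum of -1
                value = a[u] * (1 - DECAY) + net[u] * (a[u] + 1.0)
            new[u] = max(-1.0, min(1.0, value))
        a = new
    return a


final = settle()
print({u: round(v, 2) for u, v in final.items()})
```

The hypothesis that explains more of the evidence (H1) ends up accepted (positive activation) and its rival rejected (negative activation), which is the sense in which a coherent set of beliefs emerges from the network as a whole.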
Nowak and Vallacher examine how complex social dynamics involving interactions among people in groups can be modeled by neural networks. They argue that such models can provide insights into social dynamics and into how such dynamics depend on the connections among people. As part of their discussion, they first introduce another class of models, cellular automata, which have been used to model the dynamics of such social phenomena as social influence and attitude change. They then discuss the limitations of these models, in particular the rigid nature of their social ties, and note the advantages of neural network models. For example, neural networks can capture negative social relationships, which cellular automata have trouble representing. Another attraction is their ability to simulate states of equilibrium: the idea that networks may evolve to certain states but not others.
As an example, they analyze the implications of one type of attractor network, a Hopfield-type network in which each individual is represented by a node and the connections among nodes represent the relations among individuals. They also take advantage of the energy function discussed by Hopfield to capture the notion that such systems can have a number of potential equilibria, which differ in their energy (how good a solution each represents) and correspond to different distributions of beliefs.
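A minimal sketch of such a network, assuming ±1 opinions, symmetric relation weights with no self-links, and asynchronous threshold updates in a fixed order (the group, its weights, and the starting opinions are hypothetical). Each update can only lower or keep the energy E = -1/2 Σ w_ij s_i s_j, so the group slides into an equilibrium.

```python
def energy(weights, state):
    """Hopfield energy E = -1/2 * sum_{i != j} w_ij s_i s_j."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n) if i != j)


def local_field(weights, state, i):
    """Summed social influence on person i."""
    return sum(weights[i][j] * state[j] for j in range(len(state)) if j != i)


def settle(weights, state, max_sweeps=20):
    """Update one person at a time until no opinion changes."""
    state = list(state)
    energies = [energy(weights, state)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = local_field(weights, state, i)
            new = state[i] if field == 0 else (1 if field > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
            energies.append(energy(weights, state))
        if not changed:
            break
    return state, energies


# Hypothetical group: 0-1 and 2-3 are friends (+1); across pairs, dislike (-1).
W = [
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
]
final, energies = settle(W, [1, -1, 1, 1])
print("final opinions:", final, "energy:", energies[0], "->", energies[-1])
```

Friends end up agreeing and rivals disagreeing, and the energy falls from 0 to -6 without ever rising along the way.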
They note that one can investigate two kinds of dynamics in these models. First, one can investigate how relations between individuals, such as liking or influence, affect the development of attitudes or similar constructs in a social network. This is equivalent to examining how the links among nodes influence changes in the nodes' activation over time. Second, one can investigate how the opinions of individuals influence the relationships between them by using what we know about learning in such networks. For example, the Hebbian learning rule states that if two nodes are activated in the same direction at the same time (both positive, or both negative), then the weight between them should increase, whereas if one node is positive and the other negative, then the weight should decrease. This is akin to how similarity in opinion between two individuals can affect how much they like each other.
They also make an interesting set of observations about how the impact of wider societal factors, beyond individual relationships, can be captured in such models. They note that social influence is rarely the only source of opinion change. Typically, any individual in society receives input from a number of other sources, such as the media and personal memory. For any particular individual these can be treated as essentially random influences, which in neural network terms can be viewed as noise. They note that as noise in such a network increases, up to a certain point, the number of equilibria or stable points decreases. This is akin to shaking the system out of the shallower valleys so that it is more likely to settle into the deeper ones. Thus, the larger the random noise, the greater the likelihood of a small number of ideological positions. This would seem to suggest that at times of great ferment or activity in society, societal opinion is likely to crystallize into a small number of ideological positions. However, they point out that if the amount of noise becomes too high, then all equilibria disappear; in essence, everyone has their own separate, independent opinion.
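One standard way to add noise to such a network, assumed here since the chapter's exact formulation is not given above, is the Glauber (heat-bath) rule from statistical physics, in which a temperature T plays the role of noise. The sketch shows only the single-update mechanism: at low T a person almost always follows the social field, and at very high T the choice approaches a coin flip, which is why all equilibria vanish when noise is too great. The function names are hypothetical.

```python
import math
import random


def prob_positive(field, temperature):
    """Glauber rule: P(s_i = +1) = 1 / (1 + exp(-2 * h_i / T))."""
    x = -2.0 * field / temperature
    x = max(-700.0, min(700.0, x))  # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(x))


def noisy_update(field, temperature, rng):
    """Draw one person's new opinion (+1 or -1) given their social field."""
    return 1 if rng.random() < prob_positive(field, temperature) else -1


rng = random.Random(42)
for T in (0.1, 1.0, 100.0):
    ups = sum(noisy_update(1.0, T, rng) == 1 for _ in range(10_000))
    print(f"T = {T:>5}: follows a positive social field {ups / 10_000:.1%} of the time")
```

Demonstrating the intermediate regime, where moderate noise leaves only the deepest equilibria, would require running whole networks of such units over time; this sketch does not attempt that.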
Eiser, Claessen, and Loose are interested in processes of self-organization in social systems. Like Nowak and Vallacher, they propose using connectionist models to investigate processes occurring in groups of individuals rather than only intra-individual processes.
Eiser et al. look at two different issues and use two different kinds of architectures. First, they attempt to simulate the development of cognitive balance (Heider, 1946) among a group of people rather than within a single individual. They use a fully recurrent feedback network in which each individual's feeling about an impersonal object is represented by the activation of a node, and the relationship (or amount of liking) between two individuals is represented by the weight between the two corresponding nodes. Thus, like Read and Miller (1994), they treat cognitive balance as a constraint satisfaction process. Eiser et al. then use this model to study the extent to which the development of balance is due to changes in relationships among individuals versus changes in how individuals feel about impersonal objects. They find that, at least in their particular implementation, changing relationships among individuals is far more important than changing feelings about objects. As they note, this kind of simulation can be used to extend our analysis of theories such as balance theory.
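In this representation, each pair of people plus the shared object forms a Heider P-O-X triad, which is balanced when the sign of their liking matches the product of the signs of their feelings about the object, that is, when w_ij a_i a_j > 0. The hypothetical sketch below checks balance pair by pair and shows the two routes to restoring it that Eiser et al. compare. The group, values, and function name are ours, and the sketch does not model the learning dynamics that decide which route a simulated group actually takes.

```python
def balance_by_pair(weights, attitudes):
    """Map each pair (i, j) to True if the P-O-X triad formed by i, j and
    the shared object is balanced: w_ij * a_i * a_j > 0."""
    n = len(attitudes)
    return {(i, j): weights[i][j] * attitudes[i] * attitudes[j] > 0
            for i in range(n) for j in range(i + 1, n)}


# Hypothetical group: everyone likes everyone, but person 2 dislikes the object.
attitudes = [1, 1, -1]
liking = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
before = balance_by_pair(liking, attitudes)

# Route 1: change relationships (liking follows agreement about the object).
relinked = [[0 if i == j else (1 if attitudes[i] * attitudes[j] > 0 else -1)
             for j in range(3)] for i in range(3)]

# Route 2: change person 2's feeling about the object instead.
persuaded = [1, 1, 1]

print("before:", before)
print("relationships changed:", balance_by_pair(relinked, attitudes))
print("attitude changed:", balance_by_pair(liking, persuaded))
```

Either route balances every triad here; the interesting empirical question, which the simulations address, is which kind of change a self-organizing group tends to make.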
In a second set of simulations, they present a hybrid architecture that combines cellular automata with feedforward backpropagation networks. In this model, each individual is represented by a cell, and the internal state of the cell (individual) is represented by a feedforward network. Rules applied to the cells determine how they "talk" to one another. Eiser et al. use this model to study how a group of individuals may come to agree on a name for an object in their environment; that is, the model attempts to capture communication among individuals in a social network. As part of their simulation they study various kinds of communication rules that determine who talks to whom and how much. Although the model is interesting and innovative, it has one flaw. When it tries to name two or more different objects, it exhibits what the authors call "Smurfing behavior": all the objects come to receive exactly the same name. So in its current state the model is unable to capture how a group might come to give different names to different objects.
Conclusion
Although social psychologists are just beginning to study the applications of neural network models to social phenomena, the chapters in this book make clear that these models have great potential for addressing fundamental issues in social psychology. In fact, the present authors have already made significant contributions to our understanding of these issues. We thank the authors for the strength of their contributions.
References
Anderson, J. A. (1995). An introduction to neural networks. Cambridge, MA: Bradford/MIT Press.
Asch, S. E. (1946). Forming impressions of personality. Journal of Abnormal and Social Psychology, 41, 258-290.
Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the mind: An introduction to parallel processing in networks. Cambridge, MA: Basil Blackwell.
Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58, 545-567.
Cooper, J., Zanna, M. P., & Taves, P. A. (1978). Arousal as a necessary condition for attitude change following forced compliance. Journal of Personality and Social Psychology, 36, 1101-1106.
Festinger, L. (1950). Informal social communication. Psychological Review, 57, 271-282.
Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row, Peterson.
Gluck, M. A., & Bower, G. H. (1988a). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 227-247.
Gluck, M. A., & Bower, G. H. (1988b). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.
Hamilton, D. L., & Gifford, R. K. (1976). Illusory correlation in interpersonal perception: A cognitive basis of stereotypic judgments. Journal of Experimental Social Psychology, 12, 392-407.
Heider, F. (1946). Attitudes and cognitive organization. Journal of Psychology, 21, 107-112.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley.
Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13, 295-355.
Holyoak, K. J., & Thagard, P. (1995). Mental leaps: Analogy in creative thought. Cambridge, MA: MIT Press/Bradford Books.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79, 2554-2558.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, USA, 81, 3088-3092.
Kunda, Z., & Thagard, P. (1996). Forming impressions from stereotypes, traits, and behaviors: A parallel constraint satisfaction theory. Psychological Review, 103, 284-308.
Lewin, K. (1935). A dynamic theory of personality. New York: McGraw-Hill.
Lewin, K. (1947a). Frontiers in group dynamics: I. Human Relations, 1, 2-38.
Lewin, K. (1947b). Frontiers in group dynamics: II. Human Relations, 1, 143-153.
McClelland, J. L., & Elman, J. L. (1986). Interactive processes in speech perception: The TRACE model. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 58-121). Cambridge, MA: MIT Press/Bradford Books.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
McClelland, J. L., & Rumelhart, D. E. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press/Bradford Books.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Miller, L. C., & Read, S. J. (1991). On the coherence of mental models of persons and relationships: A knowledge structure approach. In F. Fincham & G. J. O. Fletcher (Eds.), Cognition in close relationships (pp. 69-99). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104-114.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
Pearce, J. M. (1994). Similarity and discrimination: A selective review and a connectionist model. Psychological Review, 101, 587-607.
Read, S. J., & Marcus-Newhall, A. (1993). Explanatory coherence in social explanations: A parallel distributed processing account. Journal of Personality and Social Psychology, 65, 429-447.
Read, S. J., & Miller, L. C. (1993). Rapist or "regular guy": Explanatory coherence in the construction of mental models of others. Personality and Social Psychology Bulletin, 19, 526-540.
Read, S. J., & Miller, L. C. (1994). Dissonance and balance in belief systems: The promise of parallel constraint satisfaction processes and connectionist modeling approaches. In R. C. Schank & E. J. Langer (Eds.), Beliefs, reasoning, and decision making: Psycho-logic in honor of Bob Abelson (pp. 209-235). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Read, S. J., & Miller, L. C. (1995). Stories are fundamental to meaning and memory: For social creatures, could it be otherwise? In R. S. Wyer, Jr. (Ed.), Knowledge and memory: The real story. Advances in social cognition, Vol. VIII (Lead article by R. C. Schank & R. P. Abelson, pp. 139-152). Hillsdale, NJ: Lawrence Erlbaum Associates.
Read, S. J., Vanman, E. J., & Miller, L. C. (1997). Connectionism, parallel constraint satisfaction processes, and Gestalt principles: (Re)Introducing cognitive dynamics to social psychology. Personality and Social Psychology Review, 1, 26-53.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press/Bradford Books.
Rumelhart, D. E., Smolensky, P., McClelland, J. L., & Hinton, G. E. (1986). Schemata and sequential thought processes in PDP models. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 7-57). Cambridge, MA: MIT Press/Bradford Books.
Shanks, D. R. (1991). Categorization by a connectionist network. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 433-443.
Shanks, D. R. (1993). Human instrumental learning: A critical review of data and theory. British Journal of Psychology, 84, 319-354.
Smith, E. R. (1996). What do connectionism and social psychology offer each other? Journal of Personality and Social Psychology, 70, 893-912.
Smith, E. R., & Zárate, M. A. (1990). Exemplar and prototype use in social categorization. Social Cognition, 8, 243-262.
Steele, C. M. (1988). The psychology of self-affirmation: Sustaining the integrity of the self. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 21, pp. 261-302). New York: Academic Press.
Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12, 435-467.
Thagard, P. (1992). Conceptual revolutions. Princeton: Princeton University Press.
Trope, Y. (1986). Identification and inferential processes in dispositional attribution. Psychological Review, 93, 239-257.
Vallée-Tourangeau, F., Baker, A. G., & Mercier, P. (1994). Discounting in causality and covariation judgments. The Quarterly Journal of Experimental Psychology, 47B, 151-171.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. Institute of Radio Engineers, Western Electronic Show and Convention, Convention Record, Part 4, 96-104.