
Title: Integrating Face and Voice in Person Perception
File Size: 6.0 MB
Total Pages: 384
Table of Contents
Integrating Face and Voice in Person Perception
Part I: Evolution and Development
Part II: Identity Information
Part III: Affective Information
Part IV: Impairment
Document Text Contents
Page 2

Integrating Face and Voice in Person Perception

Page 192

10 Integration of Face and Voice During Emotion Perception... 185

the context of the experiment. The situation is quite different with more complex
audiovisual pairs consisting of speech sounds and lip movements, or facial
emotional expressions and affective tones of voice (Campanella & Belin, 2007;
De Gelder & Bertelson, 2003). These complex pairings are natural, as they do not
require any training for the perceiver to treat them as pairs in the laboratory. In
fact, in the course of studying these naturally associated pairings (e.g., an emotional
facial expression combined with an affective tone of voice), the experimenter may
even create conditions that pull them apart and dissociate them (see
De Gelder & Vroomen, 2000; McGurk & MacDonald, 1976). This is often done in
order to obtain incongruent pairs and compare them with the more natural situation
of congruence. Natural and arbitrary pairs thus seem to pull researchers in opposite
directions, to some extent, and it is plausible to argue that the underlying multi-
sensory integration processes may differ depending on whether the audiovisual
pairs are natural or, rather, arbitrary (see Pourtois & de Gelder, 2002 for evidence).

Object-based multisensory perception is widespread in daily environments.
However, only a few multisensory objects have been studied in depth
so far in the cognitive sciences. Space perception, language perception, and the percep-
tion of temporal events are three domains of human cognition where multisensory
research has brought valuable insight. In the domain of space perception, many
multisensory or crossmodal effects have been demonstrated that all reflect our
ability to integrate spatial information when this information is concurrently pro-
vided by the visual and auditory (or proprioceptive or tactile) modality (Driver &
Spence, 1998a, 1998b, 2000). For example, the distance between spatially disparate
auditory and visual stimuli tends to be underestimated with temporally coincident
presentations, a phenomenon known as the ventriloquist effect/illusion (Bermant &
Welch, 1976; Bertelson, 1999). Visual capture is another instance found in the spa-
tial domain (Hay, Pick, & Ikeda, 1965). It involves a spatial localization situation in
which the visual information is in conflict with that of another modality, namely
proprioceptive information, and perceived location is determined predominantly by
visual information. Likewise, when speech sounds (syllables) are presented simul-
taneously with incongruent lip movements, subjects report a percept that belongs
neither to the visual modality nor to the auditory one, but that represents either a
fusion or a combination of the two inputs (McGurk & MacDonald, 1976). These
results indicate that the visual and auditory components of syllables do combine, and
this combination translates into a new speech percept. Natural speech perception
therefore provides a compelling case of multisensory integration (Dodd & Campbell,
1987). A third compelling instance or illusion of object-based multisensory integra-
tion is found in the temporal domain and may be seen, to some degree, as a sym-
metric case to the ventriloquist illusion. Here a visual illusion is
induced by sound (Shams, Kamitani, & Shimojo, 2000). When a single flash of light
is accompanied by multiple auditory beeps, the single flash is perceived as multiple
flashes. This phenomenon is partly consistent with previous behavioral results
showing that sound can alter the visually perceived direction of motion (Sekuler,
Sekuler, & Lau, 1997). Altogether, these effects suggest that visual perception is
malleable by signals from other sensory modalities, just as auditory perception

Page 193

186 G. Pourtois and M. Dhar

is malleable by signals from other sensory modalities. More generally, the domi-
nance of one modality over the other therefore does not seem to be fixed or absolute,
but instead may depend upon the context in which crossmodal effects take place.
For space perception, the visual modality dominates over the auditory, and this situ-
ation is reversed during the perception of discrete temporal events (for which the
auditory modality takes the lead over visual cues).
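This context-dependent dominance is often formalized outside this chapter by maximum-likelihood cue-combination models, in which each modality's weight is inversely proportional to the variance (unreliability) of its estimate. The sketch below is a minimal illustration of that idea with hypothetical numbers, not an analysis from the chapter: low visual spatial noise yields visual capture of location (ventriloquism), while low auditory temporal noise yields auditory dominance of timing.

```python
def combine(mu_v, var_v, mu_a, var_a):
    """Minimum-variance fusion of a visual and an auditory estimate.

    Each cue is weighted by its inverse variance, so the more reliable
    modality dominates the combined percept.
    """
    w_v = (1 / var_v) / (1 / var_v + 1 / var_a)  # visual weight
    w_a = 1 - w_v                                # auditory weight
    mu = w_v * mu_v + w_a * mu_a                 # fused estimate
    var = 1 / (1 / var_v + 1 / var_a)            # fused variance (always smaller)
    return mu, var

# Spatial judgment (hypothetical units): vision is precise (var 1) and
# audition noisy (var 9), so the fused location sits near the visual source
# even though the sound comes from elsewhere -- a ventriloquist-like capture.
loc, _ = combine(mu_v=0.0, var_v=1.0, mu_a=10.0, var_a=9.0)
print(round(loc, 1))  # 1.0

# Temporal judgment: the reliabilities reverse, and audition takes the lead.
t, _ = combine(mu_v=0.0, var_v=9.0, mu_a=10.0, var_a=1.0)
print(round(t, 1))  # 9.0
```

The symmetry of the two calls mirrors the point in the text: nothing about vision is intrinsically dominant; dominance simply follows whichever modality is more reliable for the dimension being judged.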

Traditionally, two sets of constraints have been envisaged in the literature
(Bertelson, 1999). The first, referred to as structural factors, primarily concerns the
spatial and temporal properties of the sensory inputs. The other set, often discussed
as cognitive factors, relates to a whole set of higher-level, semantic or attention-
related factors, including the subject's knowledge of and familiarity with the multi-
sensory situation (Talsma et al., 2010). Structural factors are the ones that have
attracted by far the most attention from researchers in the field of multisensory inte-
gration (see Calvert, Spence, & Stein, 2004). By comparison, the role of cognitive
factors is still underinvestigated, although more recent work has started to explore
the links between the brain mechanisms of selective attention and those of multi-
sensory integration (see Talsma et al., 2010). However, from a conceptual view-
point, it seems plausible to argue that some additional cognitive or object-based
constraints on multisensory perception actually operate, to prevent the organism
from registering many invalid and spurious incidences of multimodality defined
solely by the spatial and temporal coincidence of the visual and auditory inputs.
Yet only a few studies have addressed this question and tested to what
extent object-based constraints may influence mechanisms of multisensory
perception (see De Gelder & Bertelson, 2003; Pourtois & de Gelder, 2002).

Object-based multisensory perception is a complex issue, since beyond the spatial
and temporal determinants of the input, the nature of the object to perceive may
vary greatly from one condition (or encounter) to another. In this context, one may
consider emotions as just one class of perceptual objects, besides other categories
like speech (i.e., speech sounds presented simultaneously with lipreading information/
lip movements, see Calvert et al., 1997; McGurk & MacDonald, 1976) or space (i.e.,
although spatial localization is determined predominantly by visual cues, the pre-
sentation of concurrent spatial auditory or tactile cues strongly biases and influences
visual spatial localization abilities, see Bertelson, 1999; Driver & Spence, 1998a;
Stein & Meredith, 1993), as reviewed above. Several objects or dimensions are
actually susceptible to being perceived by multiple sensory channels at the same
time, and therefore a central (still unanswered) question concerns the existence of
general principles that would govern multisensory perception. Structural factors,
such as temporal and spatial coincidence (see Stein & Meredith, 1993), may be
envisaged as such. On the other hand, and contrary to this view, one might postulate
that each domain or object of perception (e.g., emotion, speech, space) actually pos-
sesses its own organizing principles and that the overlap between these domains
is fairly limited. Most likely, multisensory perception of emotion shares
some invariance in the basic perceptual mechanisms of audiovisual integration with
these other domains (speech and space perception), while some specificity may well
be present, although this question remains open.

Page 383


Redundancy, 74–83, 85, 88, 89, 195–201
Redundant signal effect (RSE), 187
Redundant target effect (RTE), 187
Response latencies (RT)
  congruent bimodal stimulus pairs, 191, 192
  distribution, 188
  emotional facial expression, 198
RSE. See Redundant signal effect (RSE)
RT. See Response latencies (RT)
RTE. See Redundant target effect (RTE)

Schizophrenia and autism
  amygdala, 265
  ASD, 262, 263
  audiovisual studies, 263, 264
  emotional signals, 262
  facial and bodily expressions, 264
  nonschizophrenic psychotic disorder, 263
  visual and audio channel, 263

Schizophrenia, P300
  ERB, unfocused
    amplitude reduction, N170, 345
    MMN, 345–346
    N100, 344
  auditory diminution and ERPs, 334
  dysfunctional visuo-spatial process, 335
  frontotemporal atrophy, 334
  and neurocognitive impairments, 334
  P3a amplitude and ultra-high risk, 334
  phenotypic markers, 333
  unimodal tasks and neuroimaging, 335
  visual and visuo-spatial impairments,

Serotonin transporter (SLC6A4/5-HTTLPR)
  distress recovery, 109
  emotional sensitivity, 109
  ERP analysis, 109
  genetic variation, 110
  genotype, emotional stimuli processing, 106
  happy facial expressions, 107
  occipital electrodes, brain processing, 107

SOA. See Stimulus onset asynchrony (SOA)
Sound-source identification
  auditory-visual individual recognition, 37
  behavioral studies, nonhuman animals, 32, 36
  cross-modal identity representation, 37
  preferential looking paradigm, rhesus monkeys, 37
  sex differences, chimpanzees, 37
  visual memory, 37
Speaker identification. See Audiovisual integration (AVI)
Speech perception, 300, 303, 305, 309, 314
Stimulus onset asynchrony (SOA), 174, 260, 261
STS. See Superior temporal sulcus (STS)
Superior temporal gyrus (STG), 218
Superior temporal sulcus (STS), 227, 228, 233, 241, 244, 245

Temporal synchronization
  amodal properties, 83
  audio–visual integration, 86
  auditory and visual stimulation, 78
  human infants neurophysiological response, 84
  infants’ sensitivity, 85
  intersensory redundancy hypothesis, 86
  stimulus/subject gender, 82
Temporal voice area (TVA), 122–123, 129, 130
Test of variables of attention (TOVA) abnormalities, 341
TVA. See Temporal voice area (TVA)

Unimodal vs. audiovisual integration
  face categorization
    female, green, 141, 142
    male, beige, 141, 142
  voice categorization
    comparison, 142
    female, green, 142
    male, beige, 142
Ventrolateral prefrontal cortex (VLPFC)
  auditory responsive domain, 54–55
  face and vocal information, 61
  face-responsive neurons, 49
  face-responsive “patches”, 51
  human neuroimaging studies, 62
  representation, VLPFC, 55–59
  ventral auditory stream identification, 53


Page 384


Visual cortex, 152, 172, 174, 212, 215, 218, 229, 257
VLPFC. See Ventrolateral prefrontal cortex (VLPFC)
Vocal expression, 10–13, 20, 241, 253–257
  emotional expressions, 110
  ERP studies and experimental investigations, 104
  infants’ ability, 104
  positive and negative, infants, 103
Vocalization processing, 23
  auditory projections
    analysis, anatomical connections, 53, 54
    lateral belt auditory areas, 53
    multisensory area TPO and TAa, 53
    rostrocaudal topography, 52
    temporoprefrontal connections, 52
  auditory responsive domain, VLPFC,
  description, 51
  PFC and auditory, NHPs
    auditory discrimination tasks, 51
    DLPFC neurons, 51
    neurophysiological recordings, 51
  representation, VLPFC, 55–59
Vocal type representations
  auditory-visual events, 38
  cross-modal representation, 39
  human phonemes, phonological phenomena, 39
  movie clips, 38
  in nonhuman animals, 38–39
  preferential looking paradigm, 38
  synchronization, sounds and movies, 38
  temporal synchrony and phonetic correspondence, 38
Voice. See Face and voice integration, emotion
Within-modality effects
  emotional modulation, 212
  neural mechanisms, 215–216
  P1 mirror, 218

