USC Phonetics and Phonology Group


Research Projects at the USC Phonetics Laboratory:  Phonetics and Laboratory Phonology



Prosody and Articulatory Dynamics in Spoken Language

Dani Byrd,  Shrikanth Narayanan  and Sungbok Lee (EE & Ling)

with
Elliot Saltzman (Haskins Laboratories & Boston University)

and Rebeka Campos, Susie Choi, Jelena Krivokapic, and Daylen Riggs

Funding: NIH DC03172

The long-term objective of the proposed research program is to understand how linguistic structure conditions the spatiotemporal realization of articulatory movement during speaking.  As research in speech production becomes more integrated with linguistic theory, it has become increasingly clear that segmental articulation cannot be understood independently of prosodic structure.  Such structure includes, but is not limited to, prominence and phrasal organization, and effects of these high-level prosodic aspects of linguistic structure pervade low-level articulatory behavior.  However, despite the pervasiveness of these effects, only a very few prosodic signatures have been identified at the level of articulatory patterning. This research program investigates the relation between one aspect of prosodic structure, namely phrasal structure, and the control and coordination of articulation within a dynamical systems model of speech production.  The specific aim of this proposal is to understand how speakers modulate the spatiotemporal organization of oral articulatory gestures as a function of their phrasal positions.  The studies described fall into three areas:  the kinematic characteristics of speech gestures in the vicinity of phrasal junctures, the categorical versus gradient nature of those junctures as manifested in articulation, and computational modeling of the systematic variability in articulation that occurs at phrase edges.  The specific aims will be pursued using articulatory movement data collected with a magnetometer system and by elaboration of the well-known Task Dynamic computational model of speech production.

Please see Dani Byrd's research statement for more detail.

D. Byrd & E. Saltzman. (1998) Intragestural dynamics of multiple phrasal boundaries. Journal of Phonetics, 26:173-199. [pdf]

D. Byrd, A. Kaun, S. Narayanan, & E. Saltzman. (2000)  Phrasal signatures in articulation. In M. B. Broe and J. B. Pierrehumbert, (Eds.). Papers in Laboratory Phonology V. Cambridge:Cambridge University Press, 70 - 87. [pdf].

D. Byrd & E. Saltzman. (2003) The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31,2, 149-180. [pdf protected by Academic Press] [alternative pdf]

D. Byrd, S. Lee, D. Riggs, and J. Adams. (2005) Interacting effects of syllable and phrase position on consonant articulation. Journal of the Acoustical Society of America 118(6), 3860-3873. [pdf]

This is our magnetometer team:   Narayanan, Byrd, Lee, and (on right) Mah


Photo credit: Stacey Halper



SPAN:  Speech Production and Articulation kNowledge Group
Real-Time Magnetic Resonance Imaging for Speech Production


Faculty:  Shri Narayanan (USC EE, Ling, CS),  Krishna Nayak (USC EE),  Sungbok Lee (USC EE & Ling),  Dani Byrd (USC Ling),  Richard Leahy (EE)


Graduate Students: Abhinav Sethy (EE), Erik Bresch (EE), Stephen Tobin (Ling), Jason Adams (CS MS '06)
Post-doc:  Jon-Fredrik Nielsen (EE)

Undergraduates:  Celeste de Freitas, David Hunt, Nathaniel Go

Funding: NIH R01DC007124

The long-term goals of this project are to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of the speech tasks or goals requisite in the production of spoken language.  Magnetic resonance imaging (MRI) has served as a valuable tool for studying static postures in speech production.  Now, recent improvements in temporal resolution are making it possible to examine the dynamics of vocal tract shaping during fluent speech using MRI.  Our team has developed an approach for MRI image reconstruction rates of 24 images per second, making veridical real-time movies of speech production possible for the first time without X-rays (Narayanan, Nayak, Lee, Sethy, & Byrd, 2004) and providing us exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract.  Our long-term goal is to understand the aspects of vocal tract shaping that are critically controlled during speech, both for sounds known to be complex in geometry (e.g., /r/ & sibilant fricatives) and for sounds known to be complex in their temporal structuring (e.g., /l/ & diphthongs).  An understanding of vocal tract shape as a fundamentally dynamic aspect of linguistic organization will do much to add to the field’s current, basically static (i.e., postural & fixed time-point) approach to describing the production of speech.   An appropriate understanding of how sounds are produced in space and time is fundamentally a phonological question in that it bears directly on the phonological representation of segmental units, a representation that we take to be intrinsically articulatory and dynamic.

The published study below uses spiral k-space acquisitions with a low flip-angle gradient echo pulse sequence on a conventional GE Signa 1.5T CV/i scanner.  This strategy allows for acquisition rates of 8-9 images per second and reconstruction rates of 20-24 images per second, making veridical movies of speech production now possible. Segmental durations, positions, and inter-articulator timing can all be quantitatively evaluated.  Data show clear real-time movements of the lips, tongue, and velum.  Sample movies and data analysis strategies are presented in the JASA article below and at sail.usc.edu/production/rtmri/jasa2004

S. Narayanan, K. Nayak, S. Lee, A. Sethy & D. Byrd (2004) An approach to real-time magnetic resonance imaging for speech production. Journal of the Acoustical Society of America, 115, 1771-1776. [pdf]

E. Bresch, J. Adams, A. Pouzet, S. Lee, D. Byrd, S. Narayanan (to appear). Semi-automatic processing of real-time MR image sequences for speech production studies. 7th International Seminar on Speech Production, Ubatuba, Brazil.  [pdf]

E. Bresch, J. Nielsen, K. Nayak, & S. Narayanan. (2006) Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans. Journal of the Acoustical Society of America, 120, 1791. [pdf]

Some of our SPAN team (Narayanan & Nayak, Byrd, Lee [left to right])



Photo credits: Stacey Halper


Real-time MRI video phonetics sounds tutorial at the SPAN website



An articulatory view of Kinyarwanda coronal harmony
Walker, Rachel, Dani Byrd, and Fidèle Mpiranya
Phonology 25, 499-535, 2008

Funding: NIH DC03172

Coronal harmony in Kinyarwanda causes alveolar fricatives to become postalveolar
preceding a postalveolar fricative within a stem. Alveolar and postalveolar
stops, affricates and palatals block coronal harmony, but the flap and non-coronal
consonants are reported to be transparent. Kinematic data on consonant production
in Kinyarwanda were collected using electromagnetic articulography.
The mean angle for the line defined by receivers placed on the tongue tip and
blade was calculated over the consonant intervals. Mean angle reliably distinguished
alveolar and postalveolar fricatives, with alveolars showing a lower tip
relative to blade. Mean angle during transparent non-coronal consonants showed
a higher tip relative to blade than in contexts without harmony, and the mean
angle during transparent [m] was not significantly different than during postalveolar
fricatives. This is consistent with a model where Kinyarwanda coronal
harmony extends a continuous tip-blade gesture, causing it to be present during
‘transparent’ segments, but without perceptible effect.
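
The tip-blade angle measure described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the coordinate convention (x increasing toward the front of the mouth, y increasing upward), and the sample values are all assumptions made for illustration. With this convention, a lower tip relative to the blade gives a smaller angle.

```python
import math

def tip_blade_angle(tip, blade):
    """Angle in degrees of the line running from the blade receiver to the tip receiver.

    Each receiver position is an (x, y) pair in the midsagittal plane.
    Assumed convention: x increases toward the front, y increases upward.
    """
    dx = tip[0] - blade[0]
    dy = tip[1] - blade[1]
    return math.degrees(math.atan2(dy, dx))

def mean_angle(tip_samples, blade_samples):
    """Arithmetic mean of the tip-blade angle over one consonant interval.

    A plain average is only appropriate when the angles stay well away
    from the +/-180 degree wrap-around, which is assumed here.
    """
    angles = [tip_blade_angle(t, b) for t, b in zip(tip_samples, blade_samples)]
    if not angles:
        raise ValueError("need at least one sample")
    return sum(angles) / len(angles)

# Hypothetical samples (mm): the tip sits level with, then above, the blade.
tips = [(10.0, 0.0), (10.0, 10.0)]
blades = [(0.0, 0.0), (0.0, 0.0)]
print(mean_angle(tips, blades))
```
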




Underlying Invariance & Surface Variability in Speech Production: Modeling Phrasal Effects

Dani Byrd (USC) and Elliot Saltzman (Haskins Laboratories & Boston University)
Funding: NIH DC03172

We explore the puzzling juxtaposition of underlying invariance of control and surface variability in performance during speech production, and outline how a dynamical systems approach can contribute to solving this puzzle. Articulatory patterning at phrase edges is used as an example of how the surface expression of underlyingly invariant phonological units can vary in a linguistically principled way. We use computational simulations of these phrase boundary effects as prosodically-induced local temporal slowing. This slowing is generated by dynamical effects on the parameter specification of articulatory gestures. This focus allows us to examine a specific view of how underlying temporal characteristics of linguistic units can be modulated for communicative ends in the production of a particular utterance.

D. Byrd & E. Saltzman. (2003) The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31,2, pp 149-180. [pdf protected by Academic Press]




Sound Patterns in Language: The Relation Between Vowels and Syllable
Prominence


PI: Rachel Walker
Graduate Research Assistant: Erika Varis

This project investigates how the relative prominence of a
syllable affects the way vowels are pronounced. Syllables with increased
prominence, such as stressed syllables, are often focal in vowel patterns.
For example, many languages only show their full range of vowel distinctions
in stressed syllables and they reduce vowel quality elsewhere. The specific
questions for this study are: what is the range of vowel patterns sensitive
to syllable prominence? and why are only certain relations between vowels
and prominence attested? The project undertakes a typological investigation
of prominence-sensitive vowel patterns, including vowel deletion, vowel
reduction, vowel/consonant metathesis, and vowel assimilation. A central
discovery is that these patterns avert vowel qualities that are expressed
only in a non-prominent syllable. This gives rise to a hypothesis that the
patterns have evolved so as to reduce perceptual difficulty. The connected
theoretical proposal is that the patterns share a mental construct in the
form of constraints that prevent vowel qualities from being expressed in a
non-prominent position alone. A success of this model is its prediction of a
common outcome by diverse processes across languages.

Funded by the Advancing Scholarship in the Humanities and Social Sciences
Initiative, Office of the Provost, University of Southern California



Functional data analysis of prosodic effects on articulatory timing

Sungbok Lee, Dani Byrd,  Jelena Krivokapic, Ben Parrell
Funding: NIH DC03172

An application of functional data analysis (FDA) (Ramsay & Silverman, 1997) for linguistic experimentation is explored. The time-warping function provided by FDA is shown to offer novel advantages in the investigation of articulatory timing. Traditionally, articulatory studies examining the effects of linguistic variables such as prosody on articulatory timing have relied on kinematic landmarks to define speech intervals of interest.  Here, by contrast, we present a novel approach that allows the analysis of the entire, continuous kinematic trajectories obtained in various experimental conditions, specifically, in the presence or absence of a phrase boundary.  FDA time warping functions after alignment of test and reference (control) signals indicate slowing of articulator movement as the speech stream recedes from the phrase boundary. This is a theoretically predicted pattern (Byrd & Saltzman, 2003), which would be more difficult to validate with a traditional interval-based approach.  However, there exist tokens for which FDA is problematic, and some potential remedies are outlined.  Despite these limitations, FDA is shown to be a generally useful tool for characterizing timing patterns in linguistic experimentation based on continuous kinematic trajectories.

S. Lee, D. Byrd, and J. Krivokapic (2006). Functional data analysis of prosodic effects on articulatory timing. J. Acoust. Soc. Am. 119, 1666-1671.

Prosodic boundary gestures or “pi‐gestures” (Byrd and Saltzman, J. Phonetics, 2003) have been introduced to model the local slowing or lengthening of articulatory gestures in the vicinity of phrase boundaries. Computational modeling of articulatory dynamics is an important tool in assessing the predicted effects of pi‐gestures of varying boundary strength on constriction gestures in varying contexts. We simulate pi‐gestures within the TaDA task dynamics computational model [Nam and Kim, JASA, 116, 172–185 (2004)] and examine how functional data analysis can provide a tool for connecting articulatory lengthening with underlying pi‐gesture activation strength. Specifically, the model is applied to the articulatory synthesis of two sequences: [CV#CV] and [CVC#CV], where C is bilabial or alveolar. The pi‐gesture’s midpoint is coordinated synchronously with the midpoint of following consonant’s constriction gesture, and pi‐gesture activation strength and duration are manipulated. Results indicate that pi‐gesture activation strength has a much stronger effect on slowing than its duration. The slowing effect is asymmetrical, skewed earlier than the midpoint in the pi‐gesture interval. After removing linear‐time slowing effect (i.e., after length normalization), the slowing effect is slightly stronger in [CV#CV] than in [CVC#CV]. The strength of pi‐gesture also affects spatial articulatory characteristics depending on constriction location and sequential context.


S. Lee, Benjamin Parrell, Dani Byrd.
(2009) Computational modeling of juncture strength using articulatory synthesis of prosodic gestures. Proceedings of the 157th Meeting, Portland, Oregon, 18-22 May 2009.
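
The local slowing attributed to pi-gestures in the abstract above can be illustrated with a toy sketch. This is not the TaDA model and not the authors' simulation: the raised-cosine activation shape, the rule that the clock rate equals one minus strength times activation, and every function name and parameter value are assumptions chosen for illustration. The sketch only shows how slowing a clock over a bounded interval lengthens the realized duration of whatever that clock paces.

```python
import math

def pi_activation(t, center, width):
    """Assumed raised-cosine activation in [0, 1], nonzero only inside the pi-gesture interval."""
    half = width / 2.0
    if abs(t - center) >= half:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (t - center) / half))

def realized_duration(nominal, strength, center, width, dt=0.0005):
    """Wall-clock time needed for the slowed clock to advance by `nominal` seconds.

    Assumed slowing rule: clock rate = 1 - strength * activation(t),
    with 0 <= strength < 1 so the clock never stops or runs backward.
    """
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must be in [0, 1)")
    clock = 0.0
    t = 0.0
    while clock < nominal:
        rate = 1.0 - strength * pi_activation(t, center, width)
        clock += rate * dt
        t += dt
    return t

# Hypothetical values: a 300 ms stretch with a 100 ms pi-gesture centered at 150 ms.
for strength in (0.0, 0.25, 0.5):
    print(strength, round(realized_duration(0.3, strength, 0.15, 0.1), 3))
```

In this toy setup the added duration is the strength multiplied by the area under the activation curve, so both strength and interval width contribute to lengthening; the relative weighting reported in the abstract comes from the authors' simulations, not from this sketch.
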



On the temporal scope of phrase boundary effects
Dani Byrd,  Jelena Krivokapic, Sungbok Lee

Funding:  NIH DC03172

Acoustic lengthening at prosodic boundaries is well explored, and the articulatory bases for this lengthening are becoming better understood. However, the temporal scope of prosodic boundary effects has not been examined in the articulatory domain. The few acoustic studies examining the distribution of lengthening indicate that boundary effects extend from one to three syllables before the boundary, and that effects diminish as distance from the boundary increases. This diminishment is consistent with the pi-gesture model of prosodic influence [Byrd and Saltzman, J. Phonetics 31, 149–180 (2003)]. The present experiment tests the preboundary and postboundary scope of articulatory lengthening at an intonational phrase boundary. Movement-tracking data are used to evaluate durations of consonant closing and opening movements, acceleration durations, and consonant spatial magnitude. Results indicate that prosodic boundary effects exist locally near the phrase boundary in both directions, diminishing in magnitude more remotely for those subjects who exhibit extended effects. Small postboundary effects that are compensatory in direction are also observed.  

D. Byrd, J. Krivokapic, and S. Lee (2006). How far, how long: On the temporal scope of phrase boundary effects. J. Acoust. Soc. Am. 120, 1589-1599.


Cross-linguistic differences in prosodic organization
Emily Nava, Maria Luisa Zubizarreta, Louis Goldstein

NSF Grant BCS-0444088

USC Provost Grant: Advancing Scholarship in the Humanities and Social
Sciences

Languages differ in the realization and distribution of prominent
pitch events at the phrasal level, and we argue that these cross-linguistic
differences also reflect differences in language-specific organization at
the syllable level. The goal of the current project is to establish the
connection between events at the rhythmic level and pitch accents at the
phrasal level by detailing the hierarchy of prosodic organization for two
typologically different languages, Spanish and English. Additionally, the
project includes data from the speech of adult second language learners of
English whose first language is Spanish, which provides insight as to the
relevant stages of acquisition of native-like prosodic proficiency, in turn
contributing to linguistic theory as to the nature of the architecture of
prosodic organization. Data analyzed from the first of two experiments
reveal that English and Spanish differ regarding pitch accent placement for
certain information structure contexts, and that second language learners
initially transfer prosodic patterns from their first language. Data from a
second experiment analyzed using a forced alignment technique revealed that
vowel durations across lexical categories do not differ significantly in
Spanish but are significantly different in English. Additionally, those
second language speakers whose pitch accent placement was native-like in
English also had native-like vowel durations in English. Additional
experiments are currently in progress that examine organization at the foot
level in monolingual English and Spanish and second language English.

E. Nava & M.L. Zubizarreta. (In press). Deconstructing the Nuclear Stress
Algorithm: Evidence from Second Language Speech. The Sound Patterns of
Syntax, eds. N. Erteschik-Shir & L. Rochman. Oxford University Press.

E. Nava & M.L. Zubizarreta. 2008. Prosodic Transfer in L2 Speech: Evidence
from Phrasal Prominence and Rhythm. Proceedings of Speech Prosody 2008, ed.
Plinio Barbosa, Sandra Madureira and Cesar Reis. Campinas, Brazil.


At the juncture of prosody, phonology, and phonetics—The interaction of phrasal and syllable structure in shaping the timing of consonant gestures.

Dani Byrd and Susie Choi
Presented at LabPhon 10 Paris 2006
Funding:  NIH DC03172

Over the preceding nine Laboratory Phonology conferences, leading research in linguistic speech experimentation has clearly established that abstract phonological structure is directly reflected in the spatiotemporal details of speech production.  In fact this softening of the clear severance between phonetics and phonology and an appreciation of the complexity of their relationship has been a prominent contribution of the LabPhon tradition.  In recent LabPhons, the study of the phonetics-phonology interface has been extended to the phonetics-prosody interface, and we have, again, seen that abstract structural properties, this time of sentences rather than words, have a complex and rich effect on the articulatory details of speech production (see e.g., LabPhon 5).  This has led to new theoretical conceptions regarding how phrase boundaries should be represented and to new computational models of their realization in speech (e.g., LabPhon 5 & 8, Byrd & Saltzman JPhon 2003).  In the study described below, we move from two areas of laboratory phonology that are reasonably well-explored—the effects of syllable structure and of phrasal structure on articulatory overlap—to a consideration of how these phonological and prosodic structural properties interact in shaping articulatory timing.
    It remains unknown how syllable and phrasal structure interact in determining intra- and intergestural coordination for consonant clusters preceding (in coda position), spanning (heterosyllabic), and following (in onset position) a phrase boundary.  The prosodic gestural model of phrase boundaries (Byrd & Saltzman JPhon 2003) specifically predicts longer durations and less overlap for both consonants of the CC sequence due to its representation of a phrasal juncture as a prosodic gesture extending in time and causing a slowing of the clock that controls the pacing of gestural activation.  However, this approach suggests that the strongest effect will be on the consonant most local to the boundary, i.e., on C2 for codas, C1 for onsets, and both consonants comparably in a heterosyllabic sequence.  It further predicts that the durational and overlap effects will grade with the strength of the boundary.
    We conducted an articulator movement tracking (EMMA) experiment with a design fully crossing:  syllable position of a CC cluster (word-final, cross-word, and word-initial), adjacent/intervening phrase boundary type (word only, intermediate phrase, intonational phrase, and utterance), and three clusters ([sp], [sk], [kl]).  Three subjects participated, and the tongue tip, lip aperture, and tongue rear were tracked for a total of 756 utterances.   Gestural overlap was calculated both as absolute overlap (the time between peaks of C1 and C2) and relative overlap (the proportion of the way through C1 that the C2 peak occurred).
    The results demonstrate significant main effects as well as two-way interactions of boundary type and syllable position on the duration of the individual consonants.  The predictions were supported in that the heterosyllabic and coda sequences showed the larger phrasal effect on C2 duration and the onset and heterosyllabic sequences on C1 duration.  Further, this lengthening graded with relative boundary strength such that four discrete degrees of lengthening are observed for the pooled data, and three or four for the individual subject data.  Both absolute and relative intergestural overlap also exhibited two-way interactions of boundary type and syllable position.  Cluster type was also a significant factor.  In addition to intergestural overlap decreasing for clusters spanning a phrase boundary, clusters situated both before and after a boundary also showed decreases in overlap, though somewhat smaller.  That is, prosodic structure had clear effects on word-internal segment-to-segment timing in words preceding and following a phrase boundary, in addition to timing effects across the boundary.  These overlap changes are not, we show, a simple consequence of individual gestural lengthening.  In fact, these types of timing changes are predicted by simulations within the prosodic gestural model of phrase boundaries.  Finally, onset clusters were less overlapped than codas at every prosodic position (as predicted from early findings at the phrase-medial word-level.)  There is also some suggestion in the data that onset clusters are more ‘resistant’ to prosodic perturbation; this cohesion is suggestive of the gestural molecule structure proposed by Browman and Goldstein (Phonetica 1992) for s-stop clusters.
    In sum, the predictions of the prosodic gesture model regarding individual consonant lengthening in CC clusters are supported:  both consonants lengthen, but the consonant gesture closest to the phrase boundary lengthens more.  These effects grade with juncture strength.  There is an interaction of phrase boundary type and syllable position on CC overlap such that while CC sequences spanning a boundary are most affected, CC sequences preceding and following a phrase juncture are also affected in a way that grades with juncture strength.  This study extends the on-going efforts of the LabPhon community to understand the relation between abstract linguistic structure and low-level phonetic detail to an examination of the interaction of two different types of structure—syllabic and prosodic—in shaping articulatory timing.  Simultaneously, it seeks to test and refine a theoretical model of the prosody-phonetics interface.
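
The two overlap measures defined in the study description above (absolute overlap as the time between the C1 and C2 peaks, relative overlap as the proportion of the way through C1 at which the C2 peak occurs) can be written out as a minimal sketch. The function names, the use of gesture onset and offset times to delimit C1, and the millisecond values are assumptions for illustration, not the authors' measurement procedure.

```python
def absolute_overlap(c1_peak, c2_peak):
    """Time from the C1 peak to the C2 peak, in the same units as the inputs.

    A larger value means the peaks are further apart, i.e. less overlap.
    """
    return c2_peak - c1_peak

def relative_overlap(c1_onset, c1_offset, c2_peak):
    """Proportion of the way through C1 at which the C2 peak occurs.

    0.0 means the C2 peak coincides with the C1 onset, 1.0 with the C1 offset;
    values above 1.0 mean the C2 peak falls after C1 has ended.
    """
    duration = c1_offset - c1_onset
    if duration <= 0:
        raise ValueError("C1 offset must come after C1 onset")
    return (c2_peak - c1_onset) / duration

# Hypothetical landmark times in milliseconds for one [sp] token.
print(absolute_overlap(c1_peak=100, c2_peak=180))
print(relative_overlap(c1_onset=0, c1_offset=200, c2_peak=150))
```
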



Interacting effects of syllable and phrase position on consonant articulation
Dani Byrd, Sungbok Lee, Jason Adams, Daylen Riggs

Funding: NIH DC03172

The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior.  The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants.  Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, & tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase-medially.  Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles.  The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable.  For most subjects, the boundary-adjacent portions of the movement (constriction release for a pre-boundary coda and constriction formation for a post-boundary onset) are not differentially affected in terms of phrasal lengthening; both lengthen comparably.

D. Byrd, S. Lee, D. Riggs, J. Adams. (2005) Interacting effects of syllable and phrase position on consonant articulation. Journal of the Acoustical Society of America 118(6), 3860-3873. [pdf]
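
Locating kinematic landmarks in a velocity profile, as mentioned in the abstract above, might look like the following sketch. The abstract does not state the landmark criteria, so the 20%-of-peak-speed threshold, the central-difference velocity estimate, and all names and sample values here are assumptions rather than the published procedure.

```python
def velocity(positions, dt):
    """Central-difference velocity estimate (one-sided at the two ends)."""
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two samples")
    result = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        result.append((positions[hi] - positions[lo]) / ((hi - lo) * dt))
    return result

def movement_landmarks(positions, dt, threshold=0.2):
    """Onset, peak-velocity, and offset times for a single movement.

    Onset and offset are the outermost samples around the speed peak whose
    speed stays at or above `threshold` times the peak speed (assumed criterion).
    """
    speed = [abs(v) for v in velocity(positions, dt)]
    peak = max(range(len(speed)), key=speed.__getitem__)
    cutoff = threshold * speed[peak]
    onset = peak
    while onset > 0 and speed[onset - 1] >= cutoff:
        onset -= 1
    offset = peak
    while offset < len(speed) - 1 and speed[offset + 1] >= cutoff:
        offset += 1
    return {
        "onset": onset * dt,
        "peak_velocity": peak * dt,
        "offset": offset * dt,
        "duration": (offset - onset) * dt,
        "displacement": abs(positions[offset] - positions[onset]),
    }

# Hypothetical articulator positions (mm), one sample per time unit.
trajectory = [0, 0, 0, 1, 4, 8, 11, 12, 12, 12]
print(movement_landmarks(trajectory, dt=1))
```

Duration and displacement are then read off between the onset and offset landmarks, which is the kind of measure the abstract reports as more or less consistently affected by prosodic position.
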




Prosodic complexity and phrase length as factors in pause duration

Jelena Krivokapic

Research on influences on pauses has mainly focused on the impact of syntax, discourse and prosodic structure on the likelihood of pause occurrence and on the impact of syntactic structure on the duration of pauses within an utterance. Very little is known about what factors, apart from syntactic factors, play a role in determining the length of pauses between utterances or phrases. This experiment examines the effect of prosodic structure and phrase length on pause duration. Subjects read 24 English sentences varying along the following parameters: a) the length in syllables of the intonational phrase preceding and following the pause and b) the prosodic structure of the intonational phrase preceding and following the pause, specifically whether or not the intonational phrase branches into smaller phrases. In order to minimize variability due to speech rate and individual differences, speakers read sentences synchronously in dyads (Cummins 2002, Zvonik & Cummins 2002). The results show that length has a significant effect on pause duration both pre- and post-boundary for all dyads and that prosodic complexity has a significant post-boundary effect for some dyads. The possible reasons for the observed pause duration effects and the implications of these results on the question of incrementality in speech production are discussed.

J. Krivokapic. (2004) Prosodic complexity and phrase length as factors in pause duration.  Journal of the Acoustical Society of America, 115(5,2): 2398.

Relating the performance and perception of phrasal boundaries
Jelena Krivokapic

This study examines the correspondence between the production of prosodic structure and perceptual judgments regarding prosodic boundary strength.  In the production component of the study, 3 speakers read sentences in which one specific juncture is manipulated (varying in syntax, position in sentence, and phrase length) to elicit phrasal boundaries of differing strengths.  Lengthening and pause duration at the boundary were measured.  The same sentences, in written form, were presented to a second different group of 3 speakers who provided estimates of the strength of the target boundary on a scale of eight degrees.  The results of the production and the estimation portions of the experiment demonstrate significant correlations between the production boundary strength as reflected in durational properties at the juncture and the boundary strength as estimated in the judging task.  These correlations are roughly linear. Further, in both the production and perception domains, a range of boundary strength is exhibited rather than a small discrete set of boundary types.  We also examine whether speakers’ own boundary strength estimates agree with their productions to a greater extent than estimates of other speakers.

J. Krivokapic. (2004) Relating the performance and perception of phrasal boundaries. Journal of the Acoustical Society of America, 116, 2643.

Funding: NIH DC0317



A study of articulatory kinematics of American English diphthongs

Sungbok Lee and Dani Byrd

Diphthongs are acoustically characterized by formant movements between initial and final vocalic segments [cf. Holbrook and Fairbanks, J. Speech Hear. Res., (1962); Thomas Gay, J. Acoust. Soc. Am., (1968)]. It has been observed that the initial (onset) and final (offset) portions may not correspond to typical monophthongal vowels and that there exists more acoustic variability in the offset portions than in the onset portions. Spectral transition rates have been found to differ from diphthong to diphthong. In the current study, the articulatory properties of American English diphthongs are investigated in tandem with their acoustic properties. We collected both acoustic and articulatory data for five diphthongs and seven monophthongs in [b‐t] context embedded in a carrier sentence using electromagnetic articulography. The data were recorded from two male subjects and one female subject, and each diphthong was repeated five times. The main focus of the analysis is on the kinematic characteristics of the three tongue sensors (tongue tip, tongue dorsum, and tongue rear) in each diphthong production and on comparisons to monophthongs that are close to either the onset portion or the offset portion of the given diphthong.


S. Lee, Dani Byrd.
(2009) A study of articulatory kinematics of American English diphthongs. Proceedings of the 157th Meeting, Portland, Oregon, 18-22 May 2009. [pdf]

Supported by NIH


Looking for the Spanish Imperative Intonation
 
Sergio Robles-Puente

The intonation of both statements (Come la tarta / He eats the cake) and questions (¿Come la tarta? / Does he eat the cake?; ¿Quién come la tarta? / Who eats the cake?) in Spanish has been widely studied in the field. However, not much attention has been paid to the intonation of imperatives (¡Come la tarta! / Eat the cake!). Our knowledge of these kinds of sentences is so limited that their status as primitives of Spanish intonation has even been called into question.  Moreover, given that there is regionally-conditioned intonational variation in other types of utterances, such as statements and questions, imperative commands can also be expected to differ notably between Latin and Caribbean varieties and the Peninsular ones. In order to get a clearer answer to the question of whether there is a Spanish imperative intonation, this study collects data not only to support broader generalizations about imperatives in Spanish, but also to compare the strategies speakers use both within and among the different Peninsular dialects.

Funding: Del Amo Research Award (Summer 2009)



Older Projects


The influence of phonemic vowel length on the voicing effect
Rebeka Campos-Astorkiza

This study tests the influence of phonemic vowel length on the realization of the voicing effect, i.e., the phonetic process by which vowels tend to be longer before voiced obstruents than before voiceless ones. The literature on the voicing effect has identified a number of factors that influence the degree of this effect (Hussein 1994), among them the presence of phonemic vowel length (e.g. Keating 1985). However, there appear to be no published reports of experiments testing this claim about the influence of contrastive vowel length. In order to test the hypothesis that the presence of phonemic vowel length attenuates the voicing effect, it is necessary to isolate phonemic vowel length from other possible conditioning factors. This can be done by testing a language where length is contrastive for only a subset of its vowel qualities, i.e., a language in which some vowels are unpaired for the long/short contrast. The prediction is that vowels without a short/long counterpart will exhibit a stronger voicing effect than vowels that are part of a long/short contrast. Lithuanian shows such an asymmetrical system. Lithuanian mid vowels lack a contrast for duration; they are always long. Thus, the hypothesis is that the voicing effect will be greater for /e:, o:/ than for the other vowels.
Acoustic data were collected from native speakers of Lithuanian. The stimuli consisted of bisyllabic nonsense words of the shape CV1C1C2V, where V1 could be any of the Lithuanian vowels and the sequence C1C2 was either /kS/ or /gZ/. The results show that the difference in vowel duration before voiced obstruents and before voiceless ones, i.e., the voicing effect, is greatest for /e:/ and /o:/ (p<.05), compared to the other vowels. We conclude that the vowels unpaired for length (/e:, o:/) are more strongly affected by the voicing effect, while vowels with a long/short counterpart are influenced to a lesser degree. This supports our hypothesis. More generally, this conclusion provides evidence for the influence of phonemic contrast on phonetic realization, previously discussed in relation to coarticulation (Manuel 1999) and the cues to stress (Berinstein 1978). Furthermore, the asymmetrical Lithuanian system suggests the importance of minimal contrast in the phonological representation. If a vowel differs from another vowel only in length, then it minimally contrasts for length. Our experiment shows that vowels minimally contrastive for length behave differently from vowels that do not minimally contrast for length.
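
The voicing effect described above is a difference between two mean vowel durations, and the following minimal sketch shows that computation. The durations are invented placeholder values, not data from the study, and the names `durations_ms` and `voicing_effect` are hypothetical; the statistical test reported in the abstract is not reproduced here.

```python
from statistics import mean

# Hypothetical vowel durations in milliseconds (NOT data from the study),
# keyed by vowel and by the voicing of the following obstruent cluster.
durations_ms = {
    "e:": {"voiced": [182.0, 190.0, 186.0], "voiceless": [150.0, 155.0, 148.0]},
    "a:": {"voiced": [175.0, 171.0, 179.0], "voiceless": [160.0, 163.0, 158.0]},
}


def voicing_effect(by_context):
    """Mean vowel duration before voiced obstruents minus mean before voiceless ones."""
    return mean(by_context["voiced"]) - mean(by_context["voiceless"])


# One voicing-effect value per vowel; a larger value means a stronger effect.
effects = {vowel: voicing_effect(ctx) for vowel, ctx in durations_ms.items()}
for vowel, effect in effects.items():
    print(f"/{vowel}/ voicing effect: {effect:.1f} ms")
```

With real measurements, the per-vowel values would then be compared between vowels that are unpaired for length and vowels that have a long/short counterpart.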




What is raddoppiamento? Length and prosody in Italian
Rebeka Campos-Astorkiza

Raddoppiamento fono-sintattico in Italian has received much attention in the literature. However, the phenomenon is still far from being explained and understood. Traditionally, Raddoppiamento refers to a lengthening process that affects word-initial consonants that follow a word ending in a stressed vowel. Furthermore, prosodic and syntactic constraints have been proposed that prevent this process from taking place (Nespor 1977, 1979). Unfortunately, most of the analyses and conclusions regarding Raddoppiamento lack a solid empirical foundation. This project aims to shed light on the phenomenon by examining the nature of the segments in the Raddoppiamento environment and different prosodic contexts. First, we consider not only consonants but also vowels as possible lengthened segments and examine whether their behavior patterns with that of consonants. Second, two different prosodic contexts are considered. The two relevant words are placed either phrase-internally or at the boundary of a phrase. According to previous analyses, Raddoppiamento is unexpected at the phrase boundary (Nespor 1977, Vogel 1977). Lastly, stressed and unstressed environments are tested.

The results showed that lengthening took place in the traditional Raddoppiamento environment, i.e., when word1 ends in a stressed vowel, word2 begins with a consonant, and there is no intervening boundary between them. On the other hand, when the initial segment in word2 was a vowel, it did not lengthen. This result shows that any attempt at explaining the process must deal with the fact that only consonants are subject to lengthening. As far as the final vowel in word1 is concerned, it was significantly longer when it carried stress than when it was unstressed. Finally, the presence of a boundary did not block the process categorically. At the phrase juncture, the initial consonant in word2 was significantly longer when the preceding vowel was stressed than when it was unstressed. In view of this empirical evidence, some of the accounts of Raddoppiamento will have to be revisited in order to accommodate the data.

J. Acoust. Soc. Am. 116, 2645 (2004)

Some novel allophonic and phonemic phenomena in Biscayan Basque
Rebeka Campos-Astorkiza

An acoustic study of novel allophonic and phonemic phenomena in the isolate language Basque is presented. The focus is on speakers of the Biscayan dialect. First, Basque shows a spirantization process by which voiced plosives are produced as approximants, particularly intervocalically. Interestingly, we find that Basque /ld/ sequences, where spirantization is not expected [Hualde (1991) Basque Phonology], are realized as a lateral approximant followed by a voiced lateral fricative. Second, in this variety of Basque, the historical three-way contrast among sibilants (two alveolars and one postalveolar) has been reduced to a two-way distinction. The original contrast, still found in other varieties, between a laminal alveolar and an apical alveolar has merged with different results depending on the continuancy of the sibilants. Third, Basque presents a contrast between trill and flap intervocalically. However, elsewhere this is neutralized, and the precise realization of this segment varies from trill to frication. Finally, the Basque five-vowel inventory allows for almost any sequence of two vowels. The same vowel sequence might be a diphthong (tautosyllabic) or a hiatus (heterosyllabic) depending on the lexical item. That is, diphthongs and hiatus are contrastive. 

J. Acoust. Soc. Am. 118, 1901 (2005)


ChAIM (Children and Adults Interacting with Machines)
a past project:  See current CHIMP website at SAIL Lab

Faculty:   Shrikanth Narayanan (USC Elec. Eng.), Dani Byrd (USC Linguistics),  Elaine Andersen (USC Linguistics; USC Neuroscience Program), Alex Potamianos (Bell Labs)

Students: Suzanne Curtin, Laurie Gerber, Alison Bryant, Serdar Yeldrim

Undergraduate Students (past and present): Sudha Arunachalam (now at U Penn), Dylan Gould, Abe Kazemzadeh (now a grad student at USC), Sonia Khurana

Funding: USC Integrated Multimedia Systems Center Undergraduate Research Project (funded by NSF) and USC Provost's Office Undergraduate Research Programs and Zumberge Interdisciplinary Grant

Enabling spoken language capability as a part of immersive multimedia interfaces adds naturalness and efficiency to human-machine interactions. Two crucial requirements for multimedia human-machine interfaces are robustness (i.e., minimal performance degradation under trying conditions) and adaptability (i.e., responding to the user in a customized manner). Undergraduate researchers, mentored by graduate students, are investigating challenges to robustness and adaptability in the field of automatic speech recognition: the challenge of recovering from system errors made while interacting with capable adult users, and the challenge of adapting to child users. This is undertaken via linguistically informed data analysis of two large databases of adults and children interacting with a computer via spoken language.


Phonetic foundations of final /s/ patterning in South-Central Castilian Spanish
Ana Sanchez-Muñoz

Numerous studies have shown that in many dialects of Spanish, the phoneme /s/ in final position may have several realizations depending on a variety of factors (e.g. Lipski 1986; Terrell 1979, 1981; Widdison 1995, 1996). However, there are few analyses of dialectal regions such as the area of South-West Central Castilian Spanish, which because of its geographical location does not belong entirely to any of the dialectal areas described for Castilian. This study explores which factors affect the different realizations of final /s/: aspiration ([h]), deletion, velarization ([x]), or realization of the sibilant ([s]). Data were collected from six native speakers in a controlled task. The experiment elicited final /s/ in three grammatical words (los, mis and estos/éstos), taking into account the following factors: 1) the sound that follows the target final /s/ (consonant or vowel); 2) whether the target word is phrase-final or not; 3) whether the target word carries focal accent or not. The results show that the first two factors are highly significant whereas the third is not. It is furthermore observed that certain realizations of final /s/ are restricted by the type of sound that follows it, specifically for [x] after /k/. The results show clear patterns of /s/ realization: [s] occurs mainly before vowels and in prepausal position, and [ø] mainly before consonants. It is argued that s-lenition in Spanish can be explained in terms of two variations in the gestural score: changes in magnitude and in overlap among gestures. These results help us understand in greater depth the mechanisms leading to the different realizations of syllable-final /s/ in Spanish as characterized in the proposed hypothesis.

Ana Sánchez-Muñoz. Phonetic Foundations of Final /s/ Patterning in South Central Castilian Spanish. In WCCFL 23: Proceedings of the 23rd West Coast Conference on Formal Linguistics, edited by Vineeta Chand, Ann Kelleher, Angelo J. Rodríguez, and Benjamin Schmeiser, 690-702.



A three-way VOT contrast in final position: Data from Armenian
Narineh Hacopian
(now at Inquira)

Standard Eastern Armenian has a three-way VOT contrast among its oral stop series. While it is rare for such a 3-way contrast to be preserved in final position, Standard Eastern Armenian is claimed to do just this (Ladefoged & Maddieson 1996; Vaux 1998; Khachaturian 1988). This phenomenon provides an ideal opportunity to explore how prosodic structure influences the realization of a complex and delicate system of contrast maintained by temporal and saliency distinctions. The cues to this contrast, including VOT, closure duration, and burst amplitude, are examined in a variety of segmental and prosodic environments.  The experiment evaluates effects of intonation phrase final, intermediate phrase final, word final and syllable final positions. We find that the 3-way VOT contrast is maintained in these prosodic domains in which the stop consonants are final. Speakers make significant VOT distinctions among the target consonants within the same boundary condition, and across prosodic conditions, larger prosodic domains exhibit longer VOT values.

(2003) Journal of the International Phonetic Association 33(1), 51-80.




Nasalization, neutralization, and merger in English front vowels
Jerry Liu (now at Google)

In American English, there is a tense/lax distinction among the front vowels [i]/[I] and [e]/[E]. These often create minimal pairs in a number of different environments; before a velar nasal, however, the contrast is not preserved and is usually considered to have collapsed to the set of lax vowels only. Yet Ladefoged (2001) states that for the high front vowels “many younger Americans pronounce ‘sing’ with a vowel closer to that in ‘beat’ rather than to that in ‘bit.’” An acoustic study was conducted with nine young native Californians to examine whether raising occurs before the velar nasal, as has been anecdotally observed for the front high vowels, and if so, whether it also occurs for the front mid vowels, which have not previously been reported to raise from the lax. The results indicate that these subjects produce a vowel intermediate in formant frequencies (and sometimes, though not always, in duration) between the tense and lax vowel in the velar nasal environment. Further, this occurs for both the high and mid front vowels.

J. Acoust. Soc. Am. 116, 2630 (2004)



Durational cues to focus in a scrambling language
Fetiye Karabay

This paper examines the phonetic structure of focus in Turkish. Focus is marked in different ways in different languages. The three most common means of focus marking are pitch perturbation, higher intensity, and lengthening (O'Shaughnessy 1979, O'Shaughnessy & Allen 1983, Cooper et al 1985, Wells 1986). In many languages, such as English, German, and Dutch, duration, amplitude, and pitch combine to give the effect of perceived stress in words that are in focus (Selkirk 1984, 1994; Gussenhoven 1983, 1994). Other languages, such as Danish and Spanish, use only pitch and amplitude as cues to focus, and in these languages durational effects play an unimportant role (Noteboom & Kryut 1987, Toledo 1989). Turkish is a scrambling language in which word order is flexible. While the basic word order is SOV, almost all six permutations are possible in many sentences. In Turkish, broad focus is marked with word order as well as a pitch perturbation, typically a H tone. Generally, the accented word preceding the verb is in focus, unless the verb is in utterance-initial position, in which case the verb itself is in focus. In narrow focus, however, the focused word does not immediately precede the verb. An experiment is conducted to examine whether focus in Turkish is marked by durational lengthening in addition to pitch perturbation and word order. The hypothesis that duration does not play an important role in broad focus, due to the presence of the very salient pitch and word order cues, is investigated. Further, we hypothesize that durational effects may play an important role in narrow focus, since it is otherwise not as saliently marked in that it lacks the canonical word order cue. Data are collected from four native Turkish speakers in a controlled task. Stimuli include (i) four sentences, with phonetically identical subjects and objects, in which word order specifies broad focus and (ii) a second parallel set of sentences with narrow focus (i.e. in which the focused noun is not in canonical pre-verbal position) for comparison. Seven repetitions of each sentence are recorded and digitized. Phonetic lengthening in broad and narrow focus will be measured from the waveforms and statistically evaluated to confirm or refute the experimental hypotheses.
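
As a small illustration of the word-order facts stated above, the sketch below enumerates the six logically possible orders of subject (S), object (O), and verb (V) and applies the broad-focus generalization from the abstract: the constituent immediately preceding the verb is in focus, unless the verb is utterance-initial, in which case the verb itself is. The function name `broad_focus_position` is hypothetical, and the sketch deliberately ignores accentuation and does not claim that every order is grammatical in every Turkish sentence.

```python
from itertools import permutations

# S = subject, O = object, V = verb; the abstract gives SOV as the basic order.
constituents = ("S", "O", "V")
word_orders = ["".join(order) for order in permutations(constituents)]


def broad_focus_position(order):
    """Return the constituent that carries broad focus under the stated rule.

    Simplifying assumption: focus falls on the constituent immediately
    before the verb, or on the verb itself when the verb comes first.
    """
    verb_index = order.index("V")
    return "V" if verb_index == 0 else order[verb_index - 1]


for order in word_orders:
    print(order, "-> broad focus on", broad_focus_position(order))
```

In the narrow-focus stimuli, by contrast, the focused noun is by design not the one this rule picks out.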



Stress effects on voicing and frication of voiced obstruents in North-Central Spanish
Carolina Gonzalez (now at Florida State University)

Funding: Del Amo Fellowship

In Spanish, /b, d, g/ are usually spirantized to voiced approximants in all syllabic contexts after a continuant sound. However, in North-Central Peninsular Spanish (NCS), spirantization interacts with coda devoicing, yielding voiceless fricatives. In the majority of cases, coda /b, d, g/ occur in stressed syllables. This work examines whether stress is a factor in the likelihood of frication and devoicing of coda /b, d, g/ in this dialect. An acoustic study was conducted with nine native speakers of NCS. These speakers were tested on nonce words with /b, d, g/ in coda position in both stressed and unstressed syllables. Measurements were made of vowel and consonant duration, presence or absence of frication and voicing, and voicing duration. The results show that frication is more likely in stressed syllables than in unstressed syllables. This suggests that in stressed syllables, a higher subglottal pressure produces higher airflow across the glottis, which favors frication. In turn, frication inhibits voicing because the two have conflicting aerodynamic requirements. We conclude that stress is a factor in spirantization, and that it may indirectly affect the voicing properties of /b, d, g/.

C. Gonzalez. (2002) Phonetic variation in voiced obstruents in North-Central Peninsular Spanish. Journal of the International Phonetic Association 32(1).



Understanding the production of English word stress in speaking
Dani Byrd
Teruhiko Fukaya (now at Sugiyama Jogakuen University)
Funding: James H. Zumberge Individual Research Grant Proposal Category I: New Faculty Start-Up

This project investigates the relation between the spatiotemporal orchestration of the vocal movements used to produce speech and a particular aspect of linguistic structure—word stress. The proposal describes a study employing a previously collected articulatory database of nearly a thousand spoken productions of words differing only in their stress pattern. This database was collected using a magnetometer system to track articulator movement during speech and includes data on tongue, lip and jaw trajectories for three subjects. The kinematic characteristics of these movements will be analyzed as a function of stress condition and phrasal prominence. The proposed study will profile the manner in which articulatory behavior is shaped by the linguistic dimension of word stress.



An articulatory examination of word-final flapping at phrase edges and interiors
Dani Byrd

Teruhiko Fukaya (now at Sugiyama Jogakuen University)
Funding: NIH

Formulations of flapping as a symbolic phonological rule suggest clear articulatory differences between flaps and stops, and often offer no overt explanation for why phrase boundaries should block the alternation. The present study explores the articulatory foundation of the distinction between flaps and non-flaps in word-final position. We examine kinematic and acoustic data for these articulations in phrase-final and phrase-medial positions and in falling and level stress contours. It is shown that a discrepancy exists between acoustic and articulatory durational patterning: while acoustic durations of flaps are shorter than those of non-flaps overall, their articulatory durations are not uniformly so. It is important to consider multiple potential articulatory sources, both spatial and temporal, for the acoustic shortness that characterizes flaps, including spatial reduction, temporal articulatory shortening, and changes in intergestural coordination. The kinematic data indicate that different sources of flap shortness exist for different speakers and different prosodic conditions. These results imply that flaps are not categorically different from non-flaps in the articulatory domain, as traditional formulations of flapping as a symbolic phonological rule would suggest. We conclude that the gradient variability in the spatiotemporal patterning of tongue tip constrictions yields acoustic shortening in flaps.
 
T. Fukaya & D. Byrd (2005) An articulatory examination of word-final flapping at phrase edges and interiors. Journal of the International Phonetic Association 35(1).



Coarticulatory information in natural speech stimuli is crucial for infant recognition of syllable sequences
Suzanne Curtin (USC Linguistics; now at U of Pitts.), Toben Mintz (USC Psychology), Dani Byrd (USC Linguistics)

Funding: NIH, SSHC, USC Zumberge Grant

This research was conducted in the USC Language Development Lab (T. Mintz, Director).

Recent research has determined that coarticulatory information in speech provides important cues to early word segmentation. This experiment investigates whether 7-month-old infants' ability to recognize a string requires the presence of appropriate coarticulatory information in the speech familiarization stream. Following familiarization to a string of CV syllables, infants were tested to determine whether sequences that co-occurred in the familiarization string were preferred over those in which the syllables did not appear adjacently during familiarization. Further, the test phase was conducted so that the items had either appropriate or inappropriate coarticulation information. The results indicate that infants tested on items with appropriate coarticulation listened significantly longer to strings that had appeared during familiarization than to the appropriately coarticulated control strings that never occurred together during familiarization. Interestingly, when presented with inappropriate coarticulation test items, infants showed no preference for previously familiarized strings over the non-co-occurring syllable strings. We conclude that infants are sensitive to coarticulation in recognizing sequences in a speech stream. Furthermore, coarticulatory cues, in combination with other cues to segmentation, greatly enhance recognition of syllable sequences. These results suggest that coarticulation plays an important role in early word segmentation.

S. Curtin, T. H. Mintz, & D. Byrd. (2001) Coarticulatory cues enhance infants' recognition of syllable sequences in speech. BUCLD 25: Proceedings of the 25th Annual Boston University Conference on Language Development.



Production ease as a basis for phonological cooccurrence restrictions
Rachel Walker

Narineh Hacopian (now at Inquira)

Undergraduate Students: Gayane Arabyan, Mariko Taki

Funding: James H. Zumberge Individual Research Grant Proposal Category I: New Faculty Start-Up

A widespread pattern, the exclusion of similar sound elements in a word, is known as a phonological cooccurrence restriction. When word formation produces a structure containing sounds that violate a cooccurrence restriction, the phonological form of the word is altered in one of two ways in order to obey the condition: the prohibited sounds either dissimilate (become less similar) or assimilate (become identical). We are investigating the hypothesis that bans on similar but different elements have a foundation in psycholinguistic processing. By conducting speech error experiments, we will investigate the factors contributing to production ease in words containing consonants that differ in voicing/nasality. The results will have important implications for shaping a crosslinguistic typology of voicing/nasality cooccurrence effects. Explaining phonological cooccurrence restrictions in terms of maximizing production and processing ease is an exciting new interdisciplinary research direction combining linguistics and psychology. This research promises not only to bring new understanding to the study of widespread cooccurrence patterns, but also to mark a significant advance in determining what universal factors underlie properties of human language.

Nasal consonant speech errors: Implications for "similarity" and nasal harmony at a distance
Rachel Walker, Narineh Hacopian, and Mariko Taki; J. Acoust. Soc. Am. 112, 2359 (2002)


D. Byrd
Copyright © 2009 All rights reserved.
Information in this document is subject to change without notice.


Copyright Notice: The documents distributed here have been provided as a means to ensure timely dissemination of scholarly and technical work to individuals on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.