USC Phonetics and Phonology Group

Hosted by the University of Southern California

www.usc.edu



Research Projects at the USC Phonetics Laboratory


Contents:

Older Projects

Prosody and Articulatory Dynamics in Spoken Language

Dani Byrd,  Shrikanth Narayanan  and Sungbok Lee (EE & Ling)

with
Elliot Saltzman (Haskins Laboratories & Boston University)

and Rebeka Campos, Susie Choi, Jelena Krivokapic, and Daylen Riggs

Funding: NIH DC03172

The long term objective of the proposed research program is to understand how linguistic structure conditions the spatiotemporal realization of articulatory movement during speaking.  As research in speech production becomes more integrated with linguistic theory, it has become increasingly clear that segmental articulation cannot be understood independently of prosodic structure.  Such structure includes, but is not limited to, prominence and phrasal organization, and effects of these high-level prosodic aspects of linguistic structure pervade low-level articulatory behavior.  However, despite the pervasiveness of these effects, only a very few prosodic signatures have been identified at the level of articulatory patterning. This research program investigates the relation between one aspect of prosodic structure—phrasal structure—and the control and coordination of articulation within a dynamical systems model of speech production.  The specific aim of this proposal is to understand how speakers modulate the spatiotemporal organization of oral articulatory gestures as a function of their phrasal positions.  A series of studies are described that fall into three areas:  the kinematic characteristics of speech gestures in the vicinity of phrasal junctures, the categorical versus gradient nature of those junctures as manifested in articulation, and computational modeling of the systematic variability in articulation that occurs at phrase edges.  The specific aims will be pursued using articulatory movement data collected with a magnetometer system and by elaboration of the well-known Task Dynamic computational model of speech production.

Please see Dani Byrd's research statement for more detail.

D. Byrd & E. Saltzman. (1998) Intragestural dynamics of multiple phrasal boundaries. Journal of Phonetics, 26:173-199. [pdf]

D. Byrd, A. Kaun, S. Narayanan, & E. Saltzman. (2000) Phrasal signatures in articulation. In M. B. Broe and J. B. Pierrehumbert (Eds.), Papers in Laboratory Phonology V. Cambridge: Cambridge University Press, 70-87. [pdf]

D. Byrd & E. Saltzman. (2003) The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31,2, 149-180. [pdf protected by Academic Press] [alternative pdf]

D. Byrd, S. Lee, D. Riggs, and J. Adams. (2005) Interacting effects of syllable and phrase position on consonant articulation. Journal of the Acoustical Society of America 118(6), 3860-3873. [pdf]

This is our magnetometer team:   Narayanan, Byrd, Lee, and (on right) Mah


Photo credit: Stacey Halper



SPAN:  Speech Production and Articulation kNowledge Group
Real-Time Magnetic Resonance Imaging for Speech Production


Faculty:  Shri Narayanan (USC EE, Ling, CS),  Krishna Nayak (USC EE),  Sungbok Lee (USC EE & Ling),  Dani Byrd (USC Ling),  Richard Leahy (EE)


Graduate Students: Abhinav Sethy (EE), Erik Bresch (EE), Stephen Tobin (Ling), Jason Adams (CS MS '06)
Post-doc:  Jon-Fredrik Nielsen (EE)

Undergraduates:  Celeste de Freitas, David Hunt, Nathaniel Go

Funding: NIH R01DC007124

The long term goals of this project are to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of the speech tasks or goals requisite in the production of spoken language.  Magnetic resonance imaging (MRI) has served as a valuable tool for studying static postures in speech production.  Now, recent improvements in temporal resolution are making it possible to examine the dynamics of vocal tract shaping during fluent speech using MRI.  Our team has developed an approach that achieves MRI image reconstruction rates of 24 images per second, making veridical real-time movies of speech production possible for the first time without X-rays (Narayanan, Nayak, Lee, Sethy, & Byrd, 2004) and providing us with exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract.  Our long-term goal is to understand the aspects of vocal tract shaping that are critically controlled during speech, both for sounds known to be complex in geometry (e.g., /r/ & sibilant fricatives) and for sounds known to be complex in their temporal structuring (e.g., /l/ & diphthongs).  An understanding of vocal tract shape as a fundamentally dynamic aspect of linguistic organization will do much to add to the field’s current, basically static (i.e., postural & fixed time-point) approach to describing the production of speech.   An appropriate understanding of how sounds are produced in space and time is fundamentally a phonological question in that it bears directly on the phonological representation of segmental units, a representation that we take to be intrinsically articulatory and dynamic.

The published study below uses spiral k-space acquisitions with a low flip-angle gradient echo pulse sequence on a conventional GE Signa 1.5T CV/i scanner.  This strategy allows for acquisition rates of 8-9 images per second and reconstruction rates of 20-24 images per second, making veridical movies of speech production now possible. Segmental durations, positions, and inter-articulator timing can all be quantitatively evaluated.  Data show clear real-time movements of the lips, tongue, and velum.  Sample movies and data analysis strategies are presented in the JASA article below and at sail.usc.edu/production/rtmri/jasa2004

S. Narayanan, K. Nayak, S. Lee, A. Sethy & D. Byrd (2004) An approach to real-time magnetic resonance imaging for speech production. Journal of the Acoustical Society of America, 115, 1771-1776. [pdf]

E. Bresch, J. Adams, A. Pouzet, S. Lee, D. Byrd, S. Narayanan (to appear). Semi-automatic processing of real-time MR image sequences for speech production studies. 7th International Seminar on Speech Production, Ubatuba, Brazil.  [pdf]

E. Bresch, J. Nielsen, K. Nayak, & S. Narayanan (2006) Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans. Journal of the Acoustical Society of America, 120, 1791. [pdf]

Some of our SPAN team (Narayanan & Nayak, Byrd, Lee [left to right])



Photo credits: Stacey Halper


Real-time MRI video tutorial on phonetic sounds at the SPAN website



Underlying Invariance & Surface Variability in Speech Production: Modeling Phrasal Effects

Dani Byrd (USC) and Elliot Saltzman (Haskins Laboratories & Boston University)
Funding: NIH DC03172

We explore the puzzling juxtaposition of underlying invariance of control and surface variability in performance during speech production, and outline how a dynamical systems approach can contribute to solving this puzzle. Articulatory patterning at phrase edges is used as an example of how the surface expression of underlyingly invariant phonological units can vary in a linguistically principled way. We use computational simulations of these phrase boundary effects as prosodically-induced local temporal slowing. This slowing is generated by dynamical effects on the parameter specification of articulatory gestures. This focus allows us to examine a specific view of how underlying temporal characteristics of linguistic units can be modulated for communicative ends in the production of a particular utterance.

D. Byrd & E. Saltzman. (2003) The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31,2, 149-180. [pdf protected by Academic Press]



An articulatory view of Kinyarwanda's coronal harmony

Rachel Walker, Dani Byrd, Fidèle Mpiranya

Acknowledgments to: Sungbok Lee, Celeste DeFreitas, and Brian Ronge

Funding: Provost's Undergraduate Research Program and NIH DC03172

This paper addresses theoretical issues surrounding coronal harmony through an instrumental study of Kinyarwanda. Retroflex harmony in Kinyarwanda causes alveolar [s, z] to become retroflex when preceding a retroflex fricative within a stem.  Intervening coronal stops, affricates, and palatal consonants block coronal harmony, but the flap and non-coronal consonants are reported to be transparent.  This harmony system bears on a theoretical debate.  Some researchers have suggested that coronal harmony extends a continuous tongue tip-blade gesture (or feature) with the result that the gesture is present during “transparent” segments, but without perceptible effect (e.g. Flemming 1995, Ní Chiosáin & Padgett 1997, Gafos 1999). We refer to this as the Gesture Extension Model.  An alternative scenario posits that harmony causes the tip-blade gesture to be repeated in a harmonizing consonant but does not cause it to be present during intervening segments (e.g. Hansson 2001, Rose & Walker 2004).  We refer to this as the Repeated Gesture Model.  Kinematic data on the production of consonants in Kinyarwanda were collected using electromagnetic articulography.  The mean angle between receivers placed on the tongue tip and blade was calculated over the consonant intervals.  Mean angle reliably distinguished alveolar and retroflex fricatives, with alveolars showing a lower tip relative to blade.  Several issues were explored involving the status of target consonants, blockers, and transparent segments.  Notably, in contexts where [m] and [k] are perceived as transparent, their mean tip-blade angle was significantly different from contexts where harmony did not occur.  Furthermore, mean angle during transparent [m] showed no significant difference from mean angle during retroflex fricatives, suggesting that the tip-blade angle is sustained systematically through transparent consonants but without perceptible effect.  This supports the Gesture Extension Model for coronal harmony in Kinyarwanda.
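
The mean tip-blade angle measure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code: the coordinate convention (x increasing toward the front of the mouth, y increasing upward), the function names, and the sample values are all assumptions made for the example. Under this convention a lower tip relative to the blade yields a smaller (more negative) angle.

```python
import math
from statistics import mean


def tip_blade_angle(tip, blade):
    """Angle in degrees of the blade-to-tip line relative to horizontal.

    Assumed convention: points are (x, y) with x toward the front of the
    mouth and y upward, so a tip below the blade gives a negative angle.
    """
    dx = tip[0] - blade[0]
    dy = tip[1] - blade[1]
    return math.degrees(math.atan2(dy, dx))


def mean_angle(tip_track, blade_track, start, end):
    """Mean tip-blade angle over samples start..end-1 of a consonant interval.

    A plain arithmetic mean is used, which is adequate only when the angles
    stay well away from the +/-180 degree wrap-around.
    """
    angles = [
        tip_blade_angle(tip, blade)
        for tip, blade in zip(tip_track[start:end], blade_track[start:end])
    ]
    if not angles:
        raise ValueError("empty consonant interval")
    return mean(angles)


# Invented receiver positions in mm, for illustration only.
blade_track = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
tip_alveolar = [(10.0, -2.0), (10.0, -3.0), (10.0, -2.0)]
tip_retroflex = [(10.0, 2.0), (10.0, 3.0), (10.0, 2.0)]

alveolar_mean = mean_angle(tip_alveolar, blade_track, 0, 3)
retroflex_mean = mean_angle(tip_retroflex, blade_track, 0, 3)
print(alveolar_mean < retroflex_mean)
```

Comparing such per-interval means across harmony and non-harmony contexts is the kind of contrast the abstract reports; the statistical testing itself is not shown here.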



Functional data analysis of prosodic effects on articulatory timing

Sungbok Lee, Dani Byrd,  Jelena Krivokapic
Funding: NIH DC03172

An application of functional data analysis (FDA) (Ramsay & Silverman, 1997) for linguistic experimentation is explored. The time-warping function provided by FDA is shown to offer novel advantages in the investigation of articulatory timing. Traditionally, articulatory studies examining the effects of linguistic variables such as prosody on articulatory timing have relied on kinematic landmarks to define speech intervals of interest.  However, we present a novel approach that allows the analysis of the entire, continuous kinematic trajectories obtained in various experimental conditions, specifically, in the presence or absence of a phrase boundary.  FDA time warping functions after alignment of test and reference (control) signals indicate slowing of articulator movement as the speech stream recedes from the phrase boundary. This is a theoretically predicted pattern (Byrd & Saltzman, 2003), which would be more difficult to validate with a traditional interval-based approach.  There exist tokens for which FDA is problematic, however, and some potential remedies are outlined.  Despite certain limitations, FDA is shown to be a generally useful tool for characterizing timing patterns in linguistic experimentation based on continuous kinematic trajectories.

S. Lee, D. Byrd, and J. Krivokapic (2006). Functional data analysis of prosodic effects on articulatory timing. J. Acoust. Soc. Am. 119, 1666-1671.



On the temporal scope of phrase boundary effects
Dani Byrd,  Jelena Krivokapic, Sungbok Lee

Funding:  NIH DC03172

Acoustic lengthening at prosodic boundaries is well explored, and the articulatory bases for this lengthening are becoming better understood. However, the temporal scope of prosodic boundary effects has not been examined in the articulatory domain. The few acoustic studies examining the distribution of lengthening indicate that boundary effects extend from one to three syllables before the boundary, and that effects diminish as distance from the boundary increases. This diminishment is consistent with the pi-gesture model of prosodic influence [Byrd and Saltzman, J. Phonetics 31, 149–180 (2003)]. The present experiment tests the preboundary and postboundary scope of articulatory lengthening at an intonational phrase boundary. Movement-tracking data are used to evaluate durations of consonant closing and opening movements, acceleration durations, and consonant spatial magnitude. Results indicate that prosodic boundary effects exist locally near the phrase boundary in both directions, diminishing in magnitude more remotely for those subjects who exhibit extended effects. Small postboundary effects that are compensatory in direction are also observed.  

D. Byrd, J. Krivokapic, and S. Lee (2006). How far, how long: On the temporal scope of phrase boundary effects. J. Acoust. Soc. Am. 120, 1589-1599.


At the juncture of prosody, phonology, and phonetics—The interaction of phrasal and syllable structure in shaping the timing of consonant gestures.

Dani Byrd and Susie Choi
Presented at LabPhon 10 Paris 2006
Funding:  NIH DC03172

Over the preceding nine Laboratory Phonology conferences, leading research in linguistic speech experimentation has clearly established that abstract phonological structure is directly reflected in the spatiotemporal details of speech production.  In fact this softening of the clear severance between phonetics and phonology and an appreciation of the complexity of their relationship has been a prominent contribution of the LabPhon tradition.  In recent LabPhons, the study of the phonetics-phonology interface has been extended to the phonetics-prosody interface, and we have, again, seen that abstract structural properties, this time of sentences rather than words, have a complex and rich effect on the articulatory details of speech production (see e.g., LabPhon 5).  This has led to new theoretical conceptions regarding how phrase boundaries should be represented and to new computational models of their realization in speech (e.g., LabPhon 5 & 8, Byrd & Saltzman JPhon 2003).  In the study described below, we move from two areas of laboratory phonology that are reasonably well-explored—the effects of syllable structure and of phrasal structure on articulatory overlap—to a consideration of how these phonological and prosodic structural properties interact in shaping articulatory timing.
    It remains unknown how syllable and phrasal structure interact in determining intra- and intergestural coordination for consonant clusters preceding (in coda position), spanning (heterosyllabic), and following (in onset position) a phrase boundary.  The prosodic gestural model of phrase boundaries (Byrd & Saltzman JPhon 2003) specifically predicts longer durations and less overlap for both consonants of the CC sequence due to its representation of a phrasal juncture as a prosodic gesture extending in time and causing a slowing of the clock that controls the pacing of gestural activation.  However, this approach suggests that the strongest effect will be on the consonant most local to the boundary, i.e., on C2 for codas, C1 for onsets, and both consonants comparably in a heterosyllabic sequence.  It further predicts that the durational and overlap effects will grade with the strength of the boundary.
    We conducted an articulator movement tracking (EMMA) experiment with a design fully crossing:  syllable position of a CC cluster (word-final, cross-word, and word-initial), adjacent/intervening phrase boundary type (word only, intermediate phrase, intonational phrase, and utterance), and three clusters ([sp], [sk], [kl]).  Three subjects participated, and the tongue tip, lip aperture, and tongue rear were tracked for a total of 756 utterances.   Gestural overlap was calculated both as absolute overlap (the time between peaks of C1 and C2) and relative overlap (the proportion of the way through C1 that the C2 peak occurred).
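
The two overlap measures defined above can be written out as a short sketch. The `Gesture` record and its `onset` and `offset` landmarks are assumptions introduced for the example (the abstract does not say how the edges of C1 were delimited), and the numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Gesture:
    """Timing landmarks for one consonant gesture, in seconds (hypothetical layout)."""
    onset: float   # assumed start landmark of the gesture
    peak: float    # time of the gesture's peak
    offset: float  # assumed end landmark of the gesture


def absolute_overlap(c1: Gesture, c2: Gesture) -> float:
    """Time between the C1 and C2 peaks; a shorter lag means more overlap."""
    return c2.peak - c1.peak


def relative_overlap(c1: Gesture, c2: Gesture) -> float:
    """Proportion of the way through C1 at which the C2 peak occurs."""
    c1_duration = c1.offset - c1.onset
    if c1_duration <= 0:
        raise ValueError("C1 must have a positive duration")
    return (c2.peak - c1.onset) / c1_duration


# Invented values for illustration, not data from the experiment.
c1 = Gesture(onset=0.100, peak=0.180, offset=0.300)
c2 = Gesture(onset=0.160, peak=0.250, offset=0.380)

print(round(absolute_overlap(c1, c2), 3))
print(round(relative_overlap(c1, c2), 3))
```

With these definitions, a later C2 peak gives a larger value on both measures, so larger values correspond to less overlap between the two consonants.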
    The results demonstrate significant main effects as well as two-way interactions of boundary type and syllable position on the duration of the individual consonants.  The predictions were supported in that the heterosyllabic and coda sequences showed the larger phrasal effect on C2 duration and the onset and heterosyllabic sequences on C1 duration.  Further, this lengthening graded with relative boundary strength such that four discrete degrees of lengthening are observed for the pooled data, and three or four for the individual subject data.  Both absolute and relative intergestural overlap also exhibited two-way interactions of boundary type and syllable position.  Cluster type was also a significant factor.  In addition to intergestural overlap decreasing for clusters spanning a phrase boundary, clusters situated both before and after a boundary also showed decreases in overlap, though somewhat smaller.  That is, prosodic structure had clear effects on word-internal segment-to-segment timing in words preceding and following a phrase boundary, in addition to timing effects across the boundary.  These overlap changes are not, we show, a simple consequence of individual gestural lengthening.  In fact, these types of timing changes are predicted by simulations within the prosodic gestural model of phrase boundaries.  Finally, onset clusters were less overlapped than codas at every prosodic position (as predicted from early findings at the phrase-medial word level).  There is also some suggestion in the data that onset clusters are more ‘resistant’ to prosodic perturbation; this cohesion is suggestive of the gestural molecule structure proposed by Browman and Goldstein (Phonetica 1992) for s-stop clusters.
    In sum, the predictions of the prosodic gesture model regarding individual consonant lengthening in CC clusters are supported:  both consonants lengthen, but the consonant gesture closest to the phrase boundary lengthens more.  These effects grade with juncture strength.  There is an interaction of phrase boundary type and syllable position on CC overlap such that while CC sequences spanning a boundary are most affected, CC sequences preceding and following a phrase juncture are also affected in a way that grades with juncture strength.  This study extends the on-going efforts of the LabPhon community to understand the relation between abstract linguistic structure and low-level phonetic detail to an examination of the interaction of two different types of structure—syllabic and prosodic—in shaping articulatory timing.  Simultaneously, it seeks to test and refine a theoretical model of the prosody-phonetics interface.



Interacting effects of syllable and phrase position on consonant articulation
Dani Byrd, Sungbok Lee, Jason Adams, Daylen Riggs

Funding: NIH DC03172

The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior.  The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants.  Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, & tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase-medially.  Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles.  The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable.  For most subjects, the boundary-adjacent portions of the movement (constriction release for a pre-boundary coda and constriction formation for a post-boundary onset) are not differentially affected in terms of phrasal lengthening; both lengthen comparably.
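
As a rough illustration of how movement duration can be read off velocity-profile landmarks, the sketch below delimits a movement as the run of samples around the speed peak that stay at or above a fraction of that peak. The 20% threshold, the one-dimensional position track, and all function names are assumptions for the example; the abstract does not specify which landmarks or thresholds were used.

```python
def speed_profile(positions, sample_rate):
    """Absolute first difference of a 1-D position track, in units per second."""
    return [abs(b - a) * sample_rate for a, b in zip(positions, positions[1:])]


def movement_interval(speed, threshold_ratio=0.2):
    """Return (onset, offset) indices of the run around the speed peak.

    The run contains every contiguous sample whose speed is at least
    threshold_ratio times the peak speed (an assumed landmark criterion).
    """
    peak_index = max(range(len(speed)), key=speed.__getitem__)
    threshold = threshold_ratio * speed[peak_index]
    onset = peak_index
    while onset > 0 and speed[onset - 1] >= threshold:
        onset -= 1
    offset = peak_index
    while offset < len(speed) - 1 and speed[offset + 1] >= threshold:
        offset += 1
    return onset, offset


def movement_duration(positions, sample_rate, threshold_ratio=0.2):
    """Duration in seconds of the movement delimited by the speed landmarks."""
    speed = speed_profile(positions, sample_rate)
    onset, offset = movement_interval(speed, threshold_ratio)
    return (offset - onset + 1) / sample_rate


# Invented articulator positions (mm) sampled at 100 Hz, for illustration only.
positions = [0, 0, 0, 1, 3, 6, 8, 9, 9, 9]
speed = speed_profile(positions, 100)
interval = movement_interval(speed)
duration = movement_duration(positions, 100)
print(interval, duration)
```

Displacement could be taken from the same landmarks as the difference between the positions at the two ends of the interval.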

D. Byrd, S. Lee, D. Riggs, J. Adams. (2005) Interacting effects of syllable and phrase position on consonant articulation. Journal of the Acoustical Society of America 118(6), 3860-3873. [pdf]




Prosodic complexity and phrase length as factors in pause duration
Jelena Krivokapic

Research on influences on pauses has mainly focused on the impact of syntax, discourse and prosodic structure on the likelihood of pause occurrence and on the impact of syntactic structure on the duration of pauses within an utterance. Very little is known about what factors, apart from syntactic factors, play a role in determining the length of pauses between utterances or phrases. This experiment examines the effect of prosodic structure and phrase length on pause duration. Subjects read 24 English sentences varying along the following parameters: a) the length in syllables of the intonational phrase preceding and following the pause and b) the prosodic structure of the intonational phrase preceding and following the pause, specifically whether or not the intonational phrase branches into smaller phrases. In order to minimize variability due to speech rate and individual differences, speakers read sentences synchronously in dyads (Cummins 2002, Zvonik & Cummins 2002). The results show that length has a significant effect on pause duration both pre- and post-boundary for all dyads and that prosodic complexity has a significant post-boundary effect for some dyads. The possible reasons for the observed pause duration effects and the implications of these results on the question of incrementality in speech production are discussed.

J. Krivokapic. (2004) Prosodic complexity and phrase length as factors in pause duration.  Journal of the Acoustical Society of America, 115(5,2): 2398.

Relating the performance and perception of phrasal boundaries
Jelena Krivokapic
This study examines the correspondence between the production of prosodic structure and perceptual judgments regarding prosodic boundary strength.  In the production component of the study, 3 speakers read sentences in which one specific juncture is manipulated (varying in syntax, position in sentence, and phrase length) to elicit phrasal boundaries of differing strengths.  Lengthening and pause duration at the boundary were measured.  The same sentences, in written form, were presented to a second different group of 3 speakers who provided estimates of the strength of the target boundary on a scale of eight degrees.  The results of the production and the estimation portions of the experiment demonstrate significant correlations between the production boundary strength as reflected in durational properties at the juncture and the boundary strength as estimated in the judging task.  These correlations are roughly linear. Further, in both the production and perception domains, a range of boundary strength is exhibited rather than a small discrete set of boundary types.  We also examine whether speakers’ own boundary strength estimates agree with their productions to a greater extent than estimates of other speakers.

J. Krivokapic. (2004) Relating the performance and perception of phrasal boundaries. Journal of the Acoustical Society of America, 116, 2643.

Funding: NIH DC0317



The influence of phonemic vowel length on the voicing effect
Rebeka Campos-Astorkiza

This study tests the influence of phonemic vowel length on the realization of the voicing effect, i.e., the phonetic process by which vowels tend to be longer before voiced obstruents than before voiceless ones. The literature on the voicing effect has identified a number of factors that influence the degree of this effect (Hussein 1994), among them the presence of phonemic vowel length (e.g. Keating 1985). However, there appear to be no published reports of experiments to test this claim about the influence of contrastive vowel length. In order to test the hypothesis that the presence of phonemic vowel length attenuates the voicing effect, it is necessary to isolate phonemic vowel length from other possible conditioning factors. This can be done by testing a language where length is contrastive for only a subset of its vowel qualities, i.e., a language in which some vowels are unpaired for the long-short contrast. The prediction is that vowels without a short/long counterpart will exhibit a stronger voicing effect than vowels that are part of a long/short contrast. Lithuanian shows such an asymmetrical system. Lithuanian mid vowels lack a contrast for duration; they are always long. Thus, the hypothesis is that the voicing effect will be greater for /e:, o:/ than for the other vowels.
Acoustic data from native speakers of Lithuanian were collected. The stimuli consisted of bisyllabic nonsense words of the shape CV1C1C2V, where V1 could be any of the Lithuanian vowels and the sequence C1C2 was either /kS/ or /gZ/. The results show that the difference in vowel duration before voiced obstruents and before voiceless ones, i.e., the voicing effect, is greatest for /e:/ and /o:/ (p<.05), compared to the other vowels. We conclude that the vowels unpaired for length (/e:, o:/) are more strongly affected by the voicing effect, while vowels with a long/short counterpart are influenced to a lesser degree. This supports our hypothesis. More generally, this conclusion provides evidence for the influence of phonemic contrast on phonetic realization, previously discussed in relation to coarticulation (Manuel 1999) and the cues to stress (Berinstein 1978). Furthermore, the asymmetrical Lithuanian system suggests the importance of minimal contrast in the phonological representation. If a vowel differs from another vowel only in length, then it minimally contrasts for length. Our experiment shows that vowels minimally contrastive for length behave differently from vowels that do not minimally contrast for length.
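
The voicing-effect measure used above, the difference in vowel duration before voiced versus voiceless obstruents, can be sketched as a per-vowel difference of means. The durations below are invented purely to show the computation and are not results from the experiment; the dictionary layout and function name are likewise assumptions.

```python
from statistics import mean


def voicing_effect(durations_before_voiced, durations_before_voiceless):
    """Mean vowel duration before voiced minus before voiceless obstruents (ms)."""
    return mean(durations_before_voiced) - mean(durations_before_voiceless)


# Invented vowel durations in milliseconds, for illustration only.
tokens = {
    "e:": {"voiced": [182, 190, 186], "voiceless": [150, 154, 152]},
    "a:": {"voiced": [170, 174, 172], "voiceless": [160, 162, 164]},
}

effects = {
    vowel: voicing_effect(d["voiced"], d["voiceless"])
    for vowel, d in tokens.items()
}
for vowel, effect in effects.items():
    print(vowel, effect)
```

Comparing these per-vowel differences between vowels unpaired for length and vowels with a long/short counterpart is the comparison the hypothesis calls for; the significance test reported in the abstract is not reproduced here.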





Older Projects


What is raddoppiamento? Length and prosody in Italian
Rebeka Campos-Astorkiza

Raddoppiamento fono-sintattico in Italian has received much attention in the literature. However, the phenomenon seems to be far from explained and understood. Traditionally, Raddoppiamento refers to a lengthening process that affects word-initial consonants that follow a word ending in a stressed vowel. Furthermore, prosodic and syntactic constraints have been proposed that prevent this process from taking place (Nespor 1977, 1979). Unfortunately, most of the analyses and conclusions regarding Raddoppiamento instances lack a solid empirical foundation. This project aims to shed light on the phenomenon by considering the nature of the segments in the Raddoppiamento environment and different prosodic contexts. First, we consider not only consonants but also vowels as possible lengthened segments and examine whether their behavior patterns with that of consonants. Second, two different prosodic contexts are considered. The two relevant words are placed either phrase-internally or at the boundary of a phrase. According to previous analyses, Raddoppiamento is unexpected at the phrase boundary (Nespor 1977, Vogel 1977). Lastly, stressed and unstressed environments are tested.

The results showed that lengthening took place in the traditional Raddoppiamento environment, i.e., when word1 ends in a stressed vowel, word2 begins with a consonant, and there is no intervening boundary between them. On the other hand, when the initial segment in word2 was a vowel, it did not lengthen. This result shows that any attempt at explaining the process must deal with the fact that only consonants are subject to lengthening. As far as the final vowel in word1 is concerned, it was significantly longer when it carried the stress than when it was unstressed. Finally, the presence of a boundary did not block the process categorically. At the phrase juncture, the initial consonant in word2 was significantly longer when the preceding vowel was stressed than when it was unstressed. In view of this empirical evidence, some of the accounts of Raddoppiamento will have to be revisited in order to accommodate the data.

J. Acoust. Soc. Am. 116, 2645 (2004)

Some novel allophonic and phonemic phenomena in Biscayan Basque
Rebeka Campos-Astorkiza

An acoustic study of novel allophonic and phonemic phenomena in the isolate language Basque is presented. The focus is on speakers of the Biscayan dialect. First, Basque shows a spirantization process by which voiced plosives are produced as approximants, particularly intervocalically. Interestingly, we find that Basque /ld/ sequences, where spirantization is not expected [Hualde (1991) Basque Phonology], are realized as a lateral approximant followed by a voiced lateral fricative. Second, in this variety of Basque, the historical three-way contrast among sibilants (two alveolars and one postalveolar) has been reduced to a two-way distinction. The original contrast, still found in other varieties, between a laminal alveolar and an apical alveolar has merged with different results depending on the continuancy of the sibilants. Third, Basque presents a contrast between trill and flap intervocalically. However, elsewhere this is neutralized, and the precise realization of this segment varies from trill to frication. Finally, the Basque five-vowel inventory allows for almost any sequence of two vowels. The same vowel sequence might be a diphthong (tautosyllabic) or a hiatus (heterosyllabic) depending on the lexical item. That is, diphthongs and hiatus are contrastive. 

J. Acoust. Soc. Am. 118, 1901 (2005)


ChAIM (Children and Adults Interacting with Machines)
a past project:  See current CHIMP website at SAIL Lab

Faculty:   Shrikanth Narayanan (USC Elec. Eng.), Dani Byrd (USC Linguistics),  Elaine Andersen (USC Linguistics; USC Neuroscience Program), Alex Potamianos (Bell Labs)

Students: Suzanne Curtin, Laurie Gerber, Alison Bryant, Serdar Yeldrim

Undergraduate Students (past and present): Sudha Arunachalam (now at U Penn), Dylan Gould, Abe Kazemzadeh (now a grad student at USC), Sonia Khurana

Funding: USC Integrated Multimedia Systems Center Undergraduate Research Project (funded by NSF) and USC Provost's Office Undergraduate Research Programs and Zumberge Interdisciplinary Grant

Enabling spoken language capability as a part of immersive multimedia interfaces adds naturalness and efficiency to human-machine interactions. Two crucial requirements for multimedia human-machine interfaces are robustness (i.e., minimal performance degradation under trying conditions) and adaptability (i.e., responding to the user in a customized manner). Undergraduate researchers, mentored by graduate students, are investigating challenges to robustness & adaptability in the field of automatic speech recognition: the challenge of recovering from system errors made while interacting with capable adult users, and the challenge of adapting to child-users. This is undertaken via linguistically-informed data analysis of two large databases of adults and children interacting with a computer via spoken language.


Phonetic foundations of final /s/ patterning in South-Central Castilian Spanish
Ana Sánchez-Muñoz

Numerous studies have shown that in many dialects of Spanish, the phoneme /s/ in final position may have several realizations depending on a variety of factors (e.g. Lipski 1986; Terrell 1979, 1981; Widdison 1995, 1996). However, there are few analyses of dialectal regions such as the area of South-West Central Castilian Spanish, which because of its geographical location does not entirely belong to any of the dialectal areas described for Castilian. This study explores what factors may be having an effect on the different realizations of final /s/. It considers aspiration ([h]), deletion, velarization ([x]), or realization of the sibilant ([s]). Data were collected from six native speakers under a controlled task. This experiment aimed at production of final /-s/ in three grammatical words (los, mis and estos/éstos), taking into account the following factors: 1) The sound that follows the target final /s/ (consonant or vowel); 2) Whether the target word is phrase final or not; 3) Whether the target word carries focal accent or not. The results show that the first two factors are highly significant whereas the third one is not. It is furthermore observed that certain realizations of final /s/ are restricted by the type of sound that follows it, specifically for [x] after /k/. The results show clear patterns of /s/ realization as [s] mainly occurs before vowels and in prepausal position and [ø] mainly before consonants. It is argued that s-lenition in Spanish can be explained in terms of two variations in the gestural score: changes in magnitude and overlap among gestures. These results help understand in greater depth the mechanisms leading to the different realizations of syllable-final /s/ in Spanish as characterized in the proposed hypothesis.

A. Sánchez-Muñoz. Phonetic Foundations of Final /s/ Patterning in South Central Castilian Spanish. In WCCFL 23: Proceedings of the 23rd West Coast Conference on Formal Linguistics, edited by Vineeta Chand, Ann Kelleher, Angelo J. Rodríguez, and Benjamin Schmeiser, 690-702.



A three-way VOT contrast in final position: Data from Armenian
Narineh Hacopian
(now at Inquira)

Standard Eastern Armenian has a three-way VOT contrast among its oral stop series. While it is rare for such a three-way contrast to be preserved in final position, Standard Eastern Armenian is claimed to do just this (Ladefoged & Maddieson 1996; Vaux 1998; Khachaturian 1988). This phenomenon provides an ideal opportunity to explore how prosodic structure influences the realization of a complex and delicate system of contrast maintained by temporal and saliency distinctions. The cues to this contrast, including VOT, closure duration, and burst amplitude, are examined in a variety of segmental and prosodic environments. The experiment evaluates the effects of intonation phrase final, intermediate phrase final, word final and syllable final positions. We find that the three-way VOT contrast is maintained in each of these prosodic domains in which the stop consonants are final. Speakers make significant VOT distinctions among the target consonants within the same boundary condition, and, across prosodic conditions, larger prosodic domains exhibit longer VOT values.

N. Hacopian. (2003) A three-way VOT contrast in final position: Data from Armenian. Journal of the International Phonetic Association 33(1), 51-80. [pdf]




Nasalization, neutralization, and merger in English front vowels
Jerry Liu (now at Google)

In American English, there exists a tense/lax distinction among the front vowels [i]/[I] and [e]/[E]. These often create minimal pairs in a number of different environments. Before a velar nasal, however, the contrast is not preserved and is usually considered to have collapsed into the set of lax vowels. Yet Ladefoged (2001) states that for the high front vowels “many younger Americans pronounce ‘sing’ with a vowel closer to that in ‘beat’ rather than to that in ‘bit.’” An acoustic study was conducted with nine young native Californians to examine whether raising occurs before the velar nasal, as has been anecdotally observed for the high front vowels, and if so, whether it also occurs for the mid front vowels, which have not previously been reported to raise from the lax. The results indicate that these subjects produce a vowel intermediate in formant frequencies (and sometimes, though not always, in duration) between the tense and lax vowels in the velar nasal environment. Further, this occurs for both the high and mid front vowels.

J. Acoust. Soc. Am. 116, 2630 (2004)



Durational cues to focus in a scrambling language
Fetiye Karabay

This paper examines the phonetic structure of focus in Turkish. Focus is marked in different ways in different languages. The three most common ways of marking focus are pitch perturbation, higher intensity and lengthening (O'Shaughnessy 1979, O'Shaughnessy & Allen 1983, Cooper et al. 1985, Wells 1986). In many languages such as English, German, and Dutch, duration, amplitude, and pitch combine to give the effect of perceived stress in words that are in focus (Selkirk 1984, 1994; Gussenhoven 1983, 1994). Other languages, such as Danish and Spanish, use only pitch and amplitude as cues to focus, and, in these languages, durational effects play an unimportant role (Nooteboom & Kruyt 1987, Toledo 1989). Turkish is a scrambling language in which word order is flexible. While the basic word order is SOV, almost all six permutations are possible in many sentences. In Turkish, broad focus is marked with word order as well as a pitch perturbation, typically a H tone. Generally, the accented word preceding the verb is in focus, unless the verb is in utterance-initial position, in which case the verb itself is in focus. In narrow focus, however, the focused word does not immediately precede the verb. An experiment is conducted to examine whether focus in Turkish is marked by durational lengthening in addition to pitch perturbation and word order. The hypothesis that duration is not important in broad focus, due to the presence of the very salient pitch and word order cues, is investigated. Further, we hypothesize that durational effects may play an important role in narrow focus, since it is otherwise not as saliently marked in that it lacks the canonical word order cue. Data are collected from four native Turkish speakers in a controlled task. Stimuli include (i) four sentences, with phonetically identical subjects and objects, in which word order specifies broad focus and (ii) a second parallel set of sentences with narrow focus (i.e. in which the focused noun is not in canonical pre-verbal position) for comparison. Seven repetitions of each sentence are recorded and digitized. Phonetic lengthening in broad and narrow focus will be measured from the waveforms and statistically evaluated to confirm or refute the experimental hypotheses.



Stress effects on voicing and frication of voiced obstruents in North-Central Spanish
Carolina Gonzalez (now at Florida State University)

Funding: Del Amo Fellowship

In Spanish, /b, d, g/ are usually spirantized to voiced approximants in all syllabic contexts after a continuant sound. However, in North-Central Peninsular Spanish (NCS), spirantization interacts with coda devoicing, yielding voiceless fricatives. In the majority of cases, coda /b, d, g/ occur in stressed syllables. This work examines whether stress is a factor in the likelihood of frication and devoicing of coda /b, d, g/ in this dialect. An acoustic study was conducted with nine native speakers of NCS. These speakers were tested on nonce words with /b, d, g/ in coda position in both stressed and unstressed syllables. Measurements were made of vowel and consonant duration, presence or absence of frication and voicing, and voicing duration. The results show that frication is more likely in stressed syllables than in unstressed syllables. This suggests that in stressed syllables, a higher subglottal pressure produces higher airflow across the glottis, which favors frication. In turn, frication inhibits voicing because the two have conflicting aerodynamic requirements. We conclude that stress is a factor in spirantization, and that it may indirectly affect the voicing properties of /b, d, g/.

C. Gonzalez. (2002) Phonetic variation in voiced obstruents in North-Central Peninsular Spanish. Journal of the International Phonetic Association 32(1). [pdf]



Understanding the production of English word stress in speaking
Dani Byrd
Teruhiko Fukaya (now at Sugiyama Jogakuen University)
Funding: James H. Zumberge Individual Research Grant Proposal Category I: New Faculty Start-Up

This project investigates the relation between the spatiotemporal orchestration of the vocal movements used to produce speech and a particular aspect of linguistic structure—word stress. The proposal describes a study employing a previously collected articulatory database of nearly a thousand spoken productions of words differing only in their stress pattern. This database was collected using a magnetometer system to track articulator movement during speech and includes data on tongue, lip and jaw trajectories for three subjects. The kinematic characteristics of these movements will be analyzed as a function of stress condition and phrasal prominence. The proposed study will profile the manner in which articulatory behavior is shaped by the linguistic dimension of word stress.



An articulatory examination of word-final flapping at phrase edges and interiors
Dani Byrd

Teruhiko Fukaya (now at Sugiyama Jogakuen University)
Funding: NIH

Formulations of flapping as a symbolic phonological rule suggest clear articulatory differences between flaps and stops, and often offer no overt explanation for why phrase boundaries should block the alternation. The present study explores the articulatory foundation of the distinction between flaps and non-flaps in word-final position. We examine kinematic and acoustic data for these articulations in phrase-final and phrase-medial positions and in falling and level stress contours. It is shown that a discrepancy exists between acoustic and articulatory durational patterning: while acoustic durations of flaps are shorter than those of non-flaps overall, their articulatory durations are not uniformly so. It is important to consider multiple potential articulatory sources, both spatial and temporal, for the acoustic shortness that characterizes flaps, including spatial reduction, temporal articulatory shortening, and changes in intergestural coordination. The kinematic data indicate that different sources of flap shortness exist for different speakers and different prosodic conditions. These results imply that flaps are not categorically different from non-flaps in the articulatory domain, as traditional formulations of flapping as a symbolic phonological rule would suggest. We conclude that gradient variability in the spatiotemporal patterning of tongue tip constrictions yields acoustic shortening in flaps.
 
T. Fukaya & D. Byrd. (2005) An articulatory examination of word-final flapping at phrase edges and interiors. Journal of the International Phonetic Association 35(1). [pdf]



Coarticulatory information in natural speech stimuli is crucial for infant recognition of syllable sequences
Suzanne Curtin (USC Linguistics; now at the University of Pittsburgh), Toben Mintz (USC Psychology), Dani Byrd (USC Linguistics)

Funding: NIH, SSHC, USC Zumberge Grant

This research was conducted in the USC Language Development Lab (T. Mintz, Director).

Recent research has determined that coarticulatory information in speech provides important cues to early word segmentation. This experiment investigates whether 7-month-old infants' ability to recognize a string requires the presence of appropriate coarticulatory information in the speech familiarization stream. Following familiarization to a string of CV syllables, infants were tested to determine whether sequences that co-occurred in the familiarization string were preferred over those in which the syllables did not appear adjacently during familiarization. Further, the test phase was conducted so that the items had either appropriate or inappropriate coarticulatory information. The results indicate that infants tested on items with appropriate coarticulation listened significantly longer to strings that had appeared during familiarization than to the appropriately coarticulated control strings that never occurred together during familiarization. Interestingly, when presented with inappropriately coarticulated test items, infants showed no preference for previously familiarized strings over the non-co-occurring syllable strings. We conclude that infants are sensitive to coarticulation in recognizing sequences in a speech stream. Furthermore, coarticulatory cues, in combination with other cues to segmentation, greatly enhance recognition of syllable sequences. These results suggest that coarticulation plays an important role in early word segmentation.

S. Curtin, T. H. Mintz, & D. Byrd. (2001) Coarticulatory cues enhance infants' recognition of syllable sequences in speech. BUCLD 25: Proceedings of the 25th Annual Boston University Conference on Language Development. [pdf]



Production ease as a basis for phonological cooccurrence restrictions
Rachel Walker

Narineh Hacopian (now at Inquira)

Undergraduate Students: Gayane Arabyan, Mariko Taki

Funding: James H. Zumberge Individual Research Grant Proposal Category I: New Faculty Start-Up

A widespread pattern, the exclusion of similar sound elements in a word, is known as a phonological cooccurrence restriction. When word formation produces a structure containing sounds that violate a cooccurrence restriction, the phonological form of the word is altered in one of two ways in order to obey the condition: the prohibited sounds either dissimilate (become less similar) or assimilate (become identical). We are investigating the hypothesis that bans on similar but different elements have a foundation in psycholinguistic processing. By conducting speech error experiments, we will investigate the factors contributing to production ease in words containing consonants that differ in voicing/nasality. The results will have important implications for shaping a crosslinguistic typology of voicing/nasality cooccurrence effects. Explaining phonological cooccurrence restrictions in terms of maximizing production and processing ease is an exciting new interdisciplinary research direction combining linguistics and psychology. This research promises not only to bring new understanding to the study of widespread cooccurrence patterns, but also to mark a significant advance in determining what universal factors underlie properties of human language.

R. Walker, N. Hacopian, & M. Taki. (2002) Nasal consonant speech errors: Implications for "similarity" and nasal harmony at a distance. J. Acoust. Soc. Am. 112, 2359.


D. Byrd
Copyright © 2006 All rights reserved.
Information in this document is subject to change without notice.


Copyright Notice: The documents distributed here have been provided as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.