Prosody Tutorial Video Series

This is a university-level introduction to Prosody, covering both fundamentals and applications, totalling 4 hours of video across 29 lectures. It is an updated version of the tutorial Prosody: Models, Methods, and Applications, presented at the 2021 Meeting of the Association for Computational Linguistics.

Nigel G. Ward, Gina-Anne Levow

Bibliography (pdf)

Introduction to this Video Series (7:46)


This is Video 1 of our Prosody Tutorial Video Series. Speech Prosody is the interdisciplinary study of how pitch and intonation, intensity, timing, and other voice properties are used in communication. In this introductory video, Professors Nigel G. Ward and Gina-Anne Levow explain how recent advances in the scientific understanding of prosody make it more relevant, not only to linguists and speech engineers, but also to anyone wanting to understand and communicate better in spoken dialog: students, teachers, clinicians, and others.

Watch     Download Powerpoint

Topics Include:   • Human Speech beyond Words  • Core Prosody: The Musical Aspects of Speech  • Other Prosodic Properties  • Overview of the Tutorial Series  • Relevance for Students, Teachers, Engineers, Scientists, Clinicians ...  • About the Instructors

Why Prosody Matters (3:50)


This is Video 2 of our series on prosody, explaining why prosody matters: for example, for marking speech as sincere (as when truly giving praise or thanks) and for understanding where people are coming from. While prosody may come effortlessly to some great communicators, many people want to become more effective, especially in dialog, and knowledge of prosody can help. There are also good business reasons to want to understand how prosody works.

Watch     Download Powerpoint

Topics Include:   • Sincerity and Positive Assessment  • The Power of Prosody, for You, your Students, and your Technical Creations  • Call Centers: Prosody and the Human Touch  • Why this Tutorial: Recent Advances and Ongoing Challenges

Applications Overview (4:29)


This is Video 3 of our series on prosody, introducing its roles in speech technology. We overview a dozen applications where prosody may be useful, and explain why, even in today's Era of Deep Learning, engineers need to know how prosody works.

Watch     Download Powerpoint

Topics Include:   • Speech To Text, Text To Speech  • Recognizing Individuals and their Mental States   • Dialog Systems, Tutoring, etc.   • Big Data and Deep Learning Approaches to Prosody  • The Continuing need for Knowledge of Prosody

Pitch Production (6:37)


This is Video 4 of our series on prosody, kicking off six lectures on prosody production and perception. We start here with pitch, the best-understood prosodic feature, explaining the anatomical and aerodynamic mechanisms.

Watch     Download Powerpoint

Topics Include:   • Airflow and the Larynx  • States of the Glottis: Breathing and Speaking  • Vocal Fold Vibration

See Also:
            Video: Vibration of the Vocal Folds

Pitch and F0 (9:20)


This is Video 5 of our series on prosody. We explain how the F0, computed directly from the waveform, corresponds well to both the glottal cycle and human perceptions of pitch. Nevertheless, microprosodic effects make the interpretation of F0 contours tricky. Ultimately, pitch, like every prosodic property, has articulatory, perceptual, and acoustic aspects, which don't perfectly coincide.

Watch     Download Powerpoint

Topics Include:   • Waveforms and Pitch Periods  • The Fundamental Frequency (F0)  • Pitch Perception can be Hard   • F0 Contours and How to Read Them  • Divergences between Pitch Percepts and Computed F0  • Microprosody  • The Articulatory-Perceptual-Acoustic Troika
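
The following minimal autocorrelation sketch shows roughly how F0 can be computed directly from the waveform. It finds the lag, within an assumed pitch range of 75 to 500 Hz, at which one short frame best matches a shifted copy of itself. The function name `estimate_f0` and its defaults are hypothetical. Real pitch trackers, such as the one in Praat, add windowing, interpolation, and many other refinements, so treat this only as an illustration of the idea.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Hypothetical sketch: estimate F0 of one frame by autocorrelation.

    Returns (f0_hz, strength), where strength is the normalized
    autocorrelation at the chosen lag, or (None, 0.0) if no lag fits.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]          # remove any DC offset
    energy = sum(v * v for v in x)
    if energy == 0:
        return None, 0.0                     # silent frame: no pitch
    # Only consider lags (candidate pitch periods) inside the allowed range.
    min_lag = max(int(sample_rate / f0_max), 1)
    max_lag = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None:
        return None, 0.0
    # One pitch period of best_lag samples corresponds to this frequency.
    return sample_rate / best_lag, best_r
```

For a 40 ms frame (640 samples) of a 200 Hz sine sampled at 16 kHz, the best lag should be 80 samples. That gives an estimate of about 200 Hz.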

Loudness, Timing, and More (6:16)


This is Video 6 of our series on prosody. Continuing with the low-level features, we discuss articulatory and perceptual aspects of volume, rate, timing, reduction and nasalization.

Watch     Download Powerpoint

Topics Include:   • Loudness  • Rate and Timing  • Reduction  • Nasalization

Phonation Types (13:32)


This is Video 7 of our series on prosody. We wrap up our inventory of low-level features by considering the main voice quality properties: creaky voice, breathy voice, harmonic voice, and falsetto. We discuss how they are produced, illustrate how they sound, and see how they look when visualized with tools like Praat.

Watch     Download Powerpoint

Topics Include:   • Introduction   • Creaky Voice, aka Vocal Fry  • Breathy Voice  • Modal Voice  • Highly Harmonic Voice  • Falsetto  • The Periodicity Dimension  • Summary: Prosody is more than Intonation

Exercises (6:45)


This is Video 8 of our series on prosody: a digression for some skill-building. We'll practice perceiving prosodic properties, precisely varying the prosody of your own productions, and imitating some speech samples from an unfamiliar language. This will reinforce earlier concepts and terminology, and lead in to our discussion of perception.

Watch     Download Powerpoint

Topics Include:   • Perception Exercises   • Production Exercises   • Imitation Exercise

Prosody Perception (10:04)


This is Lecture 9 of our series on Prosody, on how the ear and brain extract prosodic information from sound waves. While some sounds have a single clear pitch value, the perception of pitch in speech is complex. And while our ears and brain are powerful recognition engines, they are ill-suited for some kinds of listening.

Watch     Download Powerpoint

Topics Include:   • Psychoacoustics  • Loudness and Intensity  • Pitch Perception: Ear and Brain  • Pitch Scales in Music and Speech  • An Example that can be Perceived as High or Low  • An Example that can be Perceived as Rising or Falling  • Pitch Perception Myths  • The Elusiveness of Percepts  • Limitations of Human Audio Perception  • Ways to Compensate

Tone and Stress (14:03)


This is Video 10 of our series on prosody, explaining how languages mark word identity using tone and stress, using examples from Mandarin, Iau, English, Spanish, and Japanese. We discuss how pitch height, pitch contour, intensity, duration and voice quality factors combine to realize these distinctions.

Watch     Download Powerpoint

Topics Include:   • Introduction  • Prosodic Features in Phonemic Contrasts  • The Four Tones of Mandarin  • Tone Realization and Tone Notations  • Tone Serving to Convey Grammatical Functions  • Stress and its Realization  • Stress Differences Across Languages  • Stress Patterns and Rhythm

Sequencing and Connecting (9:48)


This is Video 11 of our series on prosody. Proper prosody for word sequences involves much more than just joining the prosody of each word, or each tone, in turn. There are both universal smoothing processes and language-specific adjustment processes. Prosody can also indicate the semantic relations between adjacent words.

Watch     Download Powerpoint

Topics Include:   • Sequences of Units  • Co-articulation   • Tone Sequences and Tone Sandhi  • Stress Shift  • Semantic Relations  • Summary

Prosodic Structures (4:47)


This is Video 12 of our series on prosody, on the role of prosody in marking syntactic structure and helping with disambiguation, and on prosodic structures themselves. We also discuss prosodic phrasing and prominence, and how these can differ across languages.

Watch     Download Powerpoint

Topics Include:   • Introduction  • Prosodic Words  • Disambiguation through Prosody  • Phrasing and Boundaries  • Prominence

Visualizations and Representations (7:54)


This is Video 13 of our series on prosody. We overview seven ways to visualize and represent prosodic information, ranging from the very direct to the very abstract.

Watch     Download Powerpoint

Topics Include:   • Why Visualize?  • Pitch Contours  • Smoothing  • Pitch Levels   • Pitch Turning Points  • Two Strategies: Tidying and Extracting  • Tadpole Notation  • ToBI Notation  • Cautions Regarding Visualizations
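
Smoothing, one of the steps covered here, can be illustrated with a median filter over a frame-level pitch contour. This is a minimal sketch. It assumes the contour is a list of F0 values in Hz, with `None` marking unvoiced frames. The function name `median_smooth` and the default window of 5 frames are hypothetical choices. Other smoothers, such as moving averages or splines, are also common.

```python
from statistics import median

def median_smooth(contour, window=5):
    """Hypothetical sketch: median-filter a pitch contour (Hz per frame).

    Unvoiced frames (None) stay unvoiced and are ignored as neighbors.
    """
    half = window // 2
    smoothed = []
    for i, value in enumerate(contour):
        if value is None:
            smoothed.append(None)
            continue
        # Collect voiced values in the window centered on frame i.
        neighbors = [v for v in contour[max(0, i - half): i + half + 1]
                     if v is not None]
        smoothed.append(median(neighbors))
    return smoothed
```

With `window=3`, the contour `[200, 202, 400, 204, 206, None, 180]` becomes `[201, 202, 204, 206, 205, None, 180]`. The isolated 400 Hz value, likely a pitch-doubling error, is pulled back into line. A median is used rather than a mean so that a single outlier frame cannot drag its neighbors upward.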

Introduction to Features (6:00)


This is Video 14 of our series on prosody, starting four lectures on prosodic features. To support both research and development, we need automatically-computable features for the various prosodic properties. Feature computation is typically done in three or four stages. Computed features never perfectly reflect perceptions, so observers and researchers need to use caution in interpreting them.

Watch     Download Powerpoint

Topics Include:   • Features are Measurements  • Low-Level, Mid-Level, and Meaningful Features  • Technical Pitfalls  • Computed Features at best Correlate with Perceptions  • The Irreducible Subjectivity of Perceptions

Using Pitch Trackers (8:10)


This is Video 15 of our series on prosody. Pitch trackers output their best estimate of the F0 value every 10 milliseconds or so, but to get meaningful results, you generally need to set their parameters carefully.

Watch     Download Powerpoint

Topics Include:   • Ideal and Real Pitch Contours  • Pitch as a Frame-Level (Low-Level) Feature  • Popular Pitch Trackers  • Stages of Pitch Inference  • Setting the Pitch Range  • Microprosody  • Smoothing   • Setting the Voicing Threshold  • Output Scales: Linear, Logarithmic, and Percentile  • Pitch, No Pitch, and Voicing Probability Estimates
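
The following sketch covers three of the post-processing choices listed above: the pitch range, the voicing threshold, and the output scale. It assumes the tracker emits one `(f0_hz, voicing_probability)` pair per frame. The function name and the defaults (75 to 500 Hz, threshold 0.5, a 100 Hz reference for the semitone scale) are illustrative assumptions, not settings recommended in the lecture.

```python
import math

def postprocess_pitch(frames, f0_min=75.0, f0_max=500.0,
                      voicing_threshold=0.5, ref_hz=100.0):
    """Hypothetical sketch: map (f0_hz, voicing_prob) frames to semitones.

    Frames below the voicing threshold, or outside the allowed pitch
    range, become None and are treated as unvoiced.
    """
    result = []
    for f0, voicing_prob in frames:
        if voicing_prob < voicing_threshold or not (f0_min <= f0 <= f0_max):
            result.append(None)
        else:
            # Logarithmic output scale: 12 semitones per doubling of F0.
            result.append(12.0 * math.log2(f0 / ref_hz))
    return result
```

For example, `postprocess_pitch([(200, 0.9), (100, 0.8), (150, 0.2), (600, 0.9)])` returns `[12.0, 0.0, None, None]`. On the semitone scale, a rise from 100 to 200 Hz and a rise from 200 to 400 Hz both measure 12 semitones. On the linear scale, the second rise looks twice as large.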

Normalization (10:52)


This is Video 16 of our series on prosody. Two instances of essentially the same prosodic behavior may yield very different measurements, due to differences in speakers, recording conditions, etc. Use of appropriate normalization techniques can enable better comparisons and aggregations over diverse data.

Watch     Download Powerpoint

Topics Include:   • Compensating for Speaker Differences   • Pitch Value Distributions  • Subtracting Minimums  • Subtracting Averages  • Standard Deviations and Z-Normalization  • Percentiles  • Normalizing in General  • Normalizing Intensity  • General Cautions
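
Two of the normalization methods listed here, z-normalization and percentiles, can be sketched per speaker as follows. This is a minimal illustration. It assumes each speaker's pitch values are already collected in one list, in Hz or semitones. The function names are hypothetical, and real pipelines must also decide which frames to include, for example by excluding unvoiced ones.

```python
from bisect import bisect_left
from statistics import mean, pstdev

def z_normalize(values):
    """Hypothetical sketch: subtract the speaker's mean, divide by their std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]   # constant input: nothing to scale
    return [(v - mu) / sigma for v in values]

def percentile_ranks(values):
    """Hypothetical sketch: percent of the speaker's values strictly below each value."""
    ordered = sorted(values)
    return [100.0 * bisect_left(ordered, v) / len(ordered) for v in values]
```

Consider a lower voice producing `[100, 110, 120]` and a higher voice producing `[200, 220, 240]`. The two sets of raw values look very different. After either normalization they coincide: both give z-scores of about -1.22, 0, and 1.22, and identical percentile ranks.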

MidLevel Prosodic Features (9:47)


This is Video 17 of our series on prosody. Above the low-level features we've discussed so far, midlevel features represent the perceptually important aspects of prosody. We compute these by aggregating values across timespans, or combining different types of prosodic features. Key midlevel features include speaking rate, pitch height, pitch range, pitch slope, and pitch peak location. Midlevel features are included in many open-source feature sets, including Praat, openSMILE, and the Midlevel Toolkit.

Watch     Download Powerpoint

Topics Include:   • Midlevel Features are Computed by Aggregation  • Unit-Aligned Features  • Unaligned Features  • Multistream Features, Illustrated with Relative Peak Locations  • Robustness Issues, Illustrated with Pitch Height  • Pitch Range Issues  • Potentially Thousands of Midlevel Features  • Open-Source Feature Sets and Toolkits   • Some Less-Known Features
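
As a minimal sketch of midlevel features computed by aggregation, the function below summarizes a frame-level pitch contour over one timespan. It reports pitch height (mean), pitch range (maximum minus minimum), and pitch slope (least-squares fit over time). It assumes 10 ms frames, values already on a normalized or semitone scale, and `None` for unvoiced frames. The function name and these particular definitions are illustrative assumptions. As the robustness topics above suggest, percentile-based variants may behave better on real data.

```python
from statistics import mean

def midlevel_pitch_features(contour, frame_ms=10.0):
    """Hypothetical sketch: aggregate a pitch contour over one timespan.

    contour: per-frame pitch values (e.g. semitones), None if unvoiced.
    Returns a dict of features, or None if fewer than 2 voiced frames.
    """
    # Pair each voiced value with its frame time in seconds.
    points = [(i * frame_ms / 1000.0, v)
              for i, v in enumerate(contour) if v is not None]
    if len(points) < 2:
        return None
    times = [t for t, _ in points]
    values = [v for _, v in points]
    t_mean, v_mean = mean(times), mean(values)
    # Least-squares slope, in pitch units per second.
    num = sum((t - t_mean) * (v - v_mean) for t, v in points)
    den = sum((t - t_mean) ** 2 for t in times)
    return {
        "height": v_mean,
        "range": max(values) - min(values),
        "slope": num / den,
        "voiced_fraction": len(points) / len(contour),
    }
```

For the contour `[0.0, 1.0, None, 3.0, 4.0]` in semitones, this gives a height of 2.0, a range of 4.0, and a slope of about 100 semitones per second, that is, 1 semitone per 10 ms frame.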

Speech Recognition (9:59)


This is Video 18 of our series on prosody. Since prosody can mark word identity, through tone and stress patterns, it can be used to help determine the word sequence present in a speech signal. However, there are complications which make this difficult in practice.

Watch     Download Powerpoint

Topics Include:   • Speech Recognition and Prosody  • The Concept of an Independent Prosody Module   • Unit-Linked Prosody is Less Independent than it Once Seemed  • Modeling Prosodic Effects on Sound-Phoneme Mappings  • Summary of Lessons Learned  • Speech Recognition Today, and Unmet Needs

Feature Sets and Machine Learning (8:19)


This is Video 19 of our series on prosody. Systems for many tasks, such as speech recognition, emotion recognition, and intent recognition, are built today by applying machine learning methods to large datasets. The input to these models is usually some set of features computed from the audio signal. In this lecture we discuss the advantages and disadvantages of various types of prosodic features for various purposes.

Watch     Download Powerpoint

Topics Include:   • Machine Learning needs Features  • Different Types of Features  • Using Meaningful Features  • Using Midlevel Features  • Using Frame-Level (Low-Level) Features  • Using Filterbank and other Generic Features  • Using Features from Pretrained Models  • Feature Set Choices for Common Tasks  • Summary

Paralinguistic Prosody (8:24)


This is Lecture 20 of our series on prosody. Earlier we discussed the linguistic functions of prosody, but prosody also has an "untamed," paralinguistic side. Factors such as speakers' physical and mental states can directly affect their prosody, and this has applications in medical diagnosis and speaker identification.

Watch     Download Powerpoint

Topics Include:   • Prosody as a Direct Reflection of Mental States  • Indicators of Emotions  • Speech Production Mechanisms  • Markers of Depression   • Speaker Identification  • Other States and Traits  • Methods for Paralinguistic Inference   • Differences between Phonological and Paralinguistic Prosody

Prosody in Pragmatics (8:48)


This is Lecture 21 of our series on prosody, introducing the idea that pragmatic functions are often conveyed by prosodic constructions, that is, temporal configurations of multiple prosodic features. Examples include the prosodic constructions for expressing positive feeling, for making polite suggestions and for greeting.

Watch     Download Powerpoint

Topics Include:   • Conveying Pragmatic Functions as a Third Realm of Prosody  • The Prosody of Positive Assessment  • Correlations and Beyond  • A Prosodic Configuration: The Positive Assessment Construction  • The Late Peak Construction: Implying, Questioning, and Being Polite  • The Minor Third Construction: Cueing Action  • Summary: Multistream Temporal Configurations

See Also:
            Book: Prosodic Patterns in English Conversation
            Website: Audio Examples etc. for the above book
            Video: The Minor-Third Pattern

Prosodic Constructions and their Properties (9:26)


This is Lecture 22 of our series on prosody, continuing the topic of how prosody conveys pragmatic meanings. In general, these meanings are gradient, not categorical; the alignments of prosodic constructions with words can be flexible; constructions are direct form-function mappings, and some constructions can be joint projects, involving both speakers, for example when used to manage turn-taking. Finally, prosodic constructions can be superimposed, in both form and meaning.

Watch     Download Powerpoint

Topics Include:   • Prosodic Constructions for Pragmatic Functions  • Their Gradient Nature  • Their Flexibility of Alignment  • Their Nature as Direct Form-Function Mappings  • A Continuum from General to Specific  • Direct versus Mediated (Symbolic) Mappings  • Joint Constructions, such as Backchanneling  • Superposition  • The Many Pragmatic Functions of Prosody

The Three Realms, Revisited (9:17)


This is Lecture 23 of our series on prosody, in which we revisit the three realms of prosody, paralinguistic, phonological, and pragmatic, their connections, and their differences. We speculate about the evolution of prosodic forms, discuss how entanglements among the realms can cause misunderstandings, and discuss the issue of the independence of prosody relative to the words said.

Watch     Download Powerpoint

Topics Include:   • The Frequency Code  • Cross-Realm Entanglements with Creaky Voice  • How Prosody Can do so Much  • Differences Among the Realms  • On the Independence of Prosody   • Sarcasm through Prosodic-Lexical Meaning Mismatches

Speech Synthesis (9:55)


This is Lecture 24 of our series on prosody. For synthesized speech, proper prosody is not just nice-to-have, but essential for good intelligibility. We survey 40 years of work, including recent deep learning approaches whose output can be better than human speech for some purposes. Nevertheless, more work is needed to attain controllability, pragmatic expressiveness, and effectiveness for dialog applications.

Watch     Download Powerpoint

Topics Include:   • Prosody for Intelligibility  • Rule-Based Synthesizers   • Statistical Synthesis with Hidden Markov Models  • End-to-End Speech Synthesizers  • Machine Learning Process Overview   • Loss Functions and Synthesis Quality  • Variant Architectures  • Limitations of the State of the Art  • Beyond Text-to-Speech  • Toward Synthesis for Dialog Applications  • Toward Better Models through Disentanglement

Dialog Systems (12:48)


This is Lecture 25 of our series on prosody. In recent years spoken dialog systems, smart speakers, intelligent agents, and their cousins have seen wide acceptance, but only in certain domains. Going component by component, we explain how better use of prosody may dramatically improve the usability of spoken dialog systems. Yet future systems, with super-human prosodic abilities, pose societal risks.

Watch     Download Powerpoint

Topics Include:   • Towards Truly Interactive Dialog Systems  • Limitations of Current Systems  • Typical Architecture of a Dialog System  • Speech Synthesis and Speech Recognition, revisited  • Turn-Taking and other Reactive Behaviors  • Sensitively Tracking the User's State   • Clearly Conveying the System's State  • Challenges in Scaling Up  • Societal Risks

Individual Differences (10:11)


This is Lecture 26 of our series on prosody, discussing how people vary in their prosodic abilities and behaviors. We survey the skill components involved in the effective use of prosody in dialog, and how these differ among people, focusing on people with autism and those without.

Watch     Download Powerpoint

Topics Include:   • People Vary in Prosodic Behavior and Abilities  • Overview of the Mental Processes involved in Prosody in Conversation  • Low-level Pitch Perception: Genetic and Neural Factors  • Knowing and Recognizing Prosodic Constructions  • Some Learning Processes and Difficulties   • Interpreting Prosodic Meanings  • Assembling Prosodic Constructions into Plans  • Executing Prosodic Plans with Proper Control   • Turn-Taking  • Summary

See Also:
            Video: Unlocking Prosody: Discovering Structured Variation and Rich Context Effects, Jennifer Cole

Teaching Prosody (8:36)


This is Lecture 27 of our series on prosody, surveying first the general needs of learners, and then some specific needs of people wanting to become better public speakers, to have better relationships, or to become fully effective in a new language. We discuss some especially hard-to-master aspects of prosody, overview some teaching methods, and tell some anecdotes about famous linguists and their discoveries.

Watch     Download Powerpoint

Topics Include:   • Helping People Master Prosody  • What Learners Need  • General Teaching Methods  • Charisma through Prosody in Public Speaking  • Teaching Prosodic Effectiveness in Dialog  • Intelligibility through Prosody for Non-Native Speakers  • Social Dimensions of Prosody and the Potential for Misinterpretation  • Challenges faced by Non-Native Speakers  • Teaching Prosodic Constructions  • Prosody and the Perfection of Human Society

See Also:
            Video: Teaching Tip: Learning Prosody through Age-Inappropriate Play
            Website: Resources for Teaching the Prosody of English as a Second Language

Historical Perspective (5:32)


This is Lecture 28 of our series on prosody. Much of what we've presented in this series may contradict what most people think. Many of those beliefs are myths, hallowed by repetition over the decades and centuries. So here we review the concerns and biases of earlier scholars, and consider connections with music notation, Renaissance social climbing, punctuation and alphabets, and the golden age of broadcast.

Watch     Download Powerpoint

Topics Include:   • The Persistence of Prosody Myths  • Poetry and Music  • The Noble Speakers, Social Climbing, and Prescriptive Rules  • Why People Believe that Questions have a Final Pitch Rise  • Broadcast Speech as an Outlier Artform  • Summary: Archaic Perspectives Still Linger

Prospects and Challenges (3:11)


This is the 29th and final lecture of our series on prosody. In future, improved awareness and understanding of prosody will benefit society, but getting there will require progress on many fronts.

Watch     Download Powerpoint

Topics Include:   • Improving Human Society through Advances in Prosody  • Hopes for Technology  • Growing Knowledge, both Applied and Fundamental   • Maintaining Synergy across the Field  • Infrastructure Needs  • Summary of our Hopes for the Field  • Bibliography