Voice is stored in the vocal folds

I have talked a lot about voiced sounds in previous posts, but I have been ignoring one cool thing that you can do to trick your brain into hearing something unexpected. Before we can get to that though, I need to teach you about voice onset time.

Voice onset time (or VOT) is a phonetic measurement of how long it takes for voicing to start after a stop is released. When you articulate a stop, you are using either your lips (for [p] and [b]) or your tongue (for all other stops like [t], [d], [k] and [g]) to completely cut off the flow of air momentarily in speech (hence the name stop). The stoppage of airflow in a stop occurs at the beginning of the sound, which leads to an air pressure buildup behind the lips or tongue that is audibly released when the sound is produced (this is called the release burst). If you have a word with a stop at the beginning of it like “dog”, we can measure the amount of time between when the stop is released (when air starts flowing again) and when the vocal folds start vibrating. This does require using some audio analysis software (Praat) to see clearly, but here is what it looks like when I say “dog”.

“dog”

You can see on the left side of this waveform where there is no sound. This is because my tongue is pressed against my alveolar ridge and no air or sound is coming out yet. The dark black line at the left edge of the highlighted region (in pink) is the point where my tongue releases from that position and the air begins to flow out, producing sound. This is not when the vocal folds start vibrating though. That point comes approximately 15 milliseconds later (shown at the right side of the highlighted area).

This 15-millisecond VOT is slightly higher than average. The average VOT for voiced stops like “d” in English is anywhere from 0 to 10 milliseconds.
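(If you want to poke at your own recordings, here is a minimal sketch of this measurement in Python, assuming you have the parselmouth library, a Python interface to Praat. The file name and burst time are placeholders: you still need to locate the release burst yourself by eye in the waveform.)

```python
# Minimal sketch of a VOT measurement, assuming the release burst has
# already been located by hand in the waveform ("dog.wav" and the burst
# time here are placeholders, not my actual recording).
import parselmouth

burst_time = 0.112  # seconds; read this off the waveform in Praat

snd = parselmouth.Sound("dog.wav")
pitch = snd.to_pitch(time_step=0.001)  # a fine-grained pitch track

# Voicing onset = the first frame after the burst where Praat detects pitch.
voicing_onset = next(t for t in pitch.xs()
                     if t > burst_time and pitch.get_value_at_time(t) > 0)

print(f"VOT: {(voicing_onset - burst_time) * 1000:.1f} ms")
```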

Now take a look at this recording of me saying the word “tag”.

“tag”

Visually, you can see that the VOT for “tag” is much larger, coming in at about 103 milliseconds. Again, this is higher than the expected average of about 30-40 milliseconds, but we can chalk this up to the productions being recorded in isolation with careful and purposeful speech.

If I record the two words together in a spoken sentence like “the soldier is wearing dog tags” it becomes a little closer to the expected averages as seen in this third image here.

“The soldier is wearing dog tags”

What is making the VOT so much larger for a “t” compared to a “d”? The “t” sound in English is a voiceless stop, meaning that the consonant itself is articulated with the vocal folds spread apart so they do not vibrate. So, when we measure the VOT of a voiceless stop, we are measuring the time from when the stop is released to the beginning of the voicing from the adjacent vowel (all vowels are voiced). Contrast this with a “d” sound, which is a voiced stop, meaning that the vocal folds are pressed together so that they vibrate during the actual consonant sound. With a voiced stop, we are measuring the time from when the stop is released until the vocal folds begin vibrating for the consonant itself. For this reason, we always expect voiceless stops to have a larger VOT than voiced stops, and this holds across languages.

So is that all there is to VOT in English? It turns out that we have two types of voiceless stops in English. The one that you get will depend on the surrounding environment. Let’s do a little demo and you will see what I mean.

Place your hand in front of your mouth and say the word “stop”, and then, with your hand still in place, say the word “top” (Pandemic note: masks do interfere with this demonstration and will need to be removed for full effect). You will feel that when you say the word “top”, there is a significant puff of air that hits your hand compared to when you say the word “stop”. That burst of air is called aspiration, and in English, a voiceless stop is aspirated when it is the first sound in a stressed syllable. When we transcribe these aspirated stops in the International Phonetic Alphabet, we use a superscript “h” to denote the aspiration: [tʰ].

In a word like “stop”, we have a voiceless unaspirated stop, and these have a VOT that is shorter than the aspirated version, but still longer than that of voiced stops like a “d” sound. Taking a look at one final recording of mine, when I say the word “stop” my VOT comes out at about 23 milliseconds.

“stop”
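To summarize the three-way picture so far, here is a toy classifier. The cutoffs are rough, illustrative numbers pulled from my measurements above, not standard values from the literature.

```python
# Toy stop classifier based on the ballpark VOT values in this post.
# The cutoffs are illustrative only -- real category boundaries vary by
# speaker, language, and speaking rate.
def classify_stop(vot_ms: float) -> str:
    if vot_ms < 0:
        return "pre-voiced"               # voicing before the release (more on this below)
    if vot_ms <= 15:
        return "voiced"                   # e.g. the "d" in "dog" (~15 ms)
    if vot_ms <= 30:
        return "voiceless unaspirated"    # e.g. the "t" in "stop" (~23 ms)
    return "voiceless aspirated"          # e.g. the "t" in "top" (~103 ms)

for vot in (5, 23, 103):
    print(f"{vot:>4} ms -> {classify_stop(vot)}")
```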

For English speakers, we don’t care about this distinction between aspirated and unaspirated voiceless stops. Both of these are just considered “t” sounds in our language. This is evidenced by the fact that you probably didn’t know about this difference. More importantly, if you say the word “stop”, but you put a lot of effort into really making sure that you get as much aspiration as possible on the “t”, it is still just the word “stop”. Nothing will change about the meaning of it.

This is not the case for all languages. Let’s take Armenian for instance. Armenian has a three-way stop contrast where using a voiced stop, a voiceless unaspirated stop, or a voiceless aspirated stop in a word can change the meaning of it. An example from a 2003 paper states that the word transcribed as [baɹi] (the upside-down “r” is just a regular “r” sound) means ‘good’, the word [paɹi] means ‘dance’, while the word [pʰaɹi] refers to the first fruit that a tree bears. (Hacopian, N. (2003). A three-way VOT contrast in final position: Data from Armenian. Journal of the International Phonetic Association, 33(1), 51–80. http://www.jstor.org/stable/44526902)

I think this is a really cool distinction that shows just how important language really is. Something as simple as how much aspiration you use to say a word can have a huge impact on the meaning of it. There are even some languages that go the extra mile on their voiced stops and have what is referred to as pre-voicing. Pre-voicing means that the vocal folds start vibrating before the stop articulation is released, so the VOT of that sound ends up being negative. This is a phenomenon that is observed in some Southern African languages such as Taa and !Kung.

And now for one last cool thing I can show you before I close out. Check out this video of a quick auditory illusion.

This is, again, me saying the word “stop”. But if you notice from the second time it is played, when the “s” part of it is cut off, it sounds a little bit like a mixture between “top” and “dop”. This confusion you may be experiencing comes from the fact that a voiceless unaspirated stop is closer in VOT to a voiced stop than it is to the voiceless aspirated stop that we would expect at the beginning of the word “top”. Our brain wants to hear the word “top” and will ultimately recognize it as such, but there is this brief moment of ‘wot in tarnation’ that our brains go through first because we are pretty sure that might be a “d”, even though “dop” is not a word.
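If you want to recreate this illusion yourself, it only takes a couple of lines. This sketch uses the Python soundfile library; the file name and the cut point are placeholders, and you would find the real boundary between the “s” frication and the stop closure in Praat first.

```python
# Recreate the illusion: chop the "s" off a recording of "stop" and listen
# to what is left. "stop.wav" and the 0.18 s cut point are placeholders.
import soundfile as sf

data, sr = sf.read("stop.wav")
s_end = 0.18  # seconds; where the [s] frication ends and the closure begins
sf.write("top_ish.wav", data[int(s_end * sr):], sr)
```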

Anyway, I am going long again, like usual. Thank you so much for sticking with this long-winded post. I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.


New York’s Hottest Club

Toward the end of 2021, I had the opportunity to work on a super fun personal project that I have been excited to talk about. I was able to closely examine the speech of Bill Hader over a period of years and directly compare that speech to one of his most famous recurring characters on Saturday Night Live, Stefon Meyers. For those of you who are not familiar with Stefon, check out this video of one of his early appearances on the SNL segment Weekend Update.

If you compare Stefon’s speech to a Bill Hader interview, you can hear there is quite a difference.

First let’s get a little background on Bill and Stefon. Bill Hader is a comedic actor and voice actor originally from Tulsa, Oklahoma. Bill got his first big break in show business when he was hired on as an actor for Saturday Night Live in 2005. While working at SNL, he also worked on many movies such as Cloudy with a Chance of Meatballs, Superbad and Forgetting Sarah Marshall.

During his time at SNL he also worked to develop the character of Stefon with John Mulaney (a former SNL writer, now stand-up comic). The character of Stefon was written after John Mulaney received an email from a club promoter who was trying to entice him into coming out to a hot new night club. The email had ridiculous selling features for the club such as “a room with broken glass”. The character of Stefon was meant to be a heightened version of this that served as a correspondent for Weekend Update who would let the host (Seth Meyers) know about all of the “hottest clubs” in New York City that tourists need to check out.

The voice for Stefon was inspired by a barista at a local coffee shop that Bill would frequently visit. The voice given to Stefon is higher in pitch and is breathier compared to Bill’s speaking voice. The voice also has a very prominent and stereotypical gay lisp to it which is most noticeable on his repeated affirmations “yes yes yes yes yes yes yessssssss”.

All of this is good and fun, but why is this worth looking at and talking about? Well, Bill was on SNL from 2005-2014, and during that time he performed the character of Stefon a total of 14 times (from 2010 to 2014) and an additional two times after leaving the show. This means that, thanks to the power of the internet, we have access to longitudinal data for both Bill (who did several late night interviews during this time) and Stefon. This allows us to compare three things:

1. We can see how consistent Bill was in his performance of Stefon over a period of years

2. We can see whether Bill’s own speaking voice changes over the years

3. If Bill’s voice does change, do those changes affect Stefon, or does he just keep Stefon in this somewhat frozen state?

So, what factors can we look at to compare these two voices? The first factor we can talk about is their pitch. If you recall, a few weeks ago I wrote a quick summary on how we can determine a speaker’s pitch and the factors that can influence it. The biggest thing to keep in mind here is that the pitch of both Bill and Stefon is coming from the same vocal tract. There is no obvious physical difference we can point to between the two voices like height or gender, so any changes in pitch are the result of intentional effort by Bill.

As I already pointed out, you can tell that the pitch of Stefon’s voice is higher than Bill’s own voice simply by listening to his earlier performances. Bill has an average pitch of about 111 Hz while Stefon’s ends up being around 133 Hz. But a key observation here is that because we have access to years of recordings, we can look at his performances year by year and spot a trend that might go unnoticed when looking at all the data at once.

Stefon (2018)
Bill Hader (2018)

In this later performance from 2018 (after Bill had not performed the character for 4 years), the pitch of Stefon actually ends up being lower than Bill’s speaking voice! (122 Hz for Stefon compared to 139 Hz for Bill). What is going on to cause this change? Well, I can’t say for certain, but my theory is that the long break between performances likely played a role. So, the interesting pattern here is that while Bill’s pitch is slowly increasing over time, Stefon’s pitch is trending downward, and there is this definitive crossover in the later years of performances.
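For those curious how the year-by-year numbers get pulled together, here is a rough sketch of the pipeline, again using parselmouth. The file names are hypothetical, and the real analysis averaged over many more clips per year.

```python
# Sketch of the year-by-year pitch comparison. File names are hypothetical;
# each clip would be a cleaned recording of one performance or interview.
import parselmouth

clips = [("Bill", 2011, "hader_interview_2011.wav"),
         ("Bill", 2018, "hader_interview_2018.wav"),
         ("Stefon", 2011, "stefon_update_2011.wav"),
         ("Stefon", 2018, "stefon_update_2018.wav")]

for speaker, year, path in clips:
    f0 = parselmouth.Sound(path).to_pitch().selected_array["frequency"]
    voiced = f0[f0 > 0]  # Praat reports unvoiced frames as 0 Hz
    print(f"{speaker} {year}: {voiced.mean():.0f} Hz")
```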

What else can we talk about in these performances? I mentioned earlier that there is a prominent lisp that Bill gives to Stefon that really draws out his “s” sounds, but how can we quantify this difference? We can do this using simple airflow physics and a measure called the spectral centre of gravity. The physics is not super complicated, I promise. What we are measuring here is the energy created by the air flowing through the small space you make in your mouth when producing an “s” sound.

You can hear what I mean by doing this quick experiment yourself: Try to say the word “sa” out loud. Now say it again, but this time try to make a big smile and put your tongue as close to the roof of your mouth as you can while still being able to say the word. When you said it the second time, the “s” probably sounded a bit louder. If you had recorded these productions and looked at the spectrograms for them, you would be able to see that the second version had more high-frequency energy in it compared to the first one. Something like this:

Save (left) versus Shave (right)

Comparing the production of the word on the left (save) to the one on the right (shave), you can see how the frication at the beginning of the one on the right is darker at lower frequencies. This is because the “esh” sound has a lower centre of gravity: its energy is concentrated lower in the spectrum than that of an “s” sound.

We can quantify this difference by looking at the centre of gravity of each of these sounds, and from that we can infer differences in how they are articulated. Let’s go ahead and look at how the “s” sounds differ for Bill and Stefon.
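Here is roughly what that measurement looks like in code, again through parselmouth. The input file is assumed to be an “s” that has already been cut out of a longer recording.

```python
# Sketch of a spectral centre of gravity measurement, the same Praat measure
# discussed here. "s_segment.wav" is an assumed, pre-cut "s" segment.
import parselmouth
from parselmouth.praat import call

spectrum = parselmouth.Sound("s_segment.wav").to_spectrum()
cog = call(spectrum, "Get centre of gravity", 2.0)  # power = 2.0, the default
print(f"Centre of gravity: {cog:.0f} Hz")
```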

Unsurprisingly, the centre of gravity for Stefon (7877 Hz) is a lot higher on the “s” sounds compared to Bill (6051 Hz), but again, looking at the change over the years reveals a surprising trend. In 2018, Bill had a centre of gravity of 8244 Hz and Stefon had a centre of gravity of 8373 Hz. The two are much closer in this year than in any other year where recordings exist for both Bill and Stefon. Looking at each speaker over time, we see that Bill’s centre of gravity steadily increases over the years while Stefon’s remains relatively the same.

Comparing the centre of gravity for Bill (blue) and Stefon (orange) over the years

Remember, too, that this is the same “s” sound produced both in character and out of character. I had to use two completely different sounds (“s” and “esh”) to show you a similar scale of difference, but Bill Hader was able to do this with a single speech sound.

The fact that he was able to remain so consistent in how he articulated sounds as Stefon while his own voice had so much variation in it is quite amazing. This is a true testament to Bill’s talent. Even more so when you remember that he had a long period between 2014 and 2018 where he did not perform as Stefon at all. For him to be able to come back after that long break, when his own voice had so obviously changed, is very cool (to me at least).

So, this is basically just a lot of charts and numbers at the end of the day, but I hope that it was at least cool to see some of the practical applications of acoustic phonetics. While this is a fun and silly example, there are serious applications for these techniques, such as legal cases where investigators are trying to match a recording of an unknown speaker to a suspect. I had a lot of fun doing the research and the complete writeup for course credit, and these are the two most interesting findings that came out of it. I likely won’t return to the rest of the findings because they get a little more technical, but if there is interest, I am always happy to reconsider.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

Things are getting tense

Pitch is a word that gets thrown around quite a bit. Some people have high pitched voices and some people have low pitched voices. Have you ever thought about what pitch means though? Sure, we know when someone’s voice sounds high pitched, but what is the reason for that high pitch? You can probably sort the people in your life into “those that have high pitched voices” and “those that have low pitched voices”. This sorting that you are doing in your head may not correspond to typical concepts of gender either. I would wager that you can think of some female friends that have low pitched voices and some male friends that have high pitched voices. There may be another pattern that you notice in these people though. I bet that most of the people in your life with low pitched voices are taller on average and those with high pitched voices are typically shorter. It turns out there are many factors we can look at that correlate with vocal pitch.

But really, what exactly is pitch? Like I said, we have this idea of what pitch is from just listening to something, but what actually makes something high pitch? Well, those of you with musical training are likely already a few steps ahead of me on this one. The pitch of one’s speech, similar to a musical note, is measured in Hertz (Hz) and corresponds to how ‘quickly’ something is vibrating. But what is doing the vibrating? In speech, it is our vocal folds that are vibrating when we produce vowels and voiced consonant sounds.

Photo by Andrea Piacquadio on Pexels.com

I have talked about the vocal folds before, but I will do a quick refresher here. Your vocal folds are the folds of tissue located in your throat that are responsible for phonation. During the production of a vowel or a voiced consonant, the vocal folds press together and the air that passes through them causes them to vibrate, producing the sounds that we hear.

Let’s compare these vocal folds to the string of a guitar. In a standard tuning of a six-string guitar, the sixth string will be tuned to E with a frequency of 82.407 Hz. What this means is that when you pluck this string, it will vibrate in a cyclical fashion and should repeat that cycle of vibration approximately 82 times every second. If you were to adjust the tuning peg and tighten the string, the note that you hear would increase in pitch and the frequency of the note would also increase meaning that the string vibrates more times per second. Conversely, if you loosen the string, the note sounds lower and the frequency of the note would also decrease.

Our vocal folds function the same way as a guitar string. The tighter you hold them together and the faster they vibrate, the higher the pitch that you produce. This is how we are able to produce different pitches in our voices as we sing and speak. But wait, I also mentioned that there was a likely correlation between the pitch of one’s voice and their height, right? Let’s go back to the guitar for a second.

Photo by Brett Sayles on Pexels.com

Again, in standard tuning, the sixth string is tuned to E (82.407 Hz). If you place your finger on the fifth fret of the guitar and pluck the string, the note that comes out will be an A with a frequency of 110 Hz. All you are doing by placing your finger on this fret is shortening the string by a set length to raise the pitch, so from this we can infer that, in addition to tension, the length of the string is also a factor in the pitch.
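If you like seeing the arithmetic, each fret raises the pitch by one semitone, which in equal temperament is a factor of 2 to the power of 1/12. A quick sketch:

```python
# Each fret shortens the vibrating string so the frequency rises by one
# semitone, a factor of 2 ** (1/12) in equal temperament.
low_e = 82.407  # open sixth string, Hz

for fret in range(6):
    print(f"fret {fret}: {low_e * 2 ** (fret / 12):.1f} Hz")
# fret 5 -> 110.0 Hz, the A mentioned above
```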

When it comes to vocal folds, it stands to reason that people who are taller also have larger proportions in most areas of their body. I mean, you know what they say about people with big feet, right… That’s right! They do say that they have longer vocal folds! Now, this is not true of every tall person in the world (there are exceptions to almost everything).

Alright, so now that we have a better understanding of how we quantify someone’s pitch, how can we measure it? Well, thanks to modern technology, we can have software do it for us. A piece of software widely used in the world of linguistics called Praat can analyze many aspects of recorded speech, including pitch. In the image below, you can see a spectrogram of a recording of me saying the word “fantastic”. On this spectrogram (the bottom half of the image), the blue line represents the pitch track that the computer calculates, and the average for my pitch ends up being approximately 131 Hz (which is slightly above average for a man my age).

Look at this “fantastic” recording

You will also see that there are gaps in the blue line, and there is an explanation for them. Pitch can only be tracked and calculated on voiced consonants and vowels, but many of the consonants in “fantastic” are voiceless, meaning that the glottis is spread open when they are produced and the vocal folds are not vibrating. So, we know that the computer can tell us this number, but how is this number calculated? Let’s zoom in close on one of the vowels in this recording to get a better idea of what is going on.

Another fantastic image

If you look at the waveform here (in the top half of the image), you can see that even though there is a lot of variation in the line, there are patterns repeating in it. I have highlighted one of these chunks and you can see that it takes approximately 0.00713 seconds to complete one cycle. To convert this into pitch, we need to figure out how many of these repetitions can happen in one second (that’s what Hz stands for, after all!). So if we do some cross multiplication and division like we are in high school, it works out to 140.25 Hz, which is very close to what the computer calculates for this particular vowel (140.4 Hz). Keep in mind that the computer is looking at the entirety of the word while we are just doing the math based on a single cycle. The computer uses a more complex algorithm that takes into account several cycles and the surrounding environment, but this is a quick showcase of how it works.
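In code, that cross multiplication is just taking a reciprocal:

```python
# Pitch is cycles per second, so one cycle's duration (the period) gives
# the frequency as its reciprocal.
period = 0.00713               # seconds per cycle, read off the waveform
print(f"{1 / period:.2f} Hz")  # 140.25 Hz, close to Praat's 140.4 Hz
```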

And before I go on for an eternity, I think we can stop here and call this a solid primer to the mechanics of pitch. I hope that it was informative though and I have enjoyed getting back into the habit of writing like this again. I still have so much to share and I hope that you will come back to learn more. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

Speech perception

A big focus of this blog has been understanding and producing speech, but something that I have ignored up until this point is how speech is perceived. Speech perception is focused on hearing, decoding and interpreting speech. As we will see today, our brains are often not as reliable as we might think.

Photo by shutter_speed on Pexels.com

So rather than just turn this into a lecture about speech perception and the multitude of theories behind it (let’s face it, this is an educational blog, not a university course) I am just going to show off something weird and wild that our brains do and talk a little bit about the mechanics behind it. Alright, so raise your hand if you have heard of the McGurk effect. (Oh wait, sorry. Blog, not lecture)

The McGurk effect is an auditory illusion where certain speech sounds are miscategorized and misheard based on a conflict in what we are hearing versus what we are seeing. We can see this in action by watching the short video below.

So what is actually going on here? The audio that is being played in all three of those clips is exactly the same. You are hearing the same speaker say “ba ba” over and over. But when the audio is played over a video of someone mouthing “da da” or “va va” we are able to hear it as those instead.

Well as it turns out, this illusion provides positive evidence for something called the motor theory of perception. This theory argues that people perceive speech by identifying how the sounds were articulated in the vocal tract as opposed to solely relying on the information that the sound contains.

This motor theory is supported by something like the McGurk effect because we are taking this audio information and supplementing it with what we are visually observing in the video in order to decide what is being said. It also explains why it is easier to hear someone in a crowded or noisy setting if you can look at their mouth and watch them speak as opposed to not being able to see their mouth.

But it’s not as though we are following along with what people are saying by moving our own articulators or imagining how their mouths are moving while we are listening to them. Supporters of the motor theory explain the process with specialized cells in our brains known as mirror neurons.

A mirror neuron is a specialized neuron in the brain that activates (or fires, if you prefer) in two different conditions: it activates when the individual performs an action, and it also activates when the individual observes another performing the same action. In speech, this would mean the same part of your brain that activates when you move your mouth to produce a “ba” sound will also activate when you watch someone else produce a “ba” sound.

With this knowledge in mind, it should be easier to see why we are able to get something like the McGurk effect to occur. If perception of speech is influenced by visual information, and we are observing someone producing a sound that is activating these mirror neurons, it makes sense that our perceptions might change slightly so that what we are hearing matches what we are seeing.

Photo by meo on Pexels.com

It is important to note that, as I mentioned earlier, this is not the only theory of speech perception that we have right now, and the motor theory is not without its flaws. It relies on a person’s ability to produce the sounds themselves. According to the motor theory, if you were unable to produce a sound yourself, and you could not visually see how the speaker was articulating it, you should not be able to perceive it.

So what about prelinguistic infants? An infant who has not developed the ability to speak yet should not be able to perceive the difference between a “ba” and a “da” without visual assistance because acoustically these sounds are quite similar.

Some studies have used a novel methodology where the infant sucks on a specialized soother of sorts that measures the rate at which they are sucking. Using this soother and presenting the infants with audio stimuli through a speaker (no visual input), researchers have found that presenting infants with new and novel stimuli causes them to suck faster, while presenting them with familiar stimuli means that they will suck at a slower rate.

So, presenting these infants with a series of “ba ba ba” followed by a sudden change to “da da da” results in an increased sucking rate. These findings contradict the motor theory of speech perception because the infants in this study are too young to speak on their own and their articulators are not refined enough to produce both a “ba” and a “da” sound. Because the infants cannot produce these sounds at this point, their mirror neurons should not activate, as they would not have developed fully yet.

This is not to say that the motor theory of perception is wrong though. The fact that we are able to perceive the McGurk effect means that there must be some truth to it. It just calls into question whether this theory captures the whole story. This is something that almost every science deals with at some point. There is almost never a perfect explanation or theory that deals with every problem. If you look hard enough, there will be counter-evidence to almost any theory, but it becomes a matter of refining theories as we learn more and more about the way that the world works.

There are many other theories of speech perception that have their own explanations and their own problems. I will likely return to discuss some of the other big ones such as Exemplar theory, but for now I think this is a good place to leave this one.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

The anatomy of speech

Photo by Pixabay on Pexels.com

Have you ever thought about how you talk? I don’t just mean the way that you say certain words, or maybe the fact that you slur your words after a few too many drinks. I mean HOW you talk. The anatomy of the mouth and the way that your tongue makes such quick and precise movements is truly fascinating. I also want to issue a pre-emptive apology because if you are anything like me, after reading this you will spend way too much time being aware of your tongue. But enough of the preamble, let’s just get into it.

If you think about it for too long, tongues are just gross muscular things in our mouths. We use them when we eat to move food around in our mouths and to free food that was trapped between our teeth, and of course they are primarily responsible for tasting. An often underappreciated function of tongues is their involvement in speech. This is not to say that tongues are essential for all speech, but they play a major part in the formation of both consonant and vowel sounds.

For reference, of the 23 English consonants in the International Phonetic Alphabet, only 7 do not directly involve the tongue. But this is just a little taste of what is to come. For now, let’s talk about all of the things we need to classify a sound. When it comes to identifying a sound, there are three things we need to consider: voicing, place of articulation, and manner of articulation.

Voicing is not something that involves the tongue at all, but it is something that we have talked about previously. As a reminder, voiced sounds are produced with your vocal folds being held close together so they vibrate when air passes through them. You can feel this in a word like “zit” by placing your fingers on your neck as you say it. Compare this to a word like “sit” which has a voiceless sound at the beginning. Voiceless sounds are produced by keeping your vocal folds spread open so that there is no vibration.

Moving up from the vocal folds, let’s get back to the tongue. We will begin by discussing the different places of articulation. The places of articulation are mostly self-explanatory, with names like interdental (between the teeth) and bilabial (involving both lips), but the one we will discuss first deals with the “s” and “z” sounds we have discussed previously. These sounds are classified as alveolar sounds, meaning that they are articulated with the tongue at a place in your mouth known as the alveolar ridge. The alveolar ridge is just behind your upper front teeth, and if you feel around with your tongue, you can feel a small protuberance where the roof of your mouth rises slightly. The diagram below shows a midsagittal cross-section of the oral cavity, which shows the alveolar ridge and all of the other places of articulation in the mouth.

Places of articulation

Not all of these places involve the tongue, as we previously discussed. All bilabial sounds like “b”, “p” and “m” are produced with only the lips, and the tongue is not involved at all. Sounds like “f” and “v” combine two articulators (the teeth and the lips) to produce sound, and these are known as labiodental sounds which, again, do not use the tongue.

Before we move onto the manner of articulation, I want to talk about “r” for a second. “r” is a unique sound in English because it can be produced in two different ways depending on how you move your tongue. So now is when I ask you, are you a buncher or a curler?

To figure out whether you are a buncher or a curler, there is a simple test you can do. Go grab a toothpick or something similar that you are comfortable putting in your mouth and just poke your tongue as you are producing an “r” sound. If the thing you are poking is the bottom of your tongue, congratulations, that means you are a curler. If you are poking the top, then also congratulations, you are a buncher.

It turns out that an “r” sound can be produced either by curling your tongue tip back toward the rear of your mouth, or by bunching up your tongue blade toward the back of the tongue. It is important for speech-language pathologists to know about this so they can be prepared to teach techniques for both of them. There is no advantage or disadvantage to either technique; bunchers and curlers can both produce “r” sounds just fine. This is just a weird quirk of our bodies that we can observe.

Now back to the different sounds. Let’s talk about manner of articulation. Manner of articulation deals with the finer aspects of the tongue and how it directly impacts the airflow in the oral cavity. For example, let’s return to the “s” and “z” alveolar sounds that we talked about earlier. These sounds are known as fricatives because they are produced by bringing the tongue very close to the place of articulation without touching it, leaving a narrow channel that creates frication in the airflow, hence the name.

So now think about a sound like “t” or “d”. These sounds are both alveolar sounds as well, but they are produced by having the tongue touch the place of articulation and momentarily stop the airflow entirely. Unsurprisingly, these are called stops. Now what about a sound like an “n” or an “m”? When you produce these sounds, you are producing them like you would a stop, but you can feel a little bit of reverberation in your sinuses as you produce them. These sounds are nasal stops, and they are produced by lowering the velum at the back of your oral cavity, which allows the air to flow into your nasal cavity and resonate there.

The amazing thing about all these actions is that they are not things that you actively need to think about to do. In fact, you probably put zero thought into how this works until you read this post. Our bodies can do all of this effortlessly and automatically.

As always, this is just a brief overview. We don’t have time to get into all the different places and manners of articulation. We will likely return to talk about more unique language sounds (like clicks), but for now, I think this is a good place to leave it.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.