Does this mean anything to you?

In the past, I have discussed strange sentences like garden path sentences and Moses illusions. These sentences, while strange or inaccurate in some way, are ultimately grammatical. I find sentences like these particularly fascinating because they push the limits of what our brains will tolerate while reading. There is another type of sentence I encountered recently that seems innocuous at first, but as I researched more I was blown away by how widely studied this particular construction is. As an example, take a look at this 2019 tweet from Dan Rather and see if you can notice what is going on.

This sentence makes zero sense. If you read it quickly though and don’t overthink it, you might just convince yourself that it does! Look at the replies, for instance. The overwhelming majority of them completely miss the fact that the sentence is malformed and assume its main point is that the president’s English is not very good. The tweet is trying to build a comparative between two clauses that measure entirely different things, and that makes it ungrammatical.

To visualize this, let’s break it down into two halves. “I think there are more candidates on stage who speak Spanish more fluently…” – Okay, so here he is setting up a comparison where we are counting a quantity of people. “…than our president speaks English.” – but he finishes it off with a claim about how well the president at the time could speak English. He is trying to compare a number of Spanish-speaking people to the fluency of one person’s English. Even as I write this, I am having a hard time describing exactly what the sentence is “trying” to say, but I think we can all agree that it makes no sense whatsoever.

What exactly are sentences like this called? These sentences are called comparative illusions (the sciency name) or Escher sentences (the more fun name). The name Escher sentence comes from the famous artist M.C. Escher, whose Penrose stairs illusion depicts a staircase that looks normal on the surface but ultimately goes nowhere and could never function as a real set of stairs. Honestly, if there were an award for naming things, this would absolutely win because I cannot think of a better fitting description than that. These sentences seem completely fine, until they’re not, and then you are just left to stand back and wonder what the heck is going on. The stereotypical example used widely for this phenomenon is a little easier to spot:

  • More people have been to France than I have.

Again, we see a sentence that is trying to compare two separate ideas in a single sentence. In a sentence like this one, we are presented with a set of individuals in the first clause (more people), but when we get to the second clause (I have), we discover that there is no such set of individuals with which to draw a comparison.

To be clear, this is not an issue of plurality in the second clause. The sentence “More people have been to France than we have” is equally awful.

The most striking thing about these sentences is that people will not report any sort of weirdness at first glance. It’s not until you take a longer look and try to determine exactly what is being said that you start to notice what is going on.

So, what exactly is going on then? Well, as is the case with a lot of linguistic weirdness, there is no way for us to know for sure. Some researchers have argued that, as I mentioned above, the sentence is built from two templates for comparison that are each fine in isolation. It is only when you combine them in the same sentence that things start to go awry.

Think of it like this. Here I will present you with two true and grammatical sentences:

  • John is too tired to drive his car safely.
  • John has driven for as many hours as Tim has.

These sentences both being true does not allow you to blend them together into a third sentence like so:

  • John is too tired as Tim has.

Now this is just a bad sentence and not an Escher sentence in the slightest. Let’s apply the same sort of logic to an Escher sentence, though. The first two sentences below are grammatical on their own; the third is the illusory blend of the two:

  • More people have gone to France than I could believe.
  • John and Mary have gone to France more than I have.
  • More people have gone to France than I have.

It could also be the case that our brains are noticing that there is deleted material at the end of this construction that we expect to be there because we see sentences like that all the time. Take this sentence for example:

  • Sally ate some pizza and Amanda did too.

This sentence is completely fine and our brains don’t struggle with it at all because we can infer that the “did too” means that Amanda also ate some pizza. So with an Escher sentence, when we encounter the ending “…than I have,” our brain might just be saying “oh, I can fill in the blanks here,” and because we have some working examples to reference, we think it all makes sense and call it close enough.

All of these theories have been hard to prove, and there isn’t yet a concrete explanation for why we are seemingly unperturbed by these horrible sentences. I’m curious to know what other people think of these, so if you have any theories about them, or if you want to try to convince me that they are ultimately fine, let me know down below.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.


Gendered nouns

Photo by Magda Ehlers

I received a comment a few weeks ago asking me why languages like French and Italian have gendered nouns. This is an issue that has been debated a lot in the literature, and there isn’t really a good answer at this point. But putting the why issue aside for now, let’s go on a quick tour of grammatical gender.

Before we get to gender though, we first need to talk about noun classes. Noun classes are additional ways of categorizing nouns based on factors like gender, animacy, and even shape. Grammatical gender is a subset of noun classes that focuses on, well, gender. Languages typically mark noun class by adding prefixes or suffixes to nouns, and these markers are often echoed on other words in the sentence like verbs and adjectives. Noun classes are not universal though, which is why a language like English does nothing of the sort.

Since we can’t use English examples to talk about noun classes, we will instead use Shona. Shona is a language in the Bantu family that is spoken in Zimbabwe. This is a go-to language for the purposes of talking about noun classes because it has a total of 21 different noun classes. That is not a typo; they really make that many distinctions. It is not as complicated as it might sound, though, once you look at the paradigm and start to notice a pattern.

Table from Wikipedia

According to the table above, noun class 1 is for human nouns while noun class 2 is for human nouns that are also plural. (Noun class 20 has been excluded from this table because it is considered vulgar.) You can see this pattern repeat across the noun classes: one class for singular nouns and a second class for their plurals. You will also notice that some of the prefixes are reused multiple times (I guess they didn’t want to make it TOO complicated). Ultimately, we can see that “boy”, “tree”, “house”, “scorpion”, and “river” all get different prefixes, or changes to the beginning of the word, because of the noun class they belong to.

While this is all really complicated and interesting, you may be asking yourself at this point “why did a system like this develop?” For a language like Shona, these noun class markers not only show up on the nouns, but they will also show up on the verbs and other words in the sentence as a way of either marking the importance of a concept or showing important links between concepts. Let’s take a look at an example.

In Shona, the phrase “Pachikoro panotamba vana” translates to “at school the children play”. Looking at the words individually, the root “chikoro” (school) is paired with the “pa-” prefix (noun class 16) and the word “vana” (children) is the root “ana” with a “va-” prefix (noun class 2). The verb in the middle, “notamba” (play), takes the same prefix as school. Basically, what this sentence is saying is that the children are playing at the school and not, say, at the playground, because we are emphasizing the link between the playing and the school.

Using the same word order and changing the prefix on the playing verb to become “Pachikoro vanotamba vana” we are instead drawing the link between the playing event and the children. Essentially this would be like saying that the children are playing at the school, not the adults.
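To make the mechanics concrete, here is a toy sketch of this agreement pattern in Python. It covers only the two prefixes from the example above, with class labels and roots taken from the sentence we just walked through; real Shona morphology is of course far richer than string concatenation:

```python
# Toy model of Shona-style noun-class agreement.
# Only the two prefixes from the example sentence are included;
# real Shona morphology is far richer than this.
NOUN_CLASS_PREFIX = {
    2: "va",   # human plural nouns, e.g. "vana" (children)
    16: "pa",  # locative nouns, e.g. "pachikoro" (at school)
}

def agree(verb_root: str, noun_class: int) -> str:
    """Mark a verb as agreeing with a noun of the given class
    by copying that class's prefix onto the verb root."""
    return NOUN_CLASS_PREFIX[noun_class] + verb_root

# "Pachikoro panotamba vana": the verb agrees with class 16 (school),
# linking the playing event to the school.
print(agree("notamba", 16))  # panotamba
# "Pachikoro vanotamba vana": the verb agrees with class 2 (children),
# linking the playing event to the children instead.
print(agree("notamba", 2))   # vanotamba
```

The point of the sketch is that a single prefix swap on the verb redirects the link, with no change to the word order at all.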

This is just a quick example, but you can imagine how this scales up in a long sentence with multiple nouns and embedded clauses. Being able to draw direct links from one word to another without reordering anything is pretty efficient. That is the trade-off in languages with complex noun class systems: more morphology to keep track of, but a lot of flexibility in word order.

To make this same meaning difference in English we need to do things like change the word order, put verbal emphasis on certain parts, or add extra context to clarify what we mean. In Shona, all they need to do is change a prefix on one word.

Grammatical gender is a much simpler version of this and is typically just limited to two or three classes. Everyone knows the common examples of French, Italian and Spanish that have two genders in their grammar (masculine and feminine), but some often forgotten members of this discussion are languages like German and Romanian which make a three-way gender distinction on nouns (masculine, feminine and neuter). 

In addition to the masculine/feminine divide, some languages make an animate/inanimate distinction instead, such as Ojibwe (an indigenous language spoken in parts of Canada and the United States), while others make a common/neuter distinction, as we see in Swedish and Danish.

Swedish and Danish used to have a three-way distinction on their nouns like German does, but the masculine and feminine distinction melded together into this common category and only the neuter distinction is still observed.

But I think the real point of the question being asked (if I am interpreting it correctly at least) is why do the French have masculine sofas and feminine tables? We have seen the pattern evolve in multiple languages from all different families so it’s not like we can even just trace it back to some weird thing they decided to do in Latin.

And yet there are still all of these amazing things that languages do with gender and noun classes. From simple two-way distinctions to an incredibly detailed 21-way system, there is clearly a lot to explore and so much that I have glossed over. All of these systems evolved independently, too! Shona and the Romance languages are about as related as chickens and dogs in terms of language lineage, yet they each independently arrived at a system like this. Contrast this with English, which is quite closely related to German: English has relatively little gender marking (we have pronouns and inherently gendered noun pairs like “king” and “queen”), while German makes the three-way contrast I talked about earlier.

There have been theories proposed that gender marking in some languages may have evolved out of an animate/inanimate distinction. But this raises a few questions. If this were the answer, why do we still see animate/inanimate distinctions? Would they not all have evolved into gender distinctions? And better yet, why does Shona take things to the max with so many classes rather than simplifying down?

Other explanations have tried to draw a link between humans’ perception of gender and sex and how we view the world. This theory does make a bit of sense when you observe that languages which make gender distinctions will never give things like “man” or “male dog” feminine morphology. These are items that we would call naturally gendered. It really starts to fall apart when you look at tables and sofas though. What feature of a table “exudes” feminine energy?

When it comes to new words entering French specifically, we can look to the Académie française for guidance! The Académie française is the official council in charge of maintaining the “purity” of the French language, which essentially means it oversees the addition of new words to the French dictionary while also dictating the pronunciation of words for “formal French” usage. Even its rules are not definitive, though, since the gender of some nouns has changed over the years based on popular usage.

So why does gender matter in some languages? The answer to that one is mostly just a big shrug at this point. It is probably the least satisfying conclusion I will ever write here, but it is very true. Ultimately, there is no satisfying answer as to “WHY” this happens. What we have instead is a ton of variation and things to talk about when researching languages, which is the kind of stuff that (ideally) keeps people like me busy forever.

I suppose the moral of the story is that language is a complicated thing which can often act like a living organism. Language change is complicated, but absolutely worth talking about in a future post. For now, I am sorry that I couldn’t give a satisfying answer to the original question. I just hope that showing the various systems that different languages use was still interesting enough to read about.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

Voice is stored in the vocal folds

I have talked a lot about voiced sounds in previous posts, but I have been ignoring one cool thing that you can do to trick your brain into hearing something unexpected. Before we can get to that though, I need to teach you about voice onset time.

Voice onset time (or VOT) is a phonetic measurement of how long it takes for voicing to start after a stop is released. When you are articulating a stop, you are using either your lips (for [p] and [b]) or your tongue (for all other stops like [t] [d] [k] and [g]) to completely cut off the flow of air momentarily in speech (hence the name stop). The stoppage of airflow in a stop occurs at the beginning of the sound, which leads to an air pressure buildup behind the lips or tongue that is audibly released when the sound is produced (this is called the release burst). If you have a word with a stop at the beginning of it like “dog”, we can measure the amount of time between when the stop is released (when air starts flowing again) and when the vocal folds start vibrating again. This does require some audio analysis software (Praat) to see clearly, but here is what it looks like when I say “dog”.


You can see on the left side of this wave form where there is no sound. This is because my tongue is placed against my alveolar ridge and there is no air or sound coming out yet. The dark black line at the left edge of the highlighted region (in pink) is the point where my tongue releases from that position and the air begins to flow out producing sound. This is not when the vocal folds start vibrating though. That point comes approximately 15 milliseconds later (shown at the right side of the highlighted area).

This 15 millisecond VOT is slightly higher than average. The average VOT for voiced stops like “d” in English is anywhere from 0-10 milliseconds.
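If you wanted to approximate this measurement outside of Praat, the logic is simple enough to sketch in Python. This is a rough illustration, not how Praat actually works: it treats the first loud sample as the release burst and the first clearly periodic 10 ms frame after it as the voicing onset. The threshold values are illustrative placeholders you would tune for real recordings:

```python
import numpy as np

def estimate_vot(signal, sr, burst_thresh=0.1, voicing_thresh=0.5):
    """Rough VOT estimate (seconds): the time between the release burst
    (first sample louder than burst_thresh) and the onset of voicing
    (first 10 ms frame whose autocorrelation shows clear periodicity).
    The thresholds are illustrative placeholders, not calibrated values."""
    burst_idx = int(np.argmax(np.abs(signal) > burst_thresh))
    frame_len = int(0.010 * sr)  # 10 ms analysis frames
    min_lag = int(0.002 * sr)    # ignore lags shorter than 2 ms (> 500 Hz)
    for start in range(burst_idx, len(signal) - frame_len, frame_len):
        frame = signal[start:start + frame_len]
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        if ac[0] <= 0:           # silent frame, keep scanning
            continue
        ac = ac / ac[0]          # normalize so lag 0 has value 1
        if ac[min_lag:].max() > voicing_thresh:  # strong repeating pattern
            return (start - burst_idx) / sr
    return None  # no voicing found after the burst
```

On real speech you would want filtering and a smarter periodicity test, but the burst-then-voicing logic is the heart of any VOT measurement.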

Now take a look at this recording of me saying the word “tag”.


Visually, you can see that the VOT for “tag” is much larger coming in at about 103 milliseconds. Again, this is higher than the expected average of about 30-40 milliseconds, but we can chalk this up to the productions being recorded in isolation with careful and purposeful speech.

If I record the two words together in a spoken sentence like “the soldier is wearing dog tags” it becomes a little closer to the expected averages as seen in this third image here.

“The soldier is wearing dog tags”

What is making the VOT so much larger for a “t” compared to a “d”? The “t” sound in English is a voiceless stop, meaning that the consonant itself is articulated with the vocal folds spread so they do not vibrate. So, when we measure the VOT of a voiceless stop, we are measuring the time from when the stop is released to the beginning of the voicing from the adjacent vowel (all vowels are voiced). Contrast this with a “d” sound, which is a voiced stop, meaning that the vocal folds are pressed together so that they vibrate during the consonant itself. With a voiced stop, we are measuring the time from the release until the vocal folds begin vibrating for the consonant itself. For this reason, we expect voiceless stops to have a larger VOT than voiced stops, and this pattern holds across languages.

So is that all there is to VOT in English? It turns out that we have two types of voiceless stops in English. The one that you get will depend on the surrounding environment. Let’s do a little demo and you will see what I mean.

Place your hand in front of your mouth and say the word “stop”, and then, with your hand still in place, say the word “top” (Pandemic note: masks do interfere with this demonstration and will need to be removed for full effect). You will feel that when you say the word “top”, there is a significant puff of air that hits your hand compared to when you say the word “stop”. That burst of air is called aspiration, and in English, voiceless stops are aspirated when they are the very first sound of a stressed syllable (the “s” before the “t” in “stop” blocks this). When we transcribe these aspirated stops in the International Phonetic Alphabet, we use a superscript “h” to denote the aspiration: [tʰ].

In a word like “stop”, we have a voiceless unaspirated stop and these will have a VOT that is shorter than the aspirated version, but still longer than voiced stops like a “d” sound. Taking a look at one final recording of mine, when I say the word “stop” my VOT comes out at about 23 milliseconds.


For English speakers, we don’t care about this distinction between aspirated and unaspirated voiceless stops. Both of these are just considered “t” sounds in our language. This is evidenced by the fact that you probably didn’t know about this difference. More importantly, if you say the word stop, but you put a lot of effort into really making sure that you get as much aspiration as possible on the “t”, it is still just the word “stop”. Nothing will change about the meaning of it.

This is not the case for all languages. Let’s take Armenian, for instance. Armenian has a three-way stop contrast where using a voiced stop, a voiceless unaspirated stop, or a voiceless aspirated stop in a word can change its meaning. An example from a 2003 paper: the word transcribed as [baɹi] (the upside-down r is just a regular “r” sound) means ‘good’, the word [paɹi] means ‘dance’, while the word [pʰaɹi] refers to the first fruit that a tree bears (Hacopian, N. (2003). A three-way VOT contrast in final position: data from Armenian. Journal of the International Phonetic Association, 33(1), 51–80.).

I think this is a really cool distinction that shows just how much fine phonetic detail a language can put to work. Something as simple as how much aspiration you use in a word can completely change its meaning. There are even some languages that go the extra mile on their voiced stops and have what is referred to as pre-voicing: the vocal folds start vibrating before the stop articulation is released, meaning the VOT of that sound ends up being negative. This is a phenomenon observed in some Southern African languages such as Taa and !Kung.

And now for one last cool thing I can show you before I close out. Check out this video of a quick auditory illusion.

This is, again, me saying the word “stop”. But notice that the second time it is played, with the “s” cut off, it sounds a little bit like a mixture between “top” and “dop”. The confusion you may be experiencing comes from the fact that a voiceless unaspirated stop is closer in VOT to a voiced stop than to the voiceless aspirated stop we would expect at the beginning of the word “top”. Our brain wants to hear the word “top” and will ultimately recognize it as such, but there is this brief moment of ‘wot in tarnation’ that our brains go through first, because we are pretty sure that might be a “d”, even though “dop” is not a word.

Anyway, I am going long again, like usual. Thank you so much for sticking with this long-winded post. I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.


Normally I try to talk about lighthearted topics so I can throw in the occasional pun and keep this whole thing entertaining for myself. I won’t be doing that today, but I do feel that this topic is incredibly important, and more people should know about it. Today I will be talking about acquired aphasia.

Aphasia is a language disorder that is caused by damage to the language centres of the brain (usually in the left hemisphere for right-handed people). This damage is typically the result of a stroke but can also be caused by a severe blow to the head or some other type of traumatic brain injury. Aphasia is an incredibly complex disorder that can present many different symptoms.

The important thing to be aware of from the outset is that aphasia is a language disorder and not a cognitive impairment. This is a hurtful misconception and a stigma that aphasia sufferers and their advocates are working very hard to overcome daily. If there is only one thing that you take from this post, let it be this.

In terms of the types of aphasia, this is largely dependent on where in the brain the damage is. Let’s start by talking about two major brain areas related to language. These areas are Wernicke’s area and Broca’s area (named after the physicians that discovered them). Wernicke’s area is located in the temporal lobe of the language dominant hemisphere (in orange) and Broca’s area is located in the lower part of the frontal lobe (in blue).

Wernicke’s area is associated with language comprehension, meaning that damage to this area often results in a person producing nonsense speech while being generally unaware that what they are producing does not make any sense. You can see this in the following video clip, where a person with fluent aphasia is producing a lot of speech effortlessly, but it doesn’t make any sense at all.

Fluent Aphasia (from tactustherapy)

You will also notice that this man does not seem to know he is not making sense. What is perhaps most interesting though comes at the end of the video. When the speech pathologist working with him tells him that their session is over, he produces a very coherent statement that is on topic. This is a good example of a frozen phrase, which is something that is basically an automatic response. Imagine you see someone sneeze (I know, terrifying nowadays) and you instinctively say, “bless you” or “gesundheit”. Those could be considered frozen phrases because they are automatic things that we will say without having to think about them.

When the man reciprocates the “thank you very much” at the end of the video, it gives you a glimpse into his true ability to communicate. All the words and phrases are still there; the aphasia is just jumbling things up for him.

This type of aphasia is known as Wernicke’s aphasia or fluent aphasia. The name fluent aphasia comes from the fact that the flow of speech is not disrupted at all. People with this type of aphasia are not aware that they are making the speech errors either so it can be very tricky to interpret their speech a lot of the time.

Broca’s area is associated with language production so any damage here will result in word finding difficulties and disfluent speech. In line with the naming conventions from before, this type of aphasia is known as Broca’s aphasia or non-fluent aphasia. To get a sense of what this might look like, take a look at the video here of a person with non-fluent aphasia.

Non-fluent Aphasia (from tactustherapy)

You will notice that a lot of the speech produced is lacking function words (the, to, is, etc.) and consists mostly of just nouns and verbs. This is known as telegraphic speech, and it is one of the most obvious symptoms of non-fluent aphasia.

The brain is a complex and sensitive organ, and unfortunately this isn’t always a good thing. It is seldom the case that any damage due to stroke or other injury will be isolated to only a single area. It is quite common for patients to experience damage in both Broca’s and Wernicke’s area, as well as in various other areas of the brain. This is why you often see people who have movement difficulties or even complete paralysis on one side of their body in addition to the possibility of these language disorders after having a stroke.

This can also mean that people can suffer from a combination of these two types of aphasia depending on the severity of the damage in each of the areas. The extreme form of this is called global aphasia which means that the person is non-fluent, is unable to comprehend any spoken messages, and is unable to repeat any words or phrases.

Just take a moment to imagine how hard it would be to go about your daily life like this. Especially remembering the fact that this is not a cognitive issue. It is not the case that you have forgotten these words, you just are unable to say the things that you want to. The amount of frustration you would experience is not something I can even begin to fathom.

This is where linguists can help! Speech-language pathologists (to be more specific) are trained professionals who work with a variety of people who need language help. Their clients could be kids with speech impediments or recent immigrants who want to adjust their accent and sound more “native-speaker-like”. Speech pathologists also work with recovering stroke patients. They work one-on-one with patients to develop specific care plans and exercises to improve their communication abilities. This can include many things, from exercises to improve phonation time (producing an “ahhh” sound for an extended period) to practice with word recognition and repetition.

Having access to speech pathologists is an important part of stroke recovery. There has been research which shows that patients who receive speech therapy intervention after a stroke will recover significantly more language ability compared to those who receive no intervention. Stroke recovery is never an easy thing. Even with speech therapy, many patients will never fully recover their language use ability to pre-stroke levels.

I know this is a little bit of a depressing topic, but I feel that this is important. It feels weird to think about, but the whole reason I even discovered linguistics in the first place was my desire to become a speech pathologist after watching someone I knew recover from a stroke. My career path has changed since then, but I will always remember my roots and the whole reason I am here.

Knowing the signs of a stroke is important because the best chance you can give yourself when it comes to having a stroke is to seek medical attention as soon as possible. The acronym you can remember for this is F.A.S.T.

F = Face Drooping – Does one side of the face droop or is it numb? Ask the person to smile. Is the person’s smile uneven?

A = Arm Weakness – Is one arm weak or numb? Ask the person to raise both arms. Does one arm drift downward?

S = Speech Difficulty – Is speech slurred?

T = Time to call 911

These are the keys to recognizing a stroke, and it is a good idea to remind yourself of them occasionally. Strokes come out of nowhere with little to no warning, so knowing the signs ahead of time matters. You could save a life with this one day.

New York’s Hottest Club

Toward the end of 2021, I had the opportunity to work on a super fun personal project that I have been excited to talk about. I was able to closely examine the speech of Bill Hader over a period of years and directly compare that speech to one of his most famous recurring characters on Saturday Night Live, Stefon Meyers. For those of you that are not familiar with Stefon, check out this video of one of his early appearances on the SNL segment Weekend Update.

If you compare Stefon’s speech to a Bill Hader interview, you can hear there is quite a difference.

First let’s get a little background on Bill and Stefon. Bill Hader is a comedic actor and voice actor originally from Tulsa, Oklahoma. Bill got his first big break in show business when he was hired on as an actor for Saturday Night Live in 2005. While working at SNL, he also worked on many movies such as Cloudy with a Chance of Meatballs, Superbad and Forgetting Sarah Marshall.

During his time at SNL he also worked to develop the character of Stefon with John Mulaney (a former SNL writer, now stand-up comic). The character of Stefon was written after John Mulaney received an email from a club promoter who was trying to entice him into coming out to a hot new night club. The email had ridiculous selling features for the club such as “a room with broken glass”. The character of Stefon was meant to be a heightened version of this that served as a correspondent for Weekend Update who would let the host (Seth Meyers) know about all of the “hottest clubs” in New York City that tourists need to check out.

The voice for Stefon was inspired by a barista at a local coffee shop that Bill would frequently visit. The voice given to Stefon is higher in pitch and is breathier compared to Bill’s speaking voice. The voice also has a very prominent and stereotypical gay lisp to it which is most noticeable on his repeated affirmations “yes yes yes yes yes yes yessssssss”.

All of this is good and fun, but why is this worth looking at and talking about? Well, Bill was on SNL from 2005-2014, and during that time he performed the character of Stefon a total of 14 times (from 2010 to 2014) and an additional two times after leaving the show. This means that, thanks to the power of the internet, we have access to longitudinal data for both Bill (who did several late night interviews during this time) and Stefon. This allows us to compare three things:

1. We can see how consistent Bill was in his performance of Stefon over a period of years

2. We can see whether Bill’s own speaking voice changes over the years

3. If Bill’s voice does change, do those changes affect Stefon, or does he just keep Stefon in this somewhat frozen state?

So, what factors can we look at to compare these two voices? The first factor we can talk about is their pitch. If you recall, a few weeks ago I wrote a quick summary on how we can determine a speaker’s pitch and the factors that can influence it. The biggest thing to keep in mind here is that the pitch of both Bill and Stefon comes from the same vocal tract. There is no physical difference we can point to between the two voices, like height or gender, so any difference in pitch is the result of intentional effort by Bill.

As I already pointed out, you can tell that the pitch of Stefon’s voice is higher than Bill’s own voice simply by listening to his earlier performances. Bill has an average pitch of about 111 Hz while Stefon’s ends up being around 133 Hz. But a key observation here is that because we have access to years of recordings, we can look at his performances year by year and spot a trend that might go unnoticed when looking at all the data at once.

Stefon (2018)
Bill Hader (2018)

In this later performance from 2018 (after Bill had not performed the character for 4 years), the pitch of Stefon actually ends up being lower than Bill’s speaking voice! (122 Hz for Stefon compared to 139 Hz for Bill). What is going on to cause this change? Well, I can’t say for certain, but my theory is that the long break between performances likely played a role in this. So, the interesting pattern over time here is that while Bill’s pitch is slightly increasing over time, Stefon’s pitch is going in a downward trend and there is this definitive crossover in the later years of performances.

What else can we talk about in these performances? I mentioned earlier that there is a prominent lisp that Bill gives to Stefon that really draws out his “s” sounds, but how can we quantify this difference? We can do this using simple airflow physics and the concept of centre of gravity. The physics is not super complicated, I promise. What we are measuring here is the energy created by the air flowing through the small space you make in your mouth when producing an “s” sound.

You can hear what I mean by doing this quick experiment yourself: try to say “sa” out loud. Now say it again, but this time make a big smile and put your tongue as close to the roof of your mouth as you can while still being able to say it. When you said it the second time, the “s” probably sounded a bit louder. If you had recorded these productions and looked at their spectrograms, you would see that the second version has higher-frequency energy compared to the first one. Something like this:

Save (left) versus Shave (right)

Comparing the production of the word on the left (save) to the one on the right (shave), you can see how the one on the right is much darker at lower frequencies at the beginning. This is because the “esh” sound has a lower centre of gravity, with less energy produced at high frequencies compared to an “s” sound.

We can quantify this difference by looking at the centre of gravity of each of these sounds, and from that we can infer differences in how they are articulated. Let’s go ahead and look at how the “s” sounds differ for Bill and Stefon.
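To make the idea concrete, here is a minimal sketch in Python with NumPy. This is a simplified stand-in for what Praat computes, not its actual algorithm: the centre of gravity is just the average of a sound’s frequency components, weighted by how much energy each one carries.

```python
import numpy as np

def centre_of_gravity(signal, sample_rate):
    """Estimate the spectral centre of gravity (in Hz) of a sound:
    the power-weighted mean of its frequency components."""
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum) ** 2  # energy at each frequency bin
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * power) / np.sum(power)

# Toy check: a pure tone's centre of gravity sits at its own frequency,
# so a 7000 Hz tone ("s"-like energy) scores higher than a 3000 Hz
# tone ("sh"-like energy concentrated lower down).
rate = 44100
t = np.arange(rate) / rate           # one second of samples
s_like = np.sin(2 * np.pi * 7000 * t)
sh_like = np.sin(2 * np.pi * 3000 * t)
print(centre_of_gravity(s_like, rate))   # ~7000
print(centre_of_gravity(sh_like, rate))  # ~3000
```

Real fricatives spread their energy over a wide band rather than a single frequency, but the weighting works the same way: the more energy up high, the higher the centre of gravity.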

Unsurprisingly, the centre of gravity for Stefon (7877 Hz) is a lot higher on the “s” sounds compared to Bill (6051 Hz), but again, looking at the change over the years reveals a surprising trend. In 2018, Bill had a centre of gravity of 8244 Hz and Stefon had a centre of gravity of 8373 Hz. The two are much closer this year than in any other year where recordings exist for both Bill and Stefon. Looking at each speaker over time, we see that Bill’s centre of gravity steadily increases over the years while Stefon’s remains relatively the same.

Comparing the centre of gravity for Bill (blue) and Stefon (orange) over the years

Remember, too, that these are measurements of the same “s” sound, produced both in character and out of character. I had to use two completely different sounds to show you a similar scale of difference, but Bill Hader was able to do this using the same speech sound.

The fact that he was able to remain so consistent in how he articulated sounds as Stefon, while his own voice had so much variation, is quite amazing. This is a true testament to Bill’s talent. Even more so when you remember that he had a long period between 2014 and 2018 where he did not perform as Stefon at all. For him to be able to come back after that long break, when his own voice had so obviously changed, is very cool (to me at least).

So, this is basically just a lot of charts and numbers at the end of the day, but I hope that it was at least cool to see some of the practical applications of acoustic phonetics. While this is a fun and silly example, there are practical applications for these techniques; a great example could be legal cases where investigators are trying to match an unknown speaker recording to a suspect. I had a lot of fun doing the research and the complete writeup (for course credit), and these are the two most interesting findings that came out of it. I likely won’t return to the rest of the findings because they get a little more technical, but if there is interest, I am always happy to reconsider.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

There is snow way that is true!

It is winter here in Canada. As I write this, I am home visiting some family for the holidays which is of course a blessing in these still uncertain times. Here in Alberta the temperatures have dipped down below -40 Celsius at times (which is the same in Freedom Units) with snow falling almost every day. This frigid wasteland has me thinking about a common myth that has been thrown around a lot in the past and I felt that now would be a good time for me to talk about it. But first, I should introduce a few key concepts so you can understand the explanation a bit better.


Morphology is a subfield of linguistics that cares about how words are formed and how those words relate to other words in a sentence. It deals with things like affixation and case marking and is often lumped in with syntax because they have a surprising amount of overlap to give us the field of morphosyntax. Affixation is the process of adding supplemental material onto a word to either change the category of it, or enhance it in some other way. This could be something as simple as adding an -s to the end of the word to make it plural, or it could be a complex multistep process to make a word like antidisestablishmentarianism by attaching a whole bunch of affixes before and after the base word, establish.

Case marking is a morphological process in which a noun is “marked” in some way to designate its grammatical role. This could be as simple as designating a noun to be the subject or the object of a sentence. In some languages, case marking involves some sort of affixation, while in others it may result in a completely different word being used. It is a little bit tricky to explain using English examples because English does not have a very rich case system, but you can see it on personal pronouns. Take the pronoun ‘they’ for instance. In most forms of English, you can say something like “They will see you tomorrow” but you can’t produce a sentence like “*You will see they tomorrow”. Instead of using ‘they’ in the object position, you need to use ‘them’. This is why, when a person is giving you their pronouns to use, they will give you a pairing like ‘they/them’, ‘she/her’, ‘he/him’, etc.: you need to know how to refer to them both as the subject of a sentence and as the object.
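As a toy illustration of this subject/object split, here is a small sketch in Python. The names and structure are invented for illustration, not taken from any real linguistics library: each pronoun pairing stores a nominative (subject) form and an accusative (object) form, and the grammatical role picks which one surfaces.

```python
# Invented toy lexicon: each pairing stores its case forms.
PRONOUNS = {
    "they": {"nominative": "they", "accusative": "them"},
    "she":  {"nominative": "she",  "accusative": "her"},
    "he":   {"nominative": "he",   "accusative": "him"},
}

def case_form(pronoun, role):
    """Pick the correct surface form for a grammatical role."""
    case = "nominative" if role == "subject" else "accusative"
    return PRONOUNS[pronoun][case]

print(case_form("they", "subject"))  # they -> "They will see you tomorrow"
print(case_form("they", "object"))   # them -> "You will see them tomorrow"
```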

In the world of morphology, we would call the first pronoun in the pairing the “nominative” pronoun and the second one the “accusative” pronoun, because in English, subjects are assigned nominative case and objects are assigned accusative case. There is so much more to case assignment, and so many more types of case that words can have, but that would be a whole post on its own and would require a lot more setup (again because English does not have a particularly rich case system), so I will keep that one in my pocket for a later post. Right now, let’s get back to affixation.

As I mentioned, affixation in English has two basic functions. It can either change the category of a word through a process we call derivation, or it can enhance a word in some way through a process that we call inflection. Beyond affixation though, we can also just combine two words together through a process called compounding. All these processes can happen multiple times to a single word, but the order of them is of the utmost importance.

Keeping on theme with the frigid awful landscape I am stuck in for now, let’s look at snowblowers. What are snowblowers exactly? Simple! They are machines that blow snow. But how is the word snowblowers formed?


Let’s start back at the base with the word ‘blow’. The instance of ‘blow’ in this compound is a verb that refers to some type of wind creating an air current. But the word ‘snowblowers’ is a noun, so we know that at some point there was a derivational process which changed the category of the word. As it turns out, this happens when we apply our first affix. Adding ‘-er’ to the end of a verb changes its category to a noun and makes its definition essentially “the thing that does the verb contained within the word”. So really, for the purposes of this compound, a blower is just a thing that blows.

But how do we know that this was the first piece attached in this compound? Well, have you ever heard of a snowblow? The fact that my word processor just underlined snowblow in red when I finished typing it tells me that you probably haven’t. The fact that snowblow doesn’t exist tells us that we need to create the word ‘blower’ before we can add on the word snow. And this is the compounding process in action, where we combine the noun ‘snow’ with our newly formed noun ‘blower’ to create the new noun ‘snowblower’.

In this particular compound, we would call ‘blower’ the head of the compound because it does the majority of the heavy lifting in the compound’s definition. Think about it like this: a snowblower is a thing that moves snow around by blowing it; it is not something that creates snow to be piled up for movies or things like that (that would be a “snow cannon”).

This is a little difficult to see when we are working with two nouns, but picture a compound like ‘breakwater’, where we are combining the verb ‘break’ with the noun ‘water’. The end result of this combination gives us the noun ‘breakwater’, so we know that the head of the compound is the noun ‘water’, and it is what dictates the class of the compound.

Back to ‘snowblower’: we now only need to add the plural ‘-s’ affix to create the word ‘snowblowers’. This is an inflectional affix because it does not change the category of the word at all; it is still a noun. In other words, we are not changing what the snowblower actually is, there are just several of them now!
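The ordered steps above can be sketched in code. This is just an illustrative toy (the function names are invented, and real morphology is far messier), but it makes the ordering explicit: derivation first, then compounding, then inflection.

```python
def derive_er(verb):
    """Derivation: verb -> agent noun ('the thing that does the verb')."""
    return verb + "er"          # blow -> blower (category changes to noun)

def compound(modifier, head):
    """Compounding: combine two nouns; the head comes last."""
    return modifier + head      # snow + blower -> snowblower

def inflect_plural(noun):
    """Inflection: add plural -s; the category stays a noun."""
    return noun + "s"

# The order matters: we must build 'blower' before 'snow' can attach.
word = inflect_plural(compound("snow", derive_er("blow")))
print(word)  # snowblowers
```

Swapping the steps would require an intermediate word like ‘snowblow’, which, as noted above, does not exist.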

All of this information about affixation is just a primer for the real thing that I wanted to talk about today though (which is still snow-adjacent!). Over the years, you may have heard the myth that speakers of the language Inuktitut have a greater appreciation and knowledge of snow because they have 50 (or hundreds according to some) words for snow. This is not actually the case and I can show you that. But when pop-science articles are making these sorts of claims, what do they mean and what are they getting wrong?

First off, let’s talk about Inuktitut. Inuktitut is a language in the Eskimo-Aleut family that is primarily spoken in Northern Canada and is recognized as one of the official languages of Nunavut and the Northwest Territories with approximately 35-40 thousand speakers based on the 2016 census data. Inuktitut is known as a polysynthetic language which lines up quite nicely with what we have been talking about, affixation.

Polysynthetic languages are languages that create words from many morphemes, some of which can stand on their own and some that cannot. Bringing it back to the ‘snowblowers’ example, words like ‘snow’ and ‘blow’ are perfectly fine on their own, but things like ‘-er’ and ‘-s’ need something to attach to and cannot stand alone. Polysynthetic languages go far beyond compounds like this though. They can contain whole utterances and ideas within a single word.

A good example of this is pulled from Wikipedia. In Inuktitut, the word ‘qangatasuukkuvimmuuriaqalaaqtunga’ means “I’ll have to go to the airport”. I did not forget any spaces in the word I promise, it is really all just one word. And the base morpheme of this word is ‘qangata’ which means “to raise/to be raised in the air”. So, through a multistep process of affixation and compounding (which you can see for yourself in the link) Inuktitut speakers are able to take the complex concept of one’s need to go to the airport and contain it within a single word.

Back to the snow question, then. Does Inuktitut really have over 50 words for snow? No! It has a lot of words “about” snow… but so does English! In English, we have wet snow, heavy snow, light snow, snow drifts, snow banks, snow angels, and so many other things related to snow. English does not have the same ability to create incredibly complex compounds that Inuktitut does, so in reality Inuktitut has most of the same snow-descriptor compounds that English has, plus the ability to express a complex idea like the verb ‘aputiktatuk’, which means “fetches snow to make water”. This is a word that contains the word “snow” within it, but it is not a word for snow! At the end of the day, it is hardly even a word about snow!!

To make a long story short: some linguists once put forward the idea that the words in a language, and the way they are used, can determine the way speakers of that language think. As it turns out, that idea doesn’t really hold melted snow at the end of the day. This theory is known as linguistic determinism, and for the sake of time I will leave it be for now as a teaser for a future post. But I hope I was able to adequately show that no, Inuktitut speakers do not have a richer and more in-depth understanding of snow because they have “so many words for it”. The fact that there are so many words related to snow is just a consequence of how their language forms words.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

Things are getting tense

Pitch is a word that gets thrown around quite a bit. Some people have high pitched voices and some people have low pitched voices. Have you ever thought about what pitch means though? Sure, we know when someone’s voice sounds high pitched, but what is the reason for that high pitch? You can probably sort the people in your life into “those that have high pitched voices” and “those that have low pitched voices”. This sorting that you are doing in your head may not correspond to typical concepts of gender either. I would wager that you can think of some female friends that have low pitched voices and some male friends that have high pitched voices. There may be another pattern that you notice in these people though. I bet that most of the people in your life with low pitched voices are taller on average and those with high pitched voices are typically shorter. It turns out there are many factors we can look at that correlate with vocal pitch.

But really, what exactly is pitch? Like I said, we have this idea of what pitch is from just listening to something, but what actually makes something high pitched? Well, those of you with musical training are likely already a few steps ahead of me on this one. The pitch of one’s speech, similar to a musical note, is measured in Hertz (Hz) and corresponds to how ‘quickly’ something is vibrating. But what is doing the vibrating? In speech, it is our vocal folds that are vibrating when we produce vowels and voiced consonant sounds.


I have talked about the vocal folds before, but I will do a quick refresher here. Your vocal folds are the folds of tissue located in your throat that are responsible for phonation. During the production of a vowel or a voiced consonant, these vocal folds will press together and the air that passes through them at this time will cause them to vibrate producing the sounds that we hear.

Let’s compare these vocal folds to the string of a guitar. In a standard tuning of a six-string guitar, the sixth string will be tuned to E with a frequency of 82.407 Hz. What this means is that when you pluck this string, it will vibrate in a cyclical fashion and should repeat that cycle of vibration approximately 82 times every second. If you were to adjust the tuning peg and tighten the string, the note that you hear would increase in pitch and the frequency of the note would also increase meaning that the string vibrates more times per second. Conversely, if you loosen the string, the note sounds lower and the frequency of the note would also decrease.

Our vocal folds function the same way as a guitar string. The tighter you hold them together and the faster they vibrate, the higher the pitch that you produce. This is how we are able to produce different pitches in our voices as we sing and speak. But wait, I also mentioned that there was a likely correlation of the pitch of one’s voice and their height, right? Let’s go back to the guitar for a second.


Again, in standard tuning, the sixth string is tuned to E (82.407 Hz). If you place your finger on the fifth fret of the guitar and pluck the string, the note that will come out will be an A with a frequency of 110 Hz. All you are doing by placing your finger on this fret is making the string shorter by a set length to raise the pitch, so from this we can infer that in addition to tension, the length of the string also plays a factor in the pitch.
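This fret arithmetic follows the standard equal-temperament formula: each fret raises the pitch by one semitone, and twelve semitones double the frequency, so a fretted note’s frequency is the open string’s frequency times 2 raised to the fret number over 12. A quick sketch:

```python
def fretted_frequency(open_hz, fret):
    """Frequency of a string fretted at the given fret,
    assuming equal temperament: one fret = one semitone."""
    return open_hz * 2 ** (fret / 12)

low_e = 82.407                                 # sixth string, standard tuning
print(round(fretted_frequency(low_e, 5), 1))   # 110.0 -> the A note
print(round(fretted_frequency(low_e, 12), 1))  # 164.8 -> E, one octave up
```

The twelfth fret sits at the halfway point of the string, which is why it gives exactly double the frequency: half the length, one octave up.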

When it comes to vocal folds, it stands to reason that people who are taller also have larger proportions in most areas of their body. I mean, you know what they say about people with big feet, right… That’s right! They say they have longer vocal folds! Now, this is not true of every tall person in the world (there are exceptions to almost everything), but just like a longer guitar string, longer vocal folds tend to vibrate more slowly and produce a lower pitch.

Alright, so now that we have a better understanding of how we quantify someone’s pitch, how can we measure it? Well, thanks to modern technology, we can have software do it for us. A piece of software widely used in the world of linguistics called Praat can analyze many aspects of recorded speech, including pitch. In the image below, you can see a spectrogram of a recording of me saying the word “fantastic”. On this spectrogram (the bottom half of the image), the blue line represents the pitch tracking that the computer calculates, and the average for my pitch ends up being approximately 131 Hz (which is slightly above average for a man my age).

Look at this “fantastic” recording

You will also see that there are gaps in the blue line, and there is an explanation for them. Pitch can only be tracked and calculated on voiced consonants and vowels, but many of the consonants in “fantastic” are voiceless, meaning that the glottis is spread open when they are produced and the vocal folds are not vibrating. So, we know that the computer can tell us this number, but how is this number calculated? Let’s zoom in close on one of the vowels in this recording to get a better idea of what is going on.

Another fantastic image

If you look at the waveform here (in the top half of the image), you can see that even though there is a lot of variation in the line, there are patterns repeating in it. I have highlighted one of these chunks, and you can see that one cycle takes approximately 0.00713 seconds. To convert this into pitch, we need to figure out how many of these repetitions can happen in one second (that’s what Hz stands for, after all!). So if we do some cross multiplication and division like we did back in high school, it works out to 140.25 Hz, which is very close to what the computer calculates for this particular vowel (140.4 Hz). Keep in mind that the computer is looking at the entirety of the word while we are just doing the math based on a single cycle. The computer uses a complex algorithm that takes into account several cycles and the surrounding environment, but this is a quick showcase of how it works.
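That conversion, plus a very simplified version of the period-finding idea, can be sketched in Python with NumPy. Praat’s real pitch tracker is far more sophisticated, but the core trick here, autocorrelation, is the same: measure how well the waveform matches a time-shifted copy of itself, and the best-matching shift is one full cycle.

```python
import numpy as np

# Pitch is just the inverse of the cycle duration: Hz means cycles per second.
period = 0.00713             # seconds per cycle, measured off the waveform
print(round(1 / period, 2))  # 140.25 Hz

def estimate_pitch(signal, sample_rate, min_hz=75, max_hz=500):
    """Crude autocorrelation pitch estimator (a sketch, not Praat's method)."""
    n = len(signal)
    ac = np.correlate(signal, signal, mode="full")[n - 1:]
    ac = ac / (n - np.arange(n))       # undo the shrinking-overlap bias
    lo = int(sample_rate / max_hz)     # shortest plausible period (samples)
    hi = int(sample_rate / min_hz)     # longest plausible period (samples)
    best_lag = lo + np.argmax(ac[lo:hi])
    return sample_rate / best_lag

rate = 44100
t = np.arange(int(0.2 * rate)) / rate     # 0.2 seconds of samples
vowel_like = np.sin(2 * np.pi * 140 * t)  # synthetic 140 Hz "voice"
print(round(estimate_pitch(vowel_like, rate), 1))  # ~140 Hz
```

The search window (75–500 Hz here) matters: without it, the best match is always a lag of zero, since every signal matches an unshifted copy of itself perfectly.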

And before I go on for an eternity, I think we can stop here and call this a solid primer to the mechanics of pitch. I hope that it was informative though and I have enjoyed getting back into the habit of writing like this again. I still have so much to share and I hope that you will come back to learn more. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

The rules of conversation

Happy New Year everyone. It feels good to get back into a rhythm of writing these posts now that I have a bit of free time on my hands again. I wanted to start off 2022 talking about… talking, I guess. Something that we do every day! The subtle art of conversation.


Conversations can be hard to wrap your head around. If you take a step back and really think about it, the fact that we can have a string of interconnected thoughts and shared ideas with another person is pretty amazing. If I say one thing, that will probably make you think of a different thing and you will say that thing, which will lead to me thinking of a response that I say, and so on until we both run out of things to say!

So, what allows us to be able to talk like this forever? Surely, there must be some sort of insight into how conversations work from a logical standpoint. Well, it turns out that there is a whole field of linguistics that cares about this exact thing, known as pragmatics. Pragmatics studies the use of language in social settings, as well as the relationship between the interpreter and the interpreted.

A key part of pragmatics is built upon something known as the cooperative principle. The cooperative principle was developed by the philosopher of language Paul Grice back in the 1970s and essentially describes how and why people behave in certain ways while conversing. I wanted to take the time today to talk about this principle, provide some examples, and maybe give you a little bit of insight into why you converse the way you do.

The cooperative principle is divided into four maxims (or rules) known as the Gricean maxims. These four maxims are not infallible rules and can certainly be broken in a conversation. The idea is that in a normal conversation, where both people are actively engaged and trying their best to communicate their ideas clearly, these are the principles each speaker follows when voicing their own thoughts and interpreting the other’s.

The first maxim to talk about is the maxim of quantity. This rule dictates that all utterances should only be as informative as they need to be. In other words, you should aim to make your contribution contain only the necessary information for the situation and not contain superfluous information. An analogy that Grice uses in his book to describe this situation relates this maxim to repairing a car where he says: “If you are assisting me to mend a car, I expect your contribution to be neither more nor less than is required. If, for example, at a particular stage I need four screws, I expect you to hand me four, rather than two or six”.

Automotive analogies aside, how would this play out in an actual conversation? If you are talking with a friend and they ask you “What day are you flying home for Christmas?”, it would be sufficient to say “Thursday, December 23rd”. You could also include the time if it was relevant (say, if this friend was picking you up), but it would likely not be necessary to specify the year. Your conversational partner would likely be able to figure out that you are both talking about this year’s Christmas, and stating that information outright would be redundant.

The next maxim is the maxim of quality, which states that the contribution you are making must be truthful. This means that you should not say things which you believe to be false, and you should also not say things which you lack true evidence for.

For a simple and safe example, let’s go back to grade school for a second. Do you remember being a kid, and someone on the playground would tell you something about how they have “an uncle that was on the Titanic” or “a dad that works at Nintendo”? Looking back on it now, it is plain to see that they were obviously lying about something like that, but when you were a kid, you might have believed them at least a little bit. This is the maxim of quality at work!

Those kids that told those tall tales were violating this maxim and providing knowingly false evidence, but you were likely to believe them because of that same maxim. You were just a kid; you didn’t know any better. Why would these other kids be lying to you? Surely their uncle must really have survived being on the Titanic and caught up with his Nintendo employed brother and the pair of them acted as the inspiration for the Mario Bros.! This example really gives you a sense of how these maxims work, and what happens when one party is actively violating them.

The third maxim is the maxim of relevance, which simply boils down to the idea that your contribution should be relevant to the conversation at hand. We can return to our flight example for this. If your friend asks you when you are flying home for Christmas, you probably don’t need to tell them what jacket you are planning on wearing to the airport.


This is not to say that this information is not vital at some point in time. For instance, if they are planning on picking you up from the airport like we had discussed earlier, it might be useful to know what colour jacket you have on to find you. But at this point if they are asking you when you are flying home, and the flight is not for a few weeks, telling them about the jacket is not relevant.

The final Gricean maxim is the maxim of manner (or clarity). While the other three maxims care about what is being said, this final maxim is more focussed on how things are being said. To put it simply: be brief, avoid ambiguity, and be orderly.

Having to re-read and edit many of my own posts over the past six months, this is likely the maxim I need to work on the most! If I could say it in as little as 5 words, why am I dragging my feet and adding in all these silly extra words to say it in 50 instead? Or perhaps it is the complexity of my communication which needs to have its intensity decreased for the easement of my fellow linguistics enthusiasts.

This last paragraph sums up violations of the maxim of clarity. It all just boils down to the simple principle that many high school English teachers and writing instructors have been saying for years: be clear and concise. There is not much else to say on this one (nor should there be).

And that wraps up my first post of 2022. Hopefully you now have a little better insight into how conversations function at a higher level. This one was a bit out of my comfort zone because I don’t work in this area a lot, but I had a lot of fun writing and thinking about it, so I hope you liked it too! I have been working away and have so many interesting things to share with you all, so I hope you will come back next week for more. As always, if you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.

Language acquisition is for babies

IPA Baby by Quinn Goddard (@anquinnomy)

A few weeks ago, I spoke a little bit about some techniques used in infant language research. Someone had asked me to elaborate a bit more on how infants acquire language sounds. I thought the best way to begin this would be to talk about the order that these sounds are acquired before getting into the theories in later weeks.

The ability to acquire language is something that we as humans have innately. Whether it is spoken or signed, we can figure out grammatical rules without having to be explicitly told what they are. The methods we use are somewhat of a mystery, but by looking at the typical development pattern of an infant, we can gain more insight.

In the first six months of an infant’s life, their ability to produce language is essentially non-existent. They will make sounds and begin to experiment with babbling as they approach 6 months, but there is no “language” being produced in the same way that adults produce language. What we do have evidence of is the perception of language in the first six months. As I mentioned a few weeks ago, we have evidence that infants can perceive the difference between similar sounds like “b” and “d”. In addition, we can use that same methodology to show that infants prefer the sound of human speech over non-human sounds.

Newborn infants have the ability to discriminate between almost any contrastive sound in any language, but by about the 6-month mark, their perception begins to narrow and focus on the language that they have been exposed to up to that point. To explain this a bit, let’s imagine a scenario.

In Hindi, one of the languages spoken in India, they have dental variants of the “t” and “d” sounds we use in English. These consonants are produced by making a “t” or a “d” sound with your tongue pressed against the back of your teeth as opposed to by your alveolar ridge. To get a better sense of this, check out this short video.

Now it is important to know that in Hindi, these dental sounds are contrastive with the alveolar sounds, meaning that using a dental “t” instead of an alveolar “t” results in a completely different word being perceived. Think of it like how we make the distinction between the words “bog” and “dog”, where the only thing that changes between those two words is the place of articulation of the first consonant (this is called a minimal pair, and we will talk about them again someday).

The difference between dental “t” and alveolar “t” is likely quite subtle to your ears, and in rapid speech you likely wouldn’t be able to hear the difference. To a Hindi speaker, the contrast is quite clear, and they would be able to notice the difference in almost any situation.

Now to our hypothetical scenario. A newborn infant would be able to notice the difference and react to this subtle change in sound for the first few months of their life. By the time they reach 6 months of age, if they are growing up in an American English-speaking household in California, their ability to distinguish the two is likely completely gone. This is because they don’t need to know this difference. In standard American English, we do not make a contrastive difference between the dental and alveolar sounds like this, so there would be no need for an infant to focus on that feature. All of this is to show that even at such a young age, the infant’s language capabilities are being refined and fine-tuned to the language they are being exposed to.

Moving into the 6-8 month range is when we start to see the first signs of language production known as the babbling stage. During this stage, infants will start producing consonant vowel pairs like “ba” and “da” mostly in isolation at first. Around the 8-month mark is when they start doing what is known as canonical babbling. This canonical babbling is what we would think of as prototypical baby talk where they will produce repetitive sequences of the same consonant vowel pair (bababababa).

This is usually what we get up until the first year of age when the infant will begin producing single words to refer to objects (juice), actions (up), descriptive adjectives (hot) and social words (yes/no). They will not be crafting sentences at this point but will use single words to communicate the things that they want. If you have ever had children, I am sure you know just how early they acquire the word “NO” and their love for using it. This is the beginning of their process of linking the sounds that they are hearing and producing to meaningful content.

If at this point a child is producing what appears to be a two-word phrase (“stop it” for instance), it is likely because they have memorized it as a single word. At this point, the only exposure they have had to it is together in a phrase, so they have simply memorized it as a chunk. This brings up a whole question of how infants are able to parse individual words out of what they are hearing, which I think deserves a post of its own, so I will leave it for now. (Trust me, it is very cool though!)

The majority of a child’s language in this one-word stage is monosyllabic and usually involves simpler sounds like “s”, “b” and “k”, but this is not to say that they cannot perceive more difficult contrasts. English-speaking children are able to distinguish a “sh” sound from an “s” sound because it is an important distinction in their language, even if they cannot produce it themselves.

By around 2 years of age, infants have entered the two-word stage of language acquisition, where they (you guessed it) produce two-word utterances to communicate more complex ideas than they could with just one word. Their word learning rate at this point is quite high: they are learning roughly one word every two hours, provided they are being exposed to language, of course. (As a side note: talk to your kids! It is important!!)

At this point, even though they are using multiple words, they are still missing key markers in their speech such as tense or inflection markers. You won’t see them producing things like “I sat” at this point. They will most likely just default to the present tense “I sit” regardless of the time that they are trying to refer to.

And finally, moving into the 2–3-year period of an infant’s life is when we see language truly explode! This is the telegraphic stage of language acquisition, where they can string full utterances together but still aren’t quite using language to its full capabilities. If you have ever tried to have a conversation with a three-year-old, you have probably noticed that their speech is not perfect. They still tend to leave out function words at this point, like “is”, “do”, “at”, and so on. This gives their speech its telegraphic quality: we can certainly figure out what they mean, but it feels a little disjointed when you analyze it from afar.

Beyond the three-year period, we can see them develop their language further and start filling in these gaps gradually. So, this is the general pattern that we can observe in a typical child’s acquisition of language. As far as the theories of HOW this is working behind the scenes… well, unfortunately I got a bit carried away with this post, so I will have to leave that for another week. The process doesn’t have a simple one-sentence explanation, and there is a lot of ground to cover on that topic alone. I would feel better about giving it its own space rather than trying to rush it here.

Speaking of the future though, now is when I break the bad news that I will be taking a brief hiatus from this blog until the new year. I have been really enjoying writing these posts, but unfortunately my studies are getting quite busy right now and I do not have the time or mental space to write posts that live up to my standards at the moment. Rather than churning out something I am not proud of, I think it is best to take a short break and start the new year fresh. Thank you to everyone who has been reading these week after week and I hope that you will all come back when I post again in the future. In the meantime, if there are any burning questions you have about linguistics, feel free to send me an email. Again, thank you so much for reading, you have no idea what your support means to me. I will see you all in 2022.

Speech perception

A big focus on this blog has been centered around understanding and producing speech, but something that I have ignored up until this point is how speech is perceived. Speech perception is focused on hearing, decoding and interpreting speech. As we will see today, our brains are often not as reliable as we might think.


So rather than just turn this into a lecture about speech perception and the multitude of theories behind it (let’s face it, this is an educational blog, not a university course) I am just going to show off something weird and wild that our brains do and talk a little bit about the mechanics behind it. Alright, so raise your hand if you have heard of the McGurk effect. (Oh wait, sorry. Blog, not lecture)

The McGurk effect is an auditory illusion where certain speech sounds are miscategorized and misheard based on a conflict in what we are hearing versus what we are seeing. We can see this in action by watching the short video below.

So what is actually going on here? The audio that is being played in all three of those clips is exactly the same. You are hearing the same speaker say “ba ba” over and over. But when the audio is played over a video of someone mouthing “da da” or “va va” we are able to hear it as those instead.

Well, as it turns out, this illusion provides evidence for something called the motor theory of speech perception. This theory argues that people perceive speech by identifying how the sounds were articulated in the vocal tract, as opposed to relying solely on the acoustic information that the sound contains.

This motor theory is supported by something like the McGurk effect because we are taking this audio information and supplementing it with what we are visually observing in the video in order to decide what is being said. It also explains why it is easier to hear someone in a crowded or noisy setting if you can look at their mouth and watch them speak as opposed to not being able to see their mouth.

But it’s not as though we follow along with what people are saying by moving our own articulators or imagining how their mouths are moving while we listen to them. Instead, supporters of the motor theory attribute this process to specialized cells in our brains known as mirror neurons.

A mirror neuron is a specialized neuron in the brain that activates (or fires, if you prefer) in two different conditions: it will activate when an individual performs an action, and it will also activate when that individual observes another performing the same action. In speech, this would mean the same part of your brain that activates when you move your mouth to produce a “ba” sound will also activate when you watch someone else produce a “ba” sound.

With this knowledge in mind, it should be easier to see why we are able to get something like the McGurk effect to occur. If perception of speech is influenced by visual information, and we are observing someone producing a sound that is activating these mirror neurons, it makes sense that our perceptions might change slightly so that what we are hearing matches what we are seeing.


It is important to note that, as I mentioned earlier, this is not the only theory of speech perception that we have right now, and the motor theory is not without its flaws. It relies on a person’s ability to produce the sounds themselves. According to the motor theory, if you were unable to produce a sound yourself, and you could not visually see how the speaker was articulating it, you should not be able to perceive it.

So what about prelinguistic infants? An infant who has not developed the ability to speak yet should not be able to perceive the difference between a “ba” and a “da” without visual assistance because acoustically these sounds are quite similar.

Some studies have used a novel methodology where the infant sucks on a specialized soother of sorts that measures the rate at which they are sucking. Using this soother and presenting the infants with audio stimuli through a speaker (no visual input), researchers have found that new and novel stimuli cause infants to suck faster, while familiar stimuli cause them to suck at a slower rate.

So, presenting these infants with a series of “ba ba ba” followed by a sudden change to “da da da” results in an increased sucking rate, showing that they perceive the difference. These findings contradict the motor theory of speech perception because the infants in these studies are too young to speak on their own, and their articulators are not refined enough to produce both a “ba” and a “da” sound. Because the infants cannot produce these sounds yet, the relevant mirror neurons should not have developed fully, and so should not be activating.

This is not to say that the motor theory of perception is wrong, though. The fact that we are able to perceive the McGurk effect means that there must be some truth to it. It just calls into question whether this theory captures the whole story. This is something that almost every science deals with at some point. There is almost never a perfect explanation or theory that deals with every problem. If you look hard enough, there will be counterevidence to almost any theory, but it becomes a matter of refining theories as we learn more and more about the way that the world works.

There are many other theories of speech perception that have their own explanations and their own problems. I will likely return to discuss some of the other big ones such as Exemplar theory, but for now I think this is a good place to leave this one.

Thank you for reading folks! I hope this was informative and interesting to you. Be sure to come back next week for more interesting linguistic insights. If you have any topics that you want to know more about, please reach out and I will do my best to write about them. In the meantime, remember to speak up and give linguists more data.