Speech A.I. will hit a wall pretty soon.

If you’re a pessimist you’d say speech A.I. hasn’t built up any speed to collide with this wall, but let’s give the people who work on Alexa, Siri, Google Assistant, and the rest some credit for now.

Linguistics is the study of language. It can be very hard to pin down exactly how to study this phenomenon, which seems to be one of humans’ most distinctive abilities. “Language” consists of a lot of parts that are all quite different from one another and are usually studied independently, because the task of combining any two subfields of linguistics is just too daunting. How could we possibly relate the way people from Scotland pronounce the /th/ sound to the length of their pauses between sentences? Language is often studied at certain layers of abstraction, from how our mouths make the sounds specific to our language all the way up to how a sentence relates to the conversation it is in.

source: https://courses.lumenlearning.com/boundless-psychology/chapter/introduction-to-language/

I’ve been fascinated by the different levels of linguistics at which A.I. researchers study human speech and natural language processing. It takes a lot of cognition for us to transform spoken words into a concept we can internalize in our brains. As of right now, most of the research effort in natural language is at the level of syntax: how to construct valid sentences from the words of a given language. Computers have no idea what “makes sense” and what “just sounds wrong”. In fact, neither do we! Many experiments have shown that speakers of a language understand its grammar implicitly but not explicitly, just like I know how to ride a bicycle but couldn’t explain to you the muscle movements that make it possible for me to balance. This makes it very difficult to investigate what exactly the rules are for just one language, let alone all languages, and it breaks down further when we study people who can speak more than one language.

Syntax can also be separated entirely from meaning. The classic example is the phrase:

“Colorless green ideas sleep furiously”

This makes syntactic sense, but does not convey any idea or hold any meaning. And this shows us how strange human language is. Why are we able to construct meaningless sentences that “sound right” to us? This is already a pretty hard hump to get over, but there is a lot of research being done in this field, and it’s moving forward at a pretty solid pace. However, the meaning of words and sentences, and how they are understood in context, is an entirely different and extremely complex problem, and we have yet to create a theory or model that explains even a small fraction of the phenomena we see. These are the fields of semantics and pragmatics.
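To make that split concrete, here’s a toy sketch in pure Python. The mini-lexicon and the sentence pattern are entirely made up for illustration; the point is that a checker like this can accept Chomsky’s famous sentence using part-of-speech patterns alone, with no notion of meaning anywhere in the code:

```python
# A hypothetical mini-lexicon mapping words to parts of speech.
LEXICON = {
    "colorless": "ADJ", "green": "ADJ", "ideas": "NOUN",
    "sleep": "VERB", "furiously": "ADV",
}

def is_grammatical(sentence):
    """Accept sentences matching the crude pattern: ADJ* NOUN VERB ADV?
    This stands in for a real grammar; it checks structure, not sense."""
    tags = [LEXICON.get(word.lower()) for word in sentence.split()]
    if None in tags:
        return False  # unknown word
    i = 0
    while i < len(tags) and tags[i] == "ADJ":   # any number of adjectives
        i += 1
    if i >= len(tags) or tags[i] != "NOUN":     # then a noun
        return False
    i += 1
    if i >= len(tags) or tags[i] != "VERB":     # then a verb
        return False
    i += 1
    # optionally end with a single adverb
    return i == len(tags) or (tags[i] == "ADV" and i + 1 == len(tags))

print(is_grammatical("Colorless green ideas sleep furiously"))  # True
print(is_grammatical("furiously sleep ideas green"))            # False
```

The checker happily approves a sentence nobody could ever mean anything by, which is exactly the gap between syntax and semantics the example illustrates.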

Speech A.I.’s main goal is to create computers that can both understand and produce language that is meaningful to humans. Think Alexa or Google Home. These devices take our human speech as input and attempt to interpret it and then produce speech as an output that we can hopefully interpret. Listening to a question, parsing it for the meaningful elements, finding the answer, and producing a sentence that conveys the answer, requires a lot of statistics and search algorithms and of course some understanding of English syntax, but this scheme that Alexa and Google use is not similar in any way to how humans interpret and produce language.

Here’s a challenge: interpret this sentence:

“You have a green light”

You could have interpreted this a few different ways. To name a couple: it can mean that you are driving and encounter a green traffic light that permits you to proceed, or that you possess a light that is colored green. And even within this sentence, the individual words can have different meanings and functions. Here, “light” is a noun, but elsewhere we can use it as an adjective. We see that the part of speech is dependent on the context. Likewise, “green” can mean colored green, but in a broader context “green” can mean environmentally friendly. So now we have to compare different contexts: the context of colors a light could be, and the context of implications this light has for our planet.
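A tiny sketch of this context dependence, with made-up rules (a real tagger would be learned from data, not hand-written like this): the tag assigned to “light” flips based on the word that follows it.

```python
# Toy demonstration that a word's part of speech depends on its neighbors.
# The rule and the word list are invented for illustration only.
NOUNY_FOLLOWERS = {"snack", "rain", "jacket", "breeze"}

def tag_light(sentence):
    """Tag the word 'light' as ADJ if it modifies a following noun,
    otherwise as NOUN. A crude stand-in for a contextual tagger."""
    words = sentence.lower().split()
    i = words.index("light")
    if i + 1 < len(words) and words[i + 1] in NOUNY_FOLLOWERS:
        return "ADJ"
    return "NOUN"

print(tag_light("You have a green light"))  # NOUN
print(tag_light("I had a light snack"))     # ADJ
```

The exact same string, “light”, comes out with two different tags, which is the problem: the word alone doesn’t fix its function, the surrounding sentence does.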

We rely on context to understand the meaning of a word… but the meaning of the sentence is built out of the meaning of the words… right? Sentences are made of words, and thus the words of a sentence dictate how the sentence functions, but the meaning of a word only becomes unambiguous when put into a sentence… See how confusing this gets? Where does meaning come from? We have entered the realm of pragmatics.

Pragmatics is exactly what speech A.I. researchers are up against. After training Alexa, Siri, and Google Home on as much syntactic data as they can, they will hit a dead stop: the algorithms can break down the structure of any sentence thrown at them, but are unable to distinguish whether I meant “bat” the wooden stick or “bat” the flying mammal.
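One classic attack on exactly the “bat” problem is a Lesk-style disambiguator: pick the sense whose dictionary gloss shares the most words with the surrounding context. Here’s a minimal sketch with invented sense glosses (a real system would pull glosses from a resource like WordNet):

```python
# Hypothetical sense inventory for "bat": each sense mapped to gloss words.
SENSES = {
    "bat.animal": {"flying", "mammal", "cave", "wings", "nocturnal"},
    "bat.stick":  {"wooden", "baseball", "swing", "hit", "ball"},
}

def disambiguate(context_words):
    """Lesk-style: choose the sense whose gloss overlaps the context most."""
    context = {word.lower().strip(".,!?") for word in context_words}
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

print(disambiguate("he swung the wooden bat and hit the ball".split()))
# bat.stick
print(disambiguate("a bat flew out of the cave at night".split()))
# bat.animal
```

Note how brittle this is: the method counts literal word overlaps, so it captures none of the implicit grammar or world knowledge a human brings to the same sentence, which is the gap the rest of this post is about.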

I should make it clear that this is not the A.I. researchers’ fault. This imposing challenge comes from linguistics being an extremely young science. The study of language in a rigorous, scientific, and analytical way has only been around since about 1957, when Noam Chomsky published his book Syntactic Structures. That makes linguistics not even 70 years old, and looking at any other field of science 70 years in shows that it takes quite a while for huge breakthroughs to arrive.

Where am I going with this? Well, unfortunately, this post is about what I can’t show you. The point I’m trying to make is that there is not enough semantic or pragmatic linguistic theory out there for speech A.I. researchers to grab onto and implement. We are about to use up all the syntax knowledge we have so far, and all the people studying natural language processing will be grasping at what has not yet been discovered. However, I do believe that the growth in popularity and practicality of speech A.I. research will kick-start new and exciting linguistic research in the fields of semantics and pragmatics.

I think that the state of speech A.I. we are in right now is actually quite interesting. On one hand, we have more data than ever before and the technology to analyze it. On the other, we have a very new field of science that has become far more exciting with all this new data and technology. It is also interesting how different this situation is from, say, physics. Theoretical physics has been a distinguished field for centuries, and its ideas and theories are always farther ahead than what we can measure in a lab. Math is even farther ahead than theoretical physics. Every time an experimental physicist does some work in a lab, they consult various theoretical and mathematical ideas to implement and test. But here, we have software engineers and data scientists trying to create something real where there is not sufficient theory. It is a very interesting problem, and I believe that the demand and interest in natural language processing will influence where theoretical semantics and pragmatics progress in the future. The experiments will push the theory, not the other way around.
