Since its demonstration at Google I/O 2018, there has been a lot of buzz about a new technology called Google Duplex. All kinds of people are discussing all kinds of things. But the headlines that caught my attention are: "Has Google Duplex passed the Turing test?", "What will it cost for businesses to use Google Duplex?", and "Google's Duplex AI could kill the call center".
Before we explore some amazing things about Google Duplex, let's first understand what this new technology is about in layman's terms.
What is Google Duplex?
One weekend, in a meeting with my coworkers, we were planning a surprise party. It was supposed to be a pizza party, and we thought of ordering by phone from Domino's. Then I asked who would make the call. We started pointing at each other: he will call, he will call... because none of us had ordered pizza by phone before.
Now Google Duplex is coming to do this type of job for you. It would have been a lot more interesting had there been Google Duplex on my phone. All I would have had to do was tell my phone, "Hey Google, order me a pizza from Domino's", and Google would have called Domino's, just like an office assistant, and done the job for me. Watch the video below to understand how this works:
The interesting thing here is that the system has a natural conversation with the EL cocotero attendant and understands the nuances of the conversation well. For example, even a person who has never booked seats by phone might not be able to answer immediately when asked "How big is the party?" But the system understands the nuance and the context and answers "it's for 2 people".
Now you might ask: is the call real? After all, the technology is not yet available to the public.
How real was Google's demo of Google Duplex?
Before we discuss the authenticity of the demo, let's first watch the demo again:
Here the business answers the call with "Hello, how can I help you?" Don't you think that is weird? I have never come across a business that does not identify itself when answering a call.
Second, you do not hear any background noise. That alone proves nothing, because the salon attendant might have used a mobile phone, and maybe there was no noise in the background at that time. But a salon with a phone-booking facility would be professional enough to say something like "Hello, I am xyz from abc Salon, how may I help you?"
And yet Sundar Pichai says that the Google Assistant made the call to a real salon!
The call could have been edited or staged. We still doubt it because Google has not given any clarification on this yet.
Another important point here is that the call sounded unbelievably real and natural. The pauses, the intonation, and sounds like "ummm" and "ummm hmmm" were just like a human's.
Sundar Pichai categorically stated that the call was real and that the AI system understood the nuances and context and even gracefully handled unexpected situations and confusion.
In that case, we may ask "Has Google just passed the Turing Test?".
The point here is that if Google has indeed developed an AI system that can handle calls just like humans, then it should be considered a major technological breakthrough of the century, and hence it deserves more than a two-minute demo.
Moreover, such a breakthrough needs a real-time demo, not a pre-recorded one. Ideally, a random person from the press or from the audience would have been picked to test the system in real time.
Has Google Duplex passed the Turing Test?
Before we ask this question, let us first have a quick review of what a Turing test is:
The test is simple: a human tester (C) interacts with a computer system (A) and a human being (B). C does not know which of the two is the computer. C asks a series of questions to both A and B. If C cannot tell from the interaction which one is the computer and which one is the human being, then the computer system (A) is said to have passed the Turing test.
Has any computer system passed the Turing test to date?
The answer is a big NO. Then you may ask, what about Eugene Goostman? There was a lot of media coverage back in June 2014 claiming that Eugene Goostman had passed the legendary Turing test. But the claim was soon disputed by critics and experts alike.
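To make the setup concrete, here is a minimal Python sketch of one round of the test as described above. Everything in it is my own illustration: the function name `run_imitation_game`, the scripted players, and the judge are hypothetical, and a single round with one judge is a big simplification of how such tests are actually run.

```python
import random


def run_imitation_game(judge, machine, human, questions, rng=None):
    """Play one round; return True if the judge fails to spot the machine."""
    if rng is None:
        rng = random.Random()
    # Hide the identities behind neutral labels so the judge cannot peek.
    players = [("machine", machine), ("human", human)]
    rng.shuffle(players)
    hidden = dict(zip(("X", "Y"), players))
    # The judge (C) puts the same questions to both hidden players (A and B).
    transcripts = {
        label: [(question, respond(question)) for question in questions]
        for label, (_, respond) in hidden.items()
    }
    # The judge names the label it believes is the machine.
    guess = judge(transcripts)
    return hidden[guess][0] != "machine"


# Hypothetical players and judge, for illustration only.
def scripted_machine(question):
    return "I do not understand."


def scripted_human(question):
    return "Let me think about " + question.lower()


def keyword_judge(transcripts):
    # Flags whoever gives the same canned reply to every question.
    for label, exchanges in transcripts.items():
        if all(answer == "I do not understand." for _, answer in exchanges):
            return label
    return "X"  # no obvious giveaway, so the judge just guesses


questions = ["What did you have for breakfast?", "How big is the party?"]
machine_passed = run_imitation_game(
    keyword_judge, scripted_machine, scripted_human, questions
)
print(machine_passed)  # False: this judge spots the canned replies
```

A machine "passes" in this sketch only when the judge picks the wrong label, which is why one lucky round proves very little.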
Here is what I think about the whole episode:
Eugene Goostman is a cleverly coded chatbot, not a supercomputer; that is cheating and not in the spirit of the Turing test. Moreover, the people making the claims in favor of Eugene Goostman are not AI experts, nor do they run an AI startup or company.
If Eugene Goostman passed the Turing test, then I would like to test it now, and I would like you to test it yourself too. Where is it available? The answer is, it has been taken down. It was available for some time at princetonai.com/bot/bot.jsp. Perhaps the makers were embarrassed by the mismatch between the hype and the sample chats as they learned more.
Now, coming back to Google Duplex, the answer is the same: no. However, the Google Duplex case is not the same as Eugene Goostman's.
Unlike Eugene Goostman, which has competed in a number of Turing tests, Google Duplex has not competed in any Turing test. Moreover, Eugene Goostman is a chatbot, a piece of software, unlike Google Duplex, which is an AI system.
Here is my final verdict: Google Duplex is in the development phase and has not yet been released for public testing. So there is no point discussing whether the system has passed the Turing test. Moreover, I am particularly skeptical that Google takes the Turing test seriously in the context of today's AI advances.
Technologies used in Google Duplex
Google Duplex basically employs the following technologies:
- Natural language understanding (NLU)
- Deep learning
- Text to speech (TTS)
Each of these technologies is an extremely hard and challenging problem of AI. These problems fall under what are called AI-hard or AI-complete problems. In simple terms, AI-hard means that solving any of these problems is equivalent to solving the hard problem of general AI, in other words, making a computer as intelligent as a human being.
Let me briefly introduce each of these technologies:
We communicate with computers using programming languages, and we communicate with each other using our natural language. Research on making computers understand natural language has been going on for over six decades. Today we have Google Assistant, Apple Siri, Microsoft Cortana, Amazon Alexa, and Samsung Bixby, which can answer some of our questions. But they too are far from understanding natural language in general. You can follow this post for the latest updates in this field.
When you were a small child, how did you first learn to identify a dog? You saw somebody pointing to a dog and saying it's a dog. You followed along. Then you saw more dogs, played with them, and slowly became aware of all the features dogs possess. Here is exactly what you did: unknowingly, you built layers of concepts of dogs, where each layer of concept or abstraction is built from the preceding layer. So you have a hierarchy of concepts.
The idea of deep learning in Artificial Intelligence is to emulate this style of learning in machines. You can learn more about this technology at deeplearning.ai.
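The "layers of concepts" idea can be shown with a toy Python sketch in which each layer is computed only from the layer before it. The feature names and the weights below are made up by me for illustration; a real deep learning system learns its weights from lots of examples instead of having them written by hand.

```python
def relu(x):
    """Keep positive evidence, drop negative evidence."""
    return max(0.0, x)


def layer(inputs, weights, biases):
    """Build one layer of concepts from the previous layer's values."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Return the values of every layer, from raw inputs to final concept."""
    activations = [inputs]
    for weights, biases in layers:
        activations.append(layer(activations[-1], weights, biases))
    return activations


# Hand-picked weights (an assumption for illustration, not learned).
# Raw inputs: [has_fur, has_four_legs, barks]
LAYERS = [
    # Layer 1 concepts: [looks_like_a_pet, makes_dog_sound]
    ([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [-1.0, 0.0]),
    # Layer 2 concept: [is_a_dog], built only from the layer 1 concepts
    ([[1.0, 1.0]], [-1.0]),
]

dog = forward([1.0, 1.0, 1.0], LAYERS)
cat = forward([1.0, 1.0, 0.0], LAYERS)
print(dog[-1], cat[-1])  # [1.0] [0.0]
```

The final "is a dog" value never looks at the raw inputs directly; it is built from the intermediate concepts, which is the hierarchy described above.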
Text to speech is basically generating speech from text. If you have used apps or sites like this, which generate speech from text, you know that the speech does not sound natural, and you might have felt awkward listening to it.
Generating human-like speech has been a challenge. But recently DeepMind has shown in this post that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing text-to-speech systems, reducing the gap with human performance by over 50%.
You may also read this post to learn about Tacotron 2, one of the latest advances in this field.
Challenges to building a perfect duplex
A perfect duplex is still a long way off. Developing a perfect duplex is as hard as solving the hard problem of general AI. As of now, Google Duplex must be able to overcome the following challenges:
- Conducting natural conversation
- Understanding the context of conversation in general
- Understanding nuances of local dialect
Natural conversation between two people is dynamic. The style of conversation adapts to the situation. What I mean is that you don't talk the same way when you are high, happy, sad, or angry. So natural conversation is very difficult to model.
Listen to one of Elon Musk's talks. He uses complex sentences, a lot of fillers, a lot of mid-sentence corrections, and pauses. But still he communicates seamlessly.
This is one of the major challenges to building a perfect duplex. There are some other interesting aspects here. It would be worth discussing them only after a stable version of Duplex is released and I have tested it myself in real time.
Sundar Pichai clearly said in the demo that the Duplex system understands the context of a conversation and handles unexpected situations. The question is, how well does it understand? Again, let's wait till the release.
Forget about machines; understanding the nuances of a local dialect is a challenge even for humans hearing it for the first time. Nuance can be fully appreciated only with experience or knowledge of that particular matter. So it will be interesting to see how Google Duplex fares in understanding the nuances of how people speak.
Can Google Duplex replace call centers?
If all goes well and the technology comes out exactly as presented, then the answer is a big YES, because the people working at call centers are trained in a particular domain, and the AI systems we have seen so far perform better in restricted domains.
But Google at the moment would be more interested in releasing it to the larger public as part of Google Assistant, so that they can collect more data and train their system better.
Google has clarified in a statement to cnet.com that they are not testing the technology for enterprise use cases at the moment. They are focused on consumer use cases and on helping people get things done, like restaurant reservations, hair salon bookings, and checking holiday hours.
When will we get to use Google Duplex?
The answer is we don't know exactly. Google hasn't announced any official release date.
I suspect it will take some more time (maybe a year or so) till we get to use it, because Google has stated that the Duplex technology at the moment is designed to operate in specific use cases with a limited set of trusted testers. And we surely are not among them.
Google is taking a slow and measured approach so that they get the experience right and incorporate lessons and feedback from the tests.
That's all for today. If you liked this post, do share it with your friends!