By Yiting Sun | Sept 14, 2017
When Gang Xu, a 46-year-old Beijing resident, needs to communicate with his Canadian tenant about rent payments or electricity bills, he opens an app called iFlytek Input on his smartphone, taps an icon that looks like a microphone, and begins talking. The software turns his spoken Chinese into English text messages and sends them to the tenant. It also translates the tenant’s English text messages into Chinese, creating a seamless cycle of bilingual conversation.
In China, over 500 million people use iFlytek Input to overcome obstacles in communication such as the one Xu faces. Some also use it to send text messages through voice commands while driving, or to communicate with a speaker of another Chinese dialect. The app was developed by iFlytek, a Chinese AI company that applies deep learning in a range of fields such as speech recognition, natural-language processing, machine translation, and data mining (see “50 Smartest Companies 2017”).
Court systems use its voice-recognition technology to transcribe lengthy proceedings; business call centers use its voice synthesis technology to generate automated replies; and Didi, a popular Chinese ride-hailing app, also uses iFlytek’s technology to broadcast orders to drivers.
But while impressive progress in voice recognition and instant translation has enabled Xu to talk with his Canadian tenant, language understanding and translation remain incredibly challenging tasks for machines (see “AI’s Language Problem”).
Xu recalls a misunderstanding that arose when he tried to ask his tenant what time he would get off work so that he could come sign the lease renewal. The text message the app sent instead read, “What time do you go to work today?” In retrospect, he figures that it was probably because of the wording of his question: you’ll work until what time today? “Sometimes, depending on the context, I can’t get my meaning across,” says Xu, who still depends on the app for communication.
Xu’s story highlights why it’s so important for a company like iFlytek to gather as much data from real-world interactions as possible. The app, which is free, has been collecting that data since it launched in 2010.
iFlytek’s developer platform, called iFlytek Open Platform, provides voice-based AI technologies to over 400,000 developers in various industries such as smart home and mobile Internet. The company is valued at 80 billion yuan ($12 billion), and has international ambitions, including a subsidiary in the U.S. and an effort to expand into languages other than Chinese. Meanwhile, the company is changing the way many industries such as driving, health care, and education interact with their users in China.
In August, iFlytek launched a voice assistant for drivers called Xiaofeiyu (Little Flying Fish). To ensure safe driving, it has no screen and no buttons. Once connected to the Internet and the driver’s smartphone, it can place calls, play music, look for directions, and search for restaurants through voice commands. Unlike voice assistants intended for homes, Xiaofeiyu was designed to recognize voices in a noisy environment.
Min Chu, the vice president of AISpeech, another Chinese company working on voice-based human-computer interaction technologies, says voice assistants for drivers are in some ways more promising than smart speakers and virtual assistants embedded in smartphones. When the driver’s eyes and hands are occupied, it makes more sense to rely on voice commands. In addition, once drivers become used to getting things done using their voice, the assistant can also become a content provider, recommending entertainment options instead of passively handling requests. This way, a new business model will evolve.
In the health-care industry, although artificial intelligence has the potential to reduce costs and improve patient outcomes, many hospitals are reluctant to take the plunge for fear of disrupting an already strained system that has few doctors but lots of patients.
At the Anhui Provincial Hospital, which is running a number of AI trials, voice-based technologies are transforming many aspects of its service. Ten voice assistants in the shape of a robot girl use iFlytek’s technology to greet visitors in the lobby of the outpatient department and offer relief for overworked receptionists. Patients can tell the voice assistant what their symptoms are, and then find out which department can help.
Based on the data collected by the hospital since June, the voice assistant directed patients to the right department 84 percent of the time.
Doctors at the hospital are also using iFlytek’s technology to dictate a patient’s vital signs, medications taken, and other bits of information into a mobile app, which then turns everything into written records. The app uses voiceprint technology as a signature system that cannot be falsified, and it is collecting data that will improve its algorithms over time.
Although voice-based AI techniques are becoming more useful in different scenarios, one fundamental challenge remains: machines do not understand the answers they generate, says Xiaojun Wan, a professor at Peking University who does research in natural-language processing. The AI responds to voice queries by searching for a relevant answer in the vast amount of data it was fed, but it has no real understanding of what it says.
In other words, the natural-language processing technology that powers today’s voice assistants is based on a set of rigid rules, resulting in the kind of misunderstanding Xu went through.
Changing the way machines process language will help companies create voice-based AI devices that will become an integral part of our daily life. “Whoever makes a breakthrough in natural-language processing will enjoy an edge in the market,” says Chu.