Apple's Siri understands you, and voice applications are poised to take off
How? Much like Siri, Nuance's application, which is used by some 450,000 physicians across the country, extracts meaning from the words it recognizes, referencing a database of medical information and comparing that with the patient's history. It then uses statistical inference to establish connections between the pieces of information it discovers, even making suggestions about treatment. Petro says the technology is more than 90% accurate and improves over time. It has certainly worked for the bottom line, so much so that Nuance decided to raise its fourth-quarter revenue projections by about $10 million.

Researchers have even bigger hopes for the future. Skip Rizzo, associate director of the University of Southern California's Institute for Creative Technologies, is working on an interactive simulation technology designed to help military veterans seek counseling for post-traumatic stress disorder. Dubbed SimCoach, the program will eventually attempt to read the emotion behind spoken words. "It's a big, big challenge, because what you're doing is having to capture vocal patterns, then you're having to analyze them like a brain does," says Rizzo. While humans may be able to tell when something is wrong with a close friend or family member because their speech is slower or carries less emphasis, a computer can have a hard time picking up these signals, Rizzo says.

Some research could bring results sooner rather than later. Last spring, Rizzo's research partner, MIT professor Alex Pentland, experimented with a similar voice inference technology at a Bank of America (BAC) call center, analyzing how employee communication affected the success of the business. Pentland had employees wear small electronic badges around their necks for six weeks that tracked their physical location as well as their body language and voice. The data showed whom a person interacted with, how close they stood to them, and the tone of their conversations.
"We found that the most productive people were the people that not only talked to lots of people but they talked to co-workers that similarly talked to a lot of people," Pentland says. Simply by changing employees' coffee break schedules to better coincide with one another, he says, the call center would be able to save $15 million a year.

The attention consumers are paying to Siri is likely to benefit such research, and to push adoption further. "Voice recognition is really the holy grail to technology," Rizzo says. "We're 90% there, but that last 10% is a lot further to handle. And when the tipping point is reached, it's going to be a giant market." It looks like Siri may very well be that tipping point.