The Less Sexy Side of A.I.: They're Listening to What You Tell Alexa
By Alyssa Newcomb
When users ask Alexa about their mysterious rash, or to turn off the lights, they might not expect someone else to be listening.

A.I. needs human input, and human reviewers, to become smarter. This week, a Bloomberg report pulled back the curtain on the team of people around the world who are tasked with listening to the Alexa queries of unsuspecting users. And the A.I. training team's members number in the thousands.

The employees listen to recordings of people asking for Alexa to turn off the lights or play Taylor Swift. They transcribe the queries and feed them back to the Alexa software, making it smarter and more adept at grasping the way humans speak.

"It is normal to train this way, and a less sexy side of A.I.," said Nico Acosta, director of product and engineering at Twilio Autopilot, a platform that allows developers to build bots and Alexa apps. "All speech engines need to be trained on real world audio, which implies the need to have a human transcribe it to continuously train the engine."

There's a clear privacy trade-off in having these smart speakers in your home. In a statement to Fortune, an Amazon spokesperson said the company uses "an extremely small number of interactions from a random set of customers," who are not identifiable to the employees who are listening.

"For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone," the spokesperson said. "We have strict technical and operational safeguards, and have a zero tolerance policy for the abuse of our system."

Raw human training data is "critical" to maintaining the quality of the service, said Richard Ford, chief scientist at cybersecurity firm Forcepoint.
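The human-in-the-loop cycle described above can be illustrated with a minimal sketch. This is not Amazon's implementation; every name and the sampling fraction here are invented for illustration. The idea is simply that a small random sample of interactions is pulled, a human writes down what was said, and the resulting (audio, transcript) pairs become training data for the speech model.

```python
# Hypothetical sketch of the transcribe-and-retrain loop the article
# describes. All identifiers and the sampling rate are assumptions.
import random

def sample_interactions(clips, fraction=0.001, seed=0):
    """Pick an 'extremely small number' of clips from a random set of users."""
    rng = random.Random(seed)
    k = max(1, int(len(clips) * fraction))
    return rng.sample(clips, k)

def human_transcribe(clip):
    """Stand-in for a human reviewer listening and typing out the words."""
    return clip["heard_text"]

def build_training_pairs(clips):
    """Return (audio_id, transcript) pairs to feed back into training."""
    sampled = sample_interactions(clips)
    return [(c["audio_id"], human_transcribe(c)) for c in sampled]

clips = [{"audio_id": i, "heard_text": f"utterance {i}"} for i in range(5000)]
pairs = build_training_pairs(clips)
print(len(pairs))  # 5 clips sampled out of 5,000 at a 0.1% rate
```

At a 0.1% sampling rate, only a handful of clips out of thousands ever reach a reviewer, which mirrors the spokesperson's "extremely small number of interactions" claim.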
"If you want to do voice recognition for Alexa, the best data to train it on is on actual 'as used' scenarios, where there's background noise, dogs barking, people changing their minds… all the 'mess' that you find in the real world," said Ford.

However, there are other ways Amazon could train Alexa without eavesdropping on tens of millions of queries, he said.

"You could pay people to opt in to share their data willingly, or take part in trials, but at the end of the day, getting truly realistic data in a tractable way probably involves capturing real world data," he said. "There are mitigations you can potentially put in place to minimize the privacy risks, but they are not infallible. Privacy is a confluence of good governance, good design, and good implementation."

While the story has added to the concerns of people who are already worried about the privacy issues involved with allowing a tech giant's smart speaker to live in their home, Amazon said its speaker only records queries and sends them to the cloud after it hears its wake word, such as "Alexa" or "Amazon." A clear sign that the Echo speaker is recording: the device's blue ring lights up.

There are ways to get rid of old recordings. Users can manually delete everything they've ever asked Amazon Alexa by visiting the Amazon Connect and Devices website. Once there, select "devices," the Amazon Echo, and then "manage voice recordings."

To opt out of being an unwitting A.I. trainer altogether, in the Amazon Alexa app, tap the menu button in the upper left corner of the screen. Then select "Alexa Account" and "Alexa Privacy." Choose "Manage how your data improves Alexa." Next, click off the buttons next to "Help Develop New Features" and "Use Messages to Improve Transcriptions." The settings will keep Amazon from using raw recordings to train its software.

Of course, if too many people opt for privacy, the A.I. will take a lot longer to improve its understanding of natural language.
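The wake-word behavior Amazon describes, where audio is ignored until the device hears "Alexa" or "Amazon" and only the query that follows is uploaded, can be sketched as a simple gate. This is an invented toy model operating on text transcripts, not Amazon's on-device detector, which works on raw audio.

```python
# Minimal, hypothetical sketch of wake-word gating: speech is dropped
# unless it begins with a wake word, in which case only the trailing
# query would be "sent to the cloud."
WAKE_WORDS = {"alexa", "amazon"}

def gate_queries(transcript_stream):
    """Yield only the speech that follows a wake word."""
    uploads = []
    for utterance in transcript_stream:
        words = utterance.lower().split()
        if words and words[0] in WAKE_WORDS:
            uploads.append(" ".join(words[1:]))  # recording starts here
    return uploads

stream = [
    "the dog is barking",          # background speech: never recorded
    "alexa turn off the lights",   # wake word heard: query is captured
    "what a nice day",
    "amazon play taylor swift",
]
print(gate_queries(stream))
# ['turn off the lights', 'play taylor swift']
```

In the real device the gate runs locally on audio, and the lit blue ring is the user-visible signal that the gate has opened.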
"Getting such a corpus is really hard without using real data, which is why there's often a genuine need to collect data from actual usage," said Ford. "If you're going to deliver your product on time and with a high efficacy, it's a hard problem."