Big data: the high-tech "crystal ball" that predicts the future
Early last year, you might recall, Target found itself at the center of a storm of outrage. The retailer's number crunchers had come up with a statistical method for predicting which of its customers were most likely to become pregnant in the near future, giving Target's marketers a head start on pitching them baby products. The model worked: Target expanded its customer base for pregnancy and infant-care products by about 30%. But the media brouhaha, with everyone from The New York Times to Fox News accusing the company of "spying" on shoppers, took weeks to die down.

If Target's success at setting its sights on potential moms-to-be gives you the creeps, Eric Siegel's new book could ruin your whole day. Siegel is a former Columbia professor whose company, Predictive Impact, builds mathematical models that cull valuable nuggets of data from floods of raw information. Companies use the tools to forecast everything from what we'll shop for, to which movies we'll watch, to how likely we are to be in a car accident or default on our credit cards.

In Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Siegel explains how these models work and where the pitfalls are, in clear, colorful terms. Simply put, predictive analytics, or PA, is the science of learning from experience. Starting with data about the past and current behavior of a given group of people (customers, patients, prison inmates up for parole, voters, or employees), analysts can predict what they'll probably do next.

This kind of high-tech crystal ball is behind "the growing trend to make decisions more 'data driven,'" Siegel writes. "In fact, an organization that doesn't leverage its data in this way is like a person with a photographic memory who never bothers to think."

Predictive Analytics is packed with examples of how Citi, Facebook, Ford, IBM, Google, Netflix, PayPal and many other businesses and government agencies have put PA to work.
Pfizer, for instance, has a predictive model to foretell the likelihood that a patient will respond to a given new drug within three weeks. LinkedIn uses PA to pinpoint the fellow members you might want as connections. At the IRS, a mathematical ranking system applied to past tax returns "empowered IRS analysts to find 25 times more tax evasion, without increasing the number of investigations."

And then there's Hewlett-Packard. A couple of years ago, alarmed by annual turnover rates in some divisions as high as 20%, HP decided to try anticipating which of its 330,000 employees worldwide were most likely to quit. Beginning with reams of data on things like salaries, raises, promotions, and job rotations, a team of analysts correlated that information with detailed employment records of people who had already left. Based on the similarities they found, the researchers assigned each current employee a Flight Risk score.
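The article doesn't say which model HP's analysts used. To illustrate the idea of scoring current employees by how closely they resemble people who already left, here is a minimal k-nearest-neighbor sketch in Python. It assumes four numeric features loosely based on the data types the article lists. The `Record` fields, the `flight_risk` function, the choice of k-NN, and every sample record are hypothetical, invented for illustration. None of it is HP's data or method.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """One employee's history. All fields are hypothetical encodings of the
    kinds of data the article mentions (salaries, raises, promotions, rotations)."""
    salary_k: float   # annual salary, in thousands
    raise_pct: float  # most recent raise, in percent
    promotions: int   # promotions over some recent window
    rotations: int    # job rotations over the same window


def features(r: Record) -> list[float]:
    return [r.salary_k, r.raise_pct, float(r.promotions), float(r.rotations)]


def flight_risk(current: Record, history: list[tuple[Record, bool]], k: int = 3) -> float:
    """Share of the k past employees most similar to `current` who left (0.0 to 1.0)."""
    past = [features(rec) for rec, _ in history]
    # Min-max scale each feature using the historical range, so salary
    # (large numbers) does not swamp promotions and rotations in the distance.
    lo = [min(col) for col in zip(*past)]
    hi = [max(col) for col in zip(*past)]

    def scale(x: list[float]) -> list[float]:
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, lo, hi)]

    target = scale(features(current))
    # Rank past employees by Euclidean distance to the current employee.
    ranked = sorted(
        (math.dist(target, scale(x)), left) for x, (_, left) in zip(past, history)
    )
    nearest = ranked[:k]
    # Flight Risk = fraction of the nearest neighbors who actually quit.
    return sum(left for _, left in nearest) / len(nearest)


# Made-up records for illustration only: (past employee, did they leave?)
HISTORY = [
    (Record(60, 1.0, 0, 0), True),
    (Record(62, 0.5, 0, 1), True),
    (Record(58, 1.5, 0, 0), True),
    (Record(90, 6.0, 2, 2), False),
    (Record(95, 5.0, 1, 2), False),
    (Record(88, 7.0, 2, 1), False),
]

alice = Record(61, 1.0, 0, 0)  # hypothetical: low pay, small raise, no promotions
bob = Record(92, 6.0, 2, 2)    # hypothetical: well paid, recently promoted

if __name__ == "__main__":
    print(flight_risk(alice, HISTORY))  # 1.0: all 3 nearest neighbors left
    print(flight_risk(bob, HISTORY))    # 0.0: all 3 nearest neighbors stayed
```

Raising `k` to 6 makes every past employee count, so the score falls back to the overall leaver rate in this toy data (0.5). The sketch uses k-NN because it maps directly onto "based on the similarities they found." A real system might instead fit a statistical model such as logistic regression to the historical records.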