Big Data's Prediction Blind Spots

Kurt Wagner 2013-04-28
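American statistician Nate Silver is a mathematical genius who excels at using big data to make predictions. During last year's U.S. presidential election, he correctly predicted the winner in all 50 states. Even so, he believes big data is not all-powerful: in some fields, such as earthquakes and the stock market, predictions succeed only rarely.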

    Statistician Nate Silver isn't famous because he's a mathematical genius. (Although, he is.) Silver's well-known because he knows how to apply his craft to the real world. The country's most popular data cruncher is known for his spot-on election predictions -- he accurately called the winner in all 50 states of November's presidential election; in 2008, he went 49 for 50 -- but Silver's big data analytics have also translated to the worlds of sports (March Madness, Major League Baseball), gambling (Silver will play in his third World Series of Poker event this summer), and even dating. Silver once wrote for the baseball website Baseball Prospectus but has since expanded his offerings; he is now a published author, a political pundit, and the creator of his very own New York Times blog, FiveThirtyEight.

    Silver was in San Francisco Thursday to talk analytics as the keynote speaker at Lithium Technologies' annual LiNC Conference. Fortune sat down with him to talk about big data's limitations, its role in the stock market, how it applies to dating, and even his predictions for the 2016 presidential election. A lightly edited transcript follows.

    Fortune: I'm sure you get people coming up to you all the time to discuss how you helped them win their NCAA March Madness pool.

    Nate Silver: I went against my bracket in my own pool because I thought other people would be using it. I would have gotten second place if I had taken my own advice.

    Maybe take a small royalty fee next year?

    Absolutely. Or we need to put out a fake bracket [first], and then put out a real one [later]. Oops, there was a coding error! [Laughs]

    You started out using stats to better understand and predict success in baseball -- why did you move towards politics?

    Of course it's easy to say in retrospect why you did certain things instead of what rational motivations were pushing you in that direction in real time, but I think part was that I was involved working for Baseball Prospectus for about five years -- 2003 to 2008 -- and you saw a great amount of progress in the baseball industry during that time. The start of that era was the era described in [the book-turned-movie] Moneyball where you really had a lot of tension between stat-heads and traditionalists. People were terrified that nerds would come over and take their jobs. And really now that's been totally reversed, where it's not just that you have some stat-head that you've hired and have locked into a closet somewhere, but that every team -- almost every team, there are some exceptions -- understands analytics at different levels of the organization.

    But seeing how quickly that progressed in a span of just a few years, and how behind politics coverage seemed to be where it's all about the narrative -- there's a lot of bullshit basically both in the news coverage of politics and from politicians themselves -- so it seemed like it was ripe to apply some very basic analytics tools to the coverage of elections.