The Limits of Big Data
Arbesman's provocative central thesis is that there is a virtual physics of facts. "Facts" obey certain laws and trajectories, depending on how they are defined and measured. "Each day that we read the news, we may be confronted with a fact about our world that is wildly different from what we thought we knew," he writes. "But it turns out that these rapid changes, though they may look to us like real phase transitions, are neither unexpected nor random. We can understand how they behave in the aggregate through the use of probability, but we can also predict these changes by searching for the slower, regular changes in our knowledge that underlie them. Fast changes in facts, just like everything else we have seen, have an order to them, one that is measurable and predictable."
What do we mean by "measurable" and "predictable"? Arbesman is quite good at describing the institutional, individual and probabilistic biases that skew how both science and scientists assess, publish and extinguish "facts."

"The clearest example of this is in the world of negative results," Arbesman writes. He cites evolutionary biologist John Maynard Smith, who noted that "statistics is the science that lets you do twenty experiments a year and publish one false result in Nature. However, if it were one experiment being replicated by twenty separate scientists, nineteen of those would be a bust, with nineteen careers unable to move forward. Annoying, certainly … but that's how science operates. Most ideas and experiments are unsuccessful. But crucially, unsuccessful results are rarely published."

The point is not that the science of statistics or the statistics of science are pathologically flawed but that known pathologies and flaws can create incentives to rethink, revise and redesign what we measure and test. We need "facts" to help us renew our insights and understandings about "facts." Science -- and the increasingly digital technologies that both drive and support it -- offers a powerful model for enterprises struggling to make sense of and add value to their growing mountains of data.

In that respect, The Half-Life of Facts offers a pop science primer on the epidemiology of epistemology -- that is, the process by which ideas about the nature of knowledge and knowing spread throughout a discipline, a profession and a culture. Arbesman's work challenges decision-makers worldwide to rethink how they want their organizations to turn intriguing data into useful facts.

Nate Silver, a statistician who writes the FiveThirtyEight blog for the New York Times site, takes a different but compatible approach to knowledge, fact and predictability. Almost overstuffed with detailed examples and vignettes, his book delivers a sobering portfolio of warnings about predictive hubris. "This book is less about what we know," Silver writes, "than about the difference between what we know and what we think we know."

From weather to earthquakes to global warming to football to subprime mortgages to the global financial crisis, Silver explains how modelers and forecasters struggle to convert yesterday's data into tomorrow's "you can bet on it" predictions. These miniature case studies, while necessarily superficial, don't shy away from the math and consistently take a fair-minded view of the most important assumptions. A better editor might have pushed Silver to sacrifice quantity for keener insight, but the breadth of examples undeniably reveals a "pathology of prediction."