Big data's biggest challenge comes from climate change

Katherine Noyes, July 7, 2014
Researchers are using big data technologies to model, interpret, and demonstrate the effects of climate change on the environment.

    “Our goal is to turbo-charge the best science on massive data to create novel insights and drive action,” said Rebecca Moore, engineering manager for Google Earth Engine. Google Earth Engine aims to bring together the world’s satellite imagery—trillions of scientific measurements dating back almost 40 years—and make it available online along with tools for researchers.

    Global deforestation, for example, “is a significant contributor to climate change, and until recently you could not find a detailed current map of the state of the world’s forests anywhere,” Moore said. That changed last November when Science magazine published the first high-resolution maps of global forest change from 2000 to 2012, powered by Google Earth Engine.

    “We ran forest-mapping algorithms developed by Professor Matt Hansen of University of Maryland on almost 700,000 Landsat satellite images—a total of 20 trillion pixels,” she said. “It required more than one million hours of computation, but because we ran the analysis on 10,000 computers in parallel, Earth Engine was able to produce the results in a matter of days.”
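
    The parallelism arithmetic behind those figures is easy to check. Below is a rough back-of-envelope sketch in Python; the numbers are simply the ones Moore quotes above, and nothing here reflects how Earth Engine actually schedules its work.

        # Sanity check of the figures quoted above; illustrative only.
        total_cpu_hours = 1_000_000        # "more than one million hours of computation"
        machines = 10_000                  # "10,000 computers in parallel"
        total_pixels = 20 * 10**12         # "a total of 20 trillion pixels"

        wall_clock_hours = total_cpu_hours / machines    # 100 hours
        wall_clock_days = wall_clock_hours / 24          # about 4.2 days, i.e. "a matter of days"
        pixels_per_machine = total_pixels / machines     # 2 billion pixels per machine

        print(f"~{wall_clock_days:.1f} days of wall-clock time, "
              f"~{pixels_per_machine:.1e} pixels per machine")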

    On a single computer, that analysis would have taken more than 15 years. Anyone in the world can view the resulting interactive global map on a PC or mobile device.
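
    The underlying forest-change layers can also be queried programmatically. Here is a minimal sketch using the Earth Engine Python API; the dataset ID, band name, and region of interest are assumptions made for illustration, not details taken from the article.

        import ee

        ee.Initialize()  # assumes Earth Engine credentials are already configured

        # Hansen et al. global forest change image (asset ID assumed for illustration).
        gfc = ee.Image('UMD/hansen/global_forest_change_2013')
        loss = gfc.select('loss')            # 1 where forest was lost during the study period

        # Hypothetical region of interest: a one-degree box in the Amazon basin.
        roi = ee.Geometry.Rectangle([-61.0, -4.0, -60.0, -3.0])

        # Sum the area (square meters) of pixels flagged as loss inside the box.
        loss_area = loss.multiply(ee.Image.pixelArea()).reduceRegion(
            reducer=ee.Reducer.sum(),
            geometry=roi,
            scale=30,            # Landsat pixel size in meters
            maxPixels=1e10,
        )
        print(loss_area.getInfo())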

    ‘We have sensors everywhere’

    Rapidly propelling such developments, meanwhile, is the fact that data is being collected today on a larger scale than ever before.

    “Big data in climate first means that we have sensors everywhere: in space, looking down via remote sensing satellites, and on the ground,” said Kirk Borne, a data scientist and professor at George Mason University. Those sensors are continually recording information about weather, land use, vegetation, oceans, ice cover, precipitation, drought, water quality, and many more variables, he said. They are also tracking correlations between datasets: biodiversity changes, invasive species, and at-risk species, for example.

    Two large monitoring projects of this kind are NEON—the National Ecological Observatory Network—and OOI, the Ocean Observatories Initiative.

    “All of these sensors also deliver a vast increase in the rate and the number of climate-related parameters that we are now measuring, monitoring, and tracking,” Borne said. “These data give us increasingly deeper and broader coverage of climate change, both temporally and geospatially.”

    Climate change is one of the largest examples of scientific modeling and simulation, Borne said. Efforts are focused not on tomorrow’s weather but on decades and centuries into the future.

    “Huge climate simulations are now run daily, if not more frequently,” he said. These simulations have increasingly higher horizontal spatial resolution—tens of kilometers, versus hundreds of kilometers in older simulations; higher vertical resolution, referring to the number of atmospheric layers that can be modeled; and higher temporal resolution—zeroing in on minutes or hours as opposed to days or weeks, he added.
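
    Those resolution gains multiply quickly into data volume. The Python sketch below, with invented grid sizes and variable counts chosen purely to show the scaling, estimates how one snapshot of a global model grows as the grid is refined.

        EARTH_SURFACE_KM2 = 510e6    # roughly 510 million square kilometers

        def snapshot_size_gb(horiz_km, levels, variables, bytes_per_value=4):
            # Approximate size of a single model time step written to disk.
            columns = EARTH_SURFACE_KM2 / (horiz_km ** 2)   # horizontal grid columns
            return columns * levels * variables * bytes_per_value / 1e9

        # A coarse, older-style grid versus a finer modern grid (illustrative values only).
        for horiz_km, levels in [(200, 20), (25, 100)]:
            gb = snapshot_size_gb(horiz_km, levels, variables=10)
            print(f"{horiz_km:>3} km grid, {levels:>3} levels: ~{gb:,.2f} GB per snapshot")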

    The output of each daily simulation amounts to petabytes of data and requires an assortment of tools for storing, processing, analyzing, visualizing, and mining.

    ‘All models are wrong, but some are useful’

    Interpreting climate change data may be the most challenging part.

    “When working with big data, it is easy to create a model that explains the correlations that we discover in our data,” Borne said. “But we need to remember that correlation does not imply causation, and so we need to apply systematic scientific methodology.”

    It’s also important to heed the maxim that “all models are wrong, but some are useful,” Borne said, quoting statistician George Box. “This is especially critical for numerical computer simulations, where there are so many assumptions and ‘parameterizations of our ignorance.’

    “What fixes that problem—and also addresses Box’s warning—is data assimilation,” Borne said, referring to the process by which “we incorporate the latest and greatest observational data into the current model of a real system in order to correct, adjust, and validate. Big data play a vital and essential role in climate prediction science by providing corrective actions through ongoing data assimilation.”
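
    The core of the assimilation idea can be shown with a toy, one-variable update: blend the model's forecast with a fresh observation, weighting each by its uncertainty. Real assimilation systems (ensemble Kalman filters, 4D-Var) do this over millions of variables; the sketch below is only illustrative and every number in it is invented.

        def assimilate(forecast, forecast_var, observation, obs_var):
            # One scalar analysis step: the gain leans toward the observation
            # when the model is uncertain, and toward the model when it is not.
            gain = forecast_var / (forecast_var + obs_var)
            analysis = forecast + gain * (observation - forecast)
            analysis_var = (1.0 - gain) * forecast_var
            return analysis, analysis_var

        state, var = 15.0, 4.0                 # model forecast: 15.0 degC, fairly uncertain
        for obs in [14.2, 14.5, 14.4]:         # successive (invented) observations
            state, var = assimilate(state, var, obs, obs_var=1.0)
            print(f"analysis = {state:.2f} degC, variance = {var:.3f}")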
