From Grassroots Project to Industry Standard: The Evolutionary Legend of a Small Open Source Project
(Fortune China, 财富中文网; Chinese translation by 严匡正)
It was a good decision. “Hadoop would not have become a big success without large investments from Yahoo and other firms,” Cafarella said.

‘How would you compete with open source?’

So Hadoop borrowed an idea from Google, made the concept open source, and drew both encouragement and investment from powerhouses like Yahoo. But that wasn’t all that drove its success. Luck, in the form of sheer, unanticipated market demand, also played a key role.

“I knew other people would probably have similar problems, but I had no idea just how many other people,” Cutting said. “I thought it would be mostly people building text search engines. I didn’t see it being used by folks in insurance, banking, oil discovery, all these places where it’s being used today.”

Looking back, “my conjecture is that we were early enough, and that the combination of being first movers and being open source and being a substantial effort kept there from being a lot of competitors early on,” he said. “Mike and I got so far, but it took tens of engineers from Yahoo several more years to make it stable.”

And even if a competitor did manage to catch up, “how would you compete with something open source?” Cutting said. “Competing against open source is a tough game: everybody else is collaborating on it; the cost is zero. It’s easier to join than to fight.”

IBM, Microsoft, and Oracle are among the large companies that chose to collaborate with Hadoop.

Though Cafarella isn’t surprised that Web companies use Hadoop, he is astonished at “how many people now have data management problems that 12 years ago were exceedingly rare. Everyone now has the problems that used to belong to just Yahoo and Google.”

Hadoop represents “somewhat of a turning point in the primary drivers of open source software technology,” said Jay Lyman, a senior analyst for enterprise software with 451 Research.
Before, open source software such as the Linux operating system was best known for offering a cost-effective alternative to proprietary software like Microsoft’s Windows. “Cost savings and efficiency drove much of the enterprise use,” Lyman said.

With the advent of NoSQL databases and Hadoop, however, “we saw innovation among the primary drivers of adoption and use,” Lyman said. “When it comes to NoSQL or Hadoop technology, there is not really a proprietary alternative.”

Hadoop’s success has come as a pleasant surprise to its creators. “I didn’t expect an open source project would ever take over an industry like this,” Cutting said. “I’m overjoyed.”

And it’s still on a roll. “Hadoop is now much bigger than the original components,” Cafarella said. “It’s an entire stack of tools, and the stack keeps growing. Individual components might have some competition, mainly MapReduce, but I don’t see any strong alternative to the overall Hadoop ecosystem.”

The project’s adaptability “argues for its continued success,” RedMonk’s O’Grady said. “Hadoop today is a very different, and more versatile, project than it was even a year or two ago.”

But there’s plenty of work to be done. Looking ahead, Cutting, with the support of Cloudera, has begun to focus on the policy needed to accommodate big data technology.

“Now that we have this technology and so much digitization of just about every aspect of commerce and government and we have these tools to process all this digital data, we need to make sure we’re using it in ways we think are in the interests of society,” he said. “In many ways, the policy needs to catch up with the technology.

“One way or other, we are going to end up with laws. We want them to be the right ones.”