

David Meyer



Image credit: Jakub Porzycki, NurPhoto/Getty Images


An ambitious new AI project has begun to take shape in Europe, with the aim of developing open-source AI models that support the region’s 24 official languages and more—while also complying as much as possible with its thicket of digital legislation.

The OpenEuroLLM project, which commenced work at the start of the month, has a budget of just €37.4 million ($38.6 million): a pittance compared with the sums being invested in other AI-related projects like the $100 billion first tranche of the U.S.’s Stargate AI infrastructure project. Although participating companies such as Germany’s Aleph Alpha and Finland’s Silo AI are also contributing their researchers’ time to an equivalent value, the bulk of the funding comes from the European Commission.

EU-funded projects don’t tend to move fast, and this one has a three-year road map in a sector that’s currently undergoing significant evolution each month. But organizers and participants tell Fortune that it could be possible to deliver an intermediate model within a year—and the effort will be worth it.

Speaking in tongues

“Most model development efforts that have worldwide visibility focus on the English language,” said Yasser Jadidi, chief research officer at Aleph Alpha. “It’s a consequence of most of the internet text data that is available and accessible being in English, and it puts other languages at a disadvantage.”

For people in places like Sweden or Turkey (the OpenEuroLLM project is also targeting the languages of eight countries that have applied for EU membership, bringing the total to 32 languages), the lack of AI models that understand the intricacies of their languages can be a serious problem. For a start, it makes it harder for local companies and public authorities to adopt the technology and start providing new services.

“It’s first and foremost a commercial question,” said Peter Sarlin, the CEO of Silo AI, Europe’s largest private AI lab, which was acquired by AMD last year and is participating in OpenEuroLLM. “Are there models that are performant in that specific low-resource language, be it Albanian or Finnish or Swedish or some other, that allows companies within that region to eventually build services on top?”

The issue also has consequences for evaluating the accuracy and safety of AI models in the local context, Jadidi said. Indeed, Aleph Alpha’s role in the project is chiefly to provide AI-model evaluation benchmarks that aren’t simply machine-translated from English, as most are.

The OpenEuroLLM project may have relatively meager funding, but it isn’t starting from scratch.

Most of its participants have already been involved in a separate scheme called High Performance Language Technologies (HPLT), which started two years ago with a budget of just €6 million. The original proposal was for HPLT to deliver AI models, but then OpenAI’s ChatGPT changed the AI landscape and the organizers pivoted to creating a high-quality dataset that can be used to train multilingual models. The HPLT dataset is currently being “cleaned” of errors, and it will form the basis of OpenEuroLLM’s work.

OpenEuroLLM will create a base model trained on a dataset of all the European languages. Once that’s done, yet another EU-funded project, called LLMs4EU, will fine-tune it for various applications. Apart from cash, the EU is also providing computational resources to all these schemes.

Sticking to the rules

Europe is not the easiest place for AI companies to do business. Quite apart from the AI Act that is gradually coming into force, placing all sorts of reporting responsibilities on model providers and their customers, there’s also copyright and competition law to consider—and the General Data Protection Regulation (GDPR), which places strict limits on the personal data that AI companies can use.

These laws have had real effects on AI’s European progress, with Meta delaying the rollout of Meta AI because of GDPR limits, and Apple also delaying the deployment of Apple Intelligence because of unspecified antitrust issues. (Apple Intelligence will come to EU iPhones in limited form in April, while Meta has started offering some Meta AI features to European wearers of its smart glasses.)

As far as OpenEuroLLM’s organizers are concerned, these laws are manageable. “We believe we can live with all of them,” said Jan Hajič of Charles University in Czechia, who is co-leading the project with Sarlin.

Hajič said the participants had already dealt with the copyright and most privacy issues when developing the HPLT dataset. “The GDPR could be a problem, but that’s something we are trying to get around with pseudonymizing the data, meaning that if we encounter people’s names it gets deleted,” he said, while acknowledging that the necessary automation in this process may not have a 100% success rate.
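To illustrate what Hajič describes, here is a minimal sketch of list-based name redaction. It is not the project's actual tooling, which the organizers have not detailed; the name list, the `redact_names` function, and the `[NAME]` placeholder are all hypothetical. The sketch also shows why automation of this kind cannot guarantee a 100% success rate: any name the system fails to recognize passes through untouched.

```python
import re

# Hypothetical list of names to remove. A real pipeline would more likely use a
# named-entity recognizer, which can also miss names or flag the wrong words.
KNOWN_NAMES = {"Maria Novák", "Jonas Berg"}


def redact_names(text: str, names: set[str], placeholder: str = "[NAME]") -> str:
    """Replace each known name with a placeholder (use "" to delete it outright)."""
    # Try longer names first so a full name wins over any shorter name it contains.
    for name in sorted(names, key=len, reverse=True):
        # \b keeps "Jonas Berg" from matching inside "Jonas Bergman".
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, placeholder, text)
    return text


if __name__ == "__main__":
    sample = "Maria Novák met Jonas Berg and Elif Kaya in Prague."
    # "Elif Kaya" is not in the list, so it survives: the failure mode Hajič concedes.
    print(redact_names(sample, KNOWN_NAMES))
```

Replacing names with a placeholder rather than deleting them keeps sentences readable for training, while passing `placeholder=""` matches the outright deletion Hajič mentions.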

“Our goal is to do things in such a way that they will not clash with the European regulation in any way,” Hajič said, adding that this could be a draw for companies wanting to target EU markets. For high-risk use cases that will require a lot of reporting to the EU authorities under the AI Act, the open-source approach will be essential for the transparency it allows, he argued.

The OpenEuroLLM project has 20 participants including companies, research institutions, and high-performance computing clusters like Finland’s Lumi. This setup could be seen as a liability with the potential for diverging priorities, but Aleph Alpha’s Jadidi argued that open-source projects often include a wide array of participants without being dragged down.

“We have all the opportunity to ensure that a high amount of contributors is not a hindrance but an opportunity,” he said.



