The New York Times is considering suing OpenAI

Irina Ivanova

2023-08-21

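A growing number of writers and artists say data scraping is not legal, and lawsuits accusing OpenAI and other generative-A.I. creators of copyright infringement are mounting.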


OpenAI CEO Sam Altman. Image credit: TOMOHIRO OHSUMI/GETTY IMAGES


Chinese translation by 中慧言-王芳 (Fortune China)


The legal woes are piling up for OpenAI, the startup behind the ultra-popular ChatGPT. NPR reports that the New York Times is considering suing OpenAI after attempts to reach a deal in which OpenAI would license news content to train its algorithms failed to progress.

If the lawsuit materializes, it would be the highest-profile attempt yet to bring to heel ChatGPT, a tool whose hype has taken the world by storm. And a successful lawsuit could even go further than that, forcing OpenAI to retrain ChatGPT at great expense, as it would essentially remove much of the language on which the large language model has been trained.

Of note is that the Times was part of a group collectively lobbying for regulations on AI, until it suddenly removed itself, according to Semafor. Nor would the Times be alone in arguing that OpenAI has illegally scraped training data. Comedian Sarah Silverman and authors Paul Tremblay, Mona Awad, and Christopher Golden sued OpenAI in July, alleging the company committed “industrial-strength” plagiarism when it trained ChatGPT on their work.

In January, a trio of commercial artists sued the creators of the popular image-creating engine Midjourney, accusing them of stealing the artists’ work to create knock-offs and preventing the artists from making a living off that work. The artists’ lawyers called the technology “a parasite that, if allowed to proliferate, will cause irreparable harm to artists.” And Getty, the image-licensing service, has sued Stability AI, accusing it of illegally copying 12 million Getty-owned images in a bid to create a competing service. Meanwhile, on August 17, the AP issued a set of AI standards for staff that encourage them to experiment with the technology but forbid them from using it to create any content or images that would be published.

Even Elon Musk, who famously left OpenAI’s board in 2018, claimed in July of this year that “extreme levels of data scraping” were happening on Twitter at the hands of AI companies. “Almost every company doing A.I., from startups to some of the biggest corporations on earth, was scraping vast amounts of data. It is rather galling to have to bring large numbers of servers online on an emergency basis just to facilitate some A.I. startup’s outrageous valuation.”

The Times’ concern, according to NPR, is that OpenAI would create a direct competitor to its reporting “by creating text that answers questions based on the original reporting and writing of the paper’s staff.”

Neither the Times nor OpenAI immediately replied to a request for comment. However, the Times has good reason to fear competition from ChatGPT. Small businesses that rely on web traffic have already seen it destroyed by a more basic piece of technology: Google’s search box, which presents the answer to a typed question as a paragraph at the top of search results.

The niche site CelebrityNetWorth used to do decent business as a source for people curious about celebs’ financial dealings, but after Google started presenting celebrities’ net worth in its search box, traffic to CelebrityNetWorth plunged by two-thirds, and the site had to lay off half its staff, its founder told The Outline.

“If it happens, this lawsuit will be about the value of gathering information and who gets to use it for their customers,” Jeremy Gilbert, Knight professor in digital media strategy at Northwestern University’s Medill School, told Fortune.

The search engine Bing (whose owner, Microsoft, has invested billions in OpenAI) is now using ChatGPT to power its searches. If a person were to ask Bing a question, the search engine could instantly produce a long, detailed answer based on New York Times reporting, eliminating the person’s need to visit the Times’ website (and cheating the paper of revenue).

“Publishers feel most comfortable with direct traffic to news,” Gilbert said. But a large language model like ChatGPT “may not send you to the news website at all.”

“If [audiences] get everything they need without clicking through to the New York Times, how does the New York Times fund its reporting? Even if that’s much more satisfying for the consumer, it’s fundamentally untenable,” he said.

A group of media outlets, led by IAC, has formed a coalition to pressure OpenAI into paying them “billions” for the use of their work as training material.

OpenAI is copying everything — but is it legal?

It’s no secret that OpenAI’s models have been trained on a vast sea of data (novels, web forums, conversations, news articles, photos, and illustrations) scraped from the public web.

What’s not clear yet is whether this scraping is legal. And a growing number of writers and artists say it isn’t, with lawsuits mounting against OpenAI and other generative-A.I. creators accusing them of copyright infringement.

Even OpenAI’s users are creeped out by the thought of being training material: In response to user backlash, OpenAI this spring changed its terms to clarify that prompts submitted to ChatGPT would not be used to train the bot.

Generative A.I. “is a minefield for copyright law,” a group of lawyers and media scholars recently wrote. The courts’ views of what, exactly, the technology does will be a key deciding factor in these cases.

If judges believe that the materials A.I. spits out are new creations, or that they significantly transform the works they’re based on, they’re likely to see its treatment of copyrighted works as fair use.

If, on the other hand, they believe the A.I. is simply copying and regurgitating others’ works, they could find its use illegal, and force OpenAI to destroy all copies of those works in its dataset.

Regardless of how the courts rule, the Times seems set to get its share of the A.I. pie.

Speaking at a Cannes Lions event this spring, Times CEO Meredith Kopit Levien said, “There must be fair value exchange for the content that’s already been used, and the content that will continue to be used, to train models.”
