The creation and editing of photorealistic digital images is about to get much easier.
OpenAI, the San Francisco artificial intelligence company that is closely affiliated with Microsoft, just announced it has created an A.I. system that can take a description of an object or scene and automatically generate a highly realistic image depicting it. The system also allows a person to easily edit the image with simple tools and text modifications, rather than requiring traditional Photoshop or digital art skills.
“We hope tools like this democratize the ability for people to create whatever they want,” Alex Nichol, one of the OpenAI researchers who worked on the project, said. He said the tool could be useful for product designers, magazine cover designers, and artists—either to use for inspiration and brainstorming, or to actually create finished works. He also said computer game companies might want to use it to generate scenes and characters—although the software currently generates still images, not animation or videos.
Because the software could also be used to more easily generate racist memes or create fake images to be used in propaganda or disinformation, or, for that matter, to create pornography, OpenAI says it has taken steps to limit the software’s capabilities in this area, first by trying to remove such images from the A.I.’s training data, but also by applying rule-based filters and human content reviews to the images the A.I. generates.
OpenAI is also trying to carefully control the release of the new A.I., which it describes as currently just a research project and not a commercial product. It is sharing the software only with what it describes as a select and screened group of beta testers. But in the past, OpenAI’s breakthroughs based on natural-language processing have often found their way into commercial products within about 18 months.
The software OpenAI has created is called DALL-E 2, and it is an updated version of a system that OpenAI debuted in early 2021, simply called DALL-E. (The acronym is complicated, but it is meant to evoke a mashup of WALL-E, the animated robot of Pixar movie fame, and a play on words for Dali, as in Salvador, the surrealist artist, which makes sense given the surreal nature of the images the system can generate.)
The original DALL-E could render images only in a cartoonish manner, often against a plain background. The new DALL-E 2 can generate photo-quality high-resolution images, complete with complex backgrounds, depth-of-field effects, realistic shadows, shading, and reflections.
While these realistic renderings have been possible with computer-rendered images previously, creating them required some serious artistic skill. Here, all a user has to do is type the command, “a shiba inu wearing a beret and a black turtleneck,” and then DALL-E 2 spits out dozens of photorealistic variations on that theme.
DALL-E 2 also makes editing an image easy. A user can simply place a box around the part of the image they want to modify and specify the modification they want to make in natural-language instructions. You could, for instance, put a box around the Shiba Inu’s beret and type “make the beret red,” and the beret would be transformed without altering the rest of the image. In addition, DALL-E 2 can produce the same image in a wide range of styles, which the user can also specify in plain text.
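The box-constrained edit described above can be illustrated with a small sketch. This is purely illustrative and not OpenAI's API: the image is modeled as a grid of named colors, and the hypothetical `edit_region` helper rewrites only the pixels inside the user's box, leaving the rest untouched.

```python
# Toy illustration of a box-constrained edit: only pixels inside the
# user-drawn box change; the rest of the image is preserved.
# (Hypothetical helper for illustration, not OpenAI's actual API.)

def edit_region(pixels, box, new_color):
    """Return a copy of `pixels` (a list of rows of color names) in which
    every pixel inside `box` ((top, left, bottom, right), inclusive)
    is replaced by `new_color`."""
    top, left, bottom, right = box
    return [
        [new_color if top <= r <= bottom and left <= c <= right else color
         for c, color in enumerate(row)]
        for r, row in enumerate(pixels)
    ]

image = [
    ["tan", "black", "black"],  # the "beret" occupies row 0, cols 1-2
    ["tan", "tan",   "tan"],
]
edited = edit_region(image, (0, 1, 0, 2), "red")
print(edited[0])  # ['tan', 'red', 'red'] — only the boxed pixels changed
```

The real system regenerates image content inside the masked region from a text instruction rather than flood-filling a color, but the contract is the same: pixels outside the box are left exactly as they were.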
The captioning and image-classification algorithms that underpin DALL-E 2 are, according to tests OpenAI performed, less susceptible to a trick in which an object is labeled with text that differs from what the object actually is. For instance, previous algorithms trained to associate text and images, when shown an apple with a printed label saying “pizza” attached to it, would mistakenly classify the image as a pizza. The system that now makes up part of DALL-E 2 does not make the same mistake: it still identifies the image as being of an apple.
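Systems of this kind typically classify an image by comparing its embedding against the embeddings of candidate text labels and picking the most similar one; a paper label pasted on the object only fools the classifier if it drags the image embedding toward the wrong text. The sketch below uses tiny hand-made vectors to show the mechanism; the vectors and the `classify` helper are invented for illustration and are not real model embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def classify(image_vec, label_vecs):
    """Pick the text label whose embedding is most similar to the image's."""
    return max(label_vecs, key=lambda name: cosine(image_vec, label_vecs[name]))

# Hand-made toy embeddings (purely illustrative, not real model vectors).
labels = {"apple": [1.0, 0.1], "pizza": [0.1, 1.0]}

# An apple with a paper "pizza" sign: in a robust model, the visual
# evidence dominates, so the embedding stays close to "apple".
apple_with_pizza_sign = [0.9, 0.3]
print(classify(apple_with_pizza_sign, labels))  # apple
```

A brittle model would instead let the printed word shift the image embedding toward the "pizza" text vector, which is exactly the typographic attack the article describes.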
Ilya Sutskever, OpenAI’s cofounder and chief scientist, said that DALL-E 2 was an important step toward OpenAI’s goal of trying to create artificial general intelligence (AGI), a single piece of A.I. software that can achieve human-level or better than human-level performance across a wide range of disparate tasks. AGI would need to possess “multimodal” conceptual understanding—being able to associate a word with an image or set of images and vice versa, Sutskever said. And DALL-E 2 is an attempt to create an A.I. with this sort of understanding, he said.
In the past, OpenAI has tried to pursue AGI through natural-language processing. The company’s one commercial product is a programming interface that lets other businesses access GPT-3, a massive natural-language processing system that can compose long passages of novel text, as well as perform a number of other natural-language tasks, from translation to summarization.
DALL-E 2 is far from perfect, though. The system sometimes cannot render details in complex scenes. It can get some of the lighting and shadow effects slightly wrong or merge the borders of two objects that should be distinct. It is also less adept than some other multimodal A.I. software at understanding “binding attributes.” Give it the instruction “a red cube on top of a blue cube,” and it will sometimes offer variations in which the red cube appears below the blue cube.
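The "binding attributes" failure can be made concrete with a toy check. Here a rendered scene is modeled as a list of (color, height) pairs, and a hypothetical `satisfies_on_top` helper tests whether the attribute "red" was actually bound to the upper cube; a binding failure swaps the colors while keeping both attributes present in the scene.

```python
def satisfies_on_top(scene, top_color, bottom_color):
    """Check a toy scene (a list of (color, height) pairs, where a larger
    height means higher up) for the spatial relation
    '<top_color> cube on top of <bottom_color> cube'."""
    heights = {color: h for color, h in scene}
    return heights[top_color] > heights[bottom_color]

# A correct rendering binds "red" to the upper cube...
good = [("red", 2), ("blue", 1)]
# ...while a binding failure keeps both colors but swaps their positions.
bad = [("red", 1), ("blue", 2)]

print(satisfies_on_top(good, "red", "blue"))  # True
print(satisfies_on_top(bad, "red", "blue"))   # False
```

Note that both scenes contain a red cube and a blue cube; only the association between attribute and position distinguishes them, which is why binding errors slip past checks that merely count which objects appear.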