13 April 2022

## The AI that gives shape to imagination: DALL·E 2

We have already talked about DALL·E, the AI application developed by the OpenAI team that can generate images from any textual input. A few days ago an updated version of the model, DALL·E 2, was unveiled. The application is still being refined, but researchers and developers who request access can already test it, as shown by the many bizarre DALL·E 2-generated images that have appeared on Twitter in recent days.

Compared to the original AI, DALL·E 2 seems to understand requests better and to have greater image generation capabilities, both in resolution and realism and in the variety of subjects and scenes it can handle. Unlike the first DALL·E, it is also expected to be released to the public, probably by the end of the year.

## What it is about

Like its predecessor, DALL·E 2 is a generative Artificial Intelligence: it can create something new and original, in this case images, from an input that may describe the desired result only loosely.

It is also a multimodal AI, one that integrates several intelligent functions and can learn concepts across different modalities. DALL·E 2 combines natural language understanding, used to interpret and decode the textual input it receives, with image recognition, classification, and generation capabilities, needed to produce quality photos and illustrations. Technically, however, DALL·E 2 differs from its predecessor: it is no longer based on the autoregressive language model GPT-3, but on CLIP, a model that connects language and images, combined with a diffusion model, a neural network trained to turn noise back into a clean image, one denoising step at a time.
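
To make the idea of a diffusion model concrete, here is a minimal sketch of the mechanics in Python. This is our illustration, not OpenAI's code: a toy "image" is noised in one jump, and a deterministic denoising loop then reverses the process step by step, with the true noise standing in for what a trained network would predict.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": an 8x8 array standing in for real pixel data.
x0 = rng.standard_normal((8, 8))

# DDPM-style linear noise schedule.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

# Forward process: jump straight to the heavily noised image x_T.
eps = rng.standard_normal(x0.shape)
x = np.sqrt(alpha_bars[-1]) * x0 + np.sqrt(1.0 - alpha_bars[-1]) * eps

# Reverse process (deterministic, DDIM-style): denoise step by step.
# A trained network would predict the noise from (x, t, conditioning);
# here the true eps stands in for a perfect predictor.
for t in reversed(range(T)):
    eps_pred = eps  # placeholder for model(x, t, conditioning)
    x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
    x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_pred

print("max reconstruction error:", np.abs(x - x0).max())  # effectively zero
```

In a real system the noise predictor is a neural network conditioned on the prompt, and generation starts from pure noise rather than from a noised photo; that is what turns a "restorer" into a generator.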

Developed by OpenAI, CLIP can match an image with the most appropriate textual description. In the DALL·E 2 system, which OpenAI calls "unCLIP" because it effectively runs CLIP in reverse, the textual prompt is first encoded into a numerical representation (an "embedding") that captures the requirements of the request; a diffusion-based decoder then translates that embedding into pixels. The diffusion network, which on its own only knows how to turn a noisy image into a cleaner one, is guided by the embedding at every denoising step, so that the image taking shape meets the requirements established by the prompt.
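
The matching half of this pipeline is easy to try, because OpenAI released CLIP as open source. The short sketch below, in which the image path and candidate captions are placeholders of our choosing, scores an image against a few descriptions and prints how well each one matches:

```python
# pip install torch torchvision ftfy regex git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "photo.jpg" is a placeholder; point it at any local image.
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
captions = ["a corgi wearing a hat", "a vase of flowers", "a photo of a panda"]
text = clip.tokenize(captions).to(device)

with torch.no_grad():
    # logits_per_image holds the similarity of the image to each caption.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for caption, p in zip(captions, probs[0]):
    print(f"{p:.2%}  {caption}")
```

This is the image-to-text direction; DALL·E 2 essentially runs the same machinery in reverse, going from a text embedding to a brand-new image.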

The result is an AI system that learns not only to depict the individual subjects described in the textual input, but also to infer the relationships between them and render those relationships effectively and realistically in the final image.

## The differences from DALL·E

DALL·E 2 can produce better images than the original DALL·E: more realistic, more accurate, and more coherent. Some of the examples presented by OpenAI are illustrations and photos that are hard to distinguish from human artwork, and all of them are high-resolution images.

The most striking feature implemented in the new version, however, is the editing function, called "in-painting," which lets you refine or modify an image with a simple written request to the program. DALL·E 2 can modify or replace a single element or a small portion of an image in response to a textual prompt: it can replace a flower vase with a corgi, or edit a photo of a panda so that it wears a hat. It can also retouch an image to improve its grain or colors, or create variations of the same image, producing images with the same subject but a completely different style, pose, or composition.
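
OpenAI has not published the code behind in-painting, but a common way to do it with diffusion models, and a reasonable mental model here, is to regenerate only the masked region while re-imposing the known pixels, noised to the right level, at every denoising step. A schematic sketch, with denoise_step as a stand-in for the trained, text-conditioned model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image to edit, plus a mask marking the region to regenerate
# (1 = keep the original pixel, 0 = let the model repaint it).
original = rng.standard_normal((8, 8))
mask = np.ones((8, 8))
mask[2:5, 2:5] = 0.0  # the patch to replace, e.g. vase -> corgi

T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def denoise_step(x, t):
    """Stand-in for one step of a trained, text-conditioned diffusion model.
    A real model would nudge x toward an image matching the prompt."""
    return 0.9 * x  # placeholder dynamics, not a real model

x = rng.standard_normal(original.shape)  # start from pure noise
for t in reversed(range(T)):
    x = denoise_step(x, t)
    # Re-impose the known pixels at the matching noise level, so only
    # the masked patch is actually invented by the model.
    noised_original = (np.sqrt(alpha_bars[t]) * original
                       + np.sqrt(1.0 - alpha_bars[t]) * rng.standard_normal(original.shape))
    x = mask * noised_original + (1.0 - mask) * x
```

Because the untouched pixels are restored at every step, the generated patch is forced to blend seamlessly with its surroundings, which is what makes "replace the vase with a corgi" possible.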

## Beyond DALL·E

Essentially, DALL·E 2 could soon change the illustration and graphic design market: with its remarkable ability to generate credible images in a few minutes from simple textual instructions, it could become an essential graphic tool, both for giving shape to the imagination of users with little artistic talent and for simplifying and speeding up the work of graphic designers and artists. At the same time, the development of DALL·E 2 opens up broader horizons for the AI field: integrating linguistic and visual abilities is one of the steps needed to one day build a more general and complex Artificial Intelligence, one capable of actually understanding the world around it.

At Aidia, we develop Deep Learning models like those behind DALL·E 2 and apply them to solve and automate business processes. If you want to know more, write to info@aidia.it or contact us to schedule a free consultation.

Source: OpenAI

Veronica Remitti

Executive & Marketing Assistant at Aidia, with a degree in Public and Political Communication Strategies; a lover of nature and of everything that can be narrated.

Aidia

At Aidia, we develop AI-based software: NLP, Big Data Analytics, and Data Science solutions designed to optimize processes and streamline workflows. To learn more, contact us or send an email to info@aidia.it.
