One of the most widespread applications of AI concerns language and the conversion of images into language: given a photo of a document, an AI program can analyze its individual characters or symbols and produce a “transliteration,” which can be literal, read aloud, or translated. This is useful for digitizing old archives, for translating and interpreting other languages, even ancient ones, and for making written documents accessible to blind people. A few months ago, OpenAI presented an application that works in the opposite direction: a new AI model capable of creating original images from text.
The model was named DALL·E, a portmanteau of Salvador Dalí’s name and that of WALL·E, the famous Pixar robot, and it was trained to generate images from any text prompt written in natural language. It can combine very distant concepts into believable, if sometimes bizarre, images. For each prompt it generates several images, each different from the others, in the style the prompt requests: photo-like images of realistic subjects or illustrations of improbable scenes. Among its funniest creations are the illustration of a baby daikon radish in a tutu walking a dog and that of a chimera, half giraffe and half turtle.
OpenAI’s new model is based on the company’s autoregressive language model, GPT-3. GPT-3 is a natural language processing (NLP) system built on a deep neural network (a Transformer) that produces text mimicking human language use; telling a text generated by GPT-3 apart from one written by a person is often very difficult. DALL·E is a version of the same model with fewer parameters (12 billion instead of GPT-3’s 175 billion), trained and prompted differently: the training data consisted of text-image pairs collected from the Internet, and instead of words the model produces image tokens, which are then decoded into pixels.
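To make “autoregressive” more concrete: the model reads the caption as a sequence of tokens and then predicts image tokens one at a time, each conditioned on everything that came before it. The toy sketch below illustrates only that generation loop, under loudly stated assumptions: `next_token_scores` is a hypothetical stand-in for the trained Transformer (here a deterministic hash-based scorer, not a learned model), and `IMAGE_VOCAB` and `IMAGE_LENGTH` are arbitrary small values, not DALL·E’s real settings. In the real system a separate decoder, not shown here, would turn the resulting tokens into pixels.

```python
import hashlib
import math
import random

IMAGE_VOCAB = 16   # hypothetical number of distinct image tokens (toy value)
IMAGE_LENGTH = 8   # hypothetical number of image tokens per "image" (toy value)


def next_token_scores(context: list[int]) -> list[float]:
    """Hypothetical stand-in for a trained Transformer.

    Returns one score (logit) per image token, computed deterministically
    from the whole context, so each prediction depends on all previous tokens.
    """
    digest = hashlib.sha256(",".join(map(str, context)).encode()).digest()
    return [digest[i] / 32.0 for i in range(IMAGE_VOCAB)]


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_image_tokens(text_tokens: list[int], seed: int = 0) -> list[int]:
    """Sample image tokens one by one from a single stream that starts with
    the caption tokens and grows with every token generated so far."""
    rng = random.Random(seed)
    stream = list(text_tokens)
    image_tokens = []
    for _ in range(IMAGE_LENGTH):
        probs = softmax(next_token_scores(stream))
        token = rng.choices(range(IMAGE_VOCAB), weights=probs, k=1)[0]
        image_tokens.append(token)
        stream.append(token)  # the new token becomes part of the context
    return image_tokens


if __name__ == "__main__":
    caption = [12, 7, 42]  # hypothetical token IDs for a short caption
    # Different seeds sample different sequences from the same caption.
    for seed in range(3):
        print(generate_image_tokens(caption, seed=seed))
```

Sampling with a different seed for each run is one simple way to obtain several distinct outputs from the same prompt, which loosely mirrors how DALL·E returns several different images for one request.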
As DALL·E shows, NLP systems are now applied well beyond simple text comprehension, and the opportunities are many. Specialized NLP models can detect the tone and mood of a sentence’s author, manage dialogues to provide customer support, summarize texts, automatically translate sentences from one language to another, produce analyses and reports autonomously, and classify documents into predefined categories.
Marketing Specialist at AIDIA, with a degree in International Studies obtained in Florence; passionate about history, economics, and the bizarre things of the world.
At AIDIA, we develop AI-based software, NLP solutions, Big Data Analytics, and Data Science: innovative solutions that optimize processes and streamline workflows. To learn more, contact us or send an email to info@aidia.it.