Google revealed, through its official blog, two new models developed by the Google Research team using artificial intelligence (AI) and machine learning (ML) techniques: Imagen and Parti. Both can create realistic images from text descriptions, but they use different methods.
How do text-to-image models work?
Text-to-image models rely on artificial intelligence and machine learning techniques: a person provides a text description, and the model produces a realistic or creative image that matches that description as closely as possible. A user can write, for example, "an apple wearing a hat" or "a cat sitting on a sofa", and more complex images can be created from more detailed descriptions.
Google said that over the past years it has trained ML models on large image datasets paired with corresponding text descriptions, which made it possible to produce higher-quality images covering a wider range of descriptions. Google noted that other models, such as DALL-E 2 from OpenAI, have achieved major breakthroughs in this field.
What is Google Imagen technology?
Imagen's text-to-image model builds on earlier machine learning and AI models that can process words and understand the context of a sentence, linking its words together, which is central to converting text into an image. It works by gradually transforming a pattern of random points into an image, which Google says starts at a low resolution and is increased step by step. This approach has seen recent success in image and audio processing, for example in improving image resolution, recoloring black-and-white images, editing specific areas of an image, uncropping images, and text-to-speech synthesis.
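The "random points refined into an image" process described above can be illustrated with a toy denoising loop. This is a minimal sketch, not Imagen's actual architecture: the real denoiser is a large neural network conditioned on the text prompt, whereas here a hypothetical fixed "predicted clean image" stands in for it.

```python
import numpy as np

def denoise_step(image, step, total_steps, rng):
    """One illustrative denoising step: blend the noisy image toward a
    predicted clean image, trusting the prediction more (and injecting
    less noise) as the steps progress. The constant gray `target` is a
    toy stand-in for a learned, text-conditioned denoiser."""
    target = np.full_like(image, 0.5)      # hypothetical "clean" prediction
    alpha = (step + 1) / total_steps       # confidence grows over time
    noise = rng.normal(0.0, 1.0 - alpha, image.shape)
    return (1 - alpha) * image + alpha * target + 0.1 * noise

def generate(shape=(8, 8), total_steps=50, seed=0):
    """Start from pure random noise and iteratively denoise it,
    mirroring the diffusion process sketched in the article. (Imagen
    additionally upsamples the result through super-resolution stages,
    which are omitted here.)"""
    rng = np.random.default_rng(seed)
    image = rng.normal(0.0, 1.0, shape)    # pure noise at the start
    for step in range(total_steps):
        image = denoise_step(image, step, total_steps, rng)
    return image

img = generate()
print(img.shape, round(float(img.mean()), 2))  # prints: (8, 8) 0.5
```

By the final step the blend weight reaches 1 and the noise scale reaches 0, so the output converges exactly on the toy target; in a real diffusion model it converges on an image matching the prompt.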
The Parti artificial intelligence model also converts text into images, but it works by representing images as sequences of code entries similar to puzzle pieces, and then generating a new image as such a sequence. Google said this approach takes advantage of the infrastructure built for large language models such as PaLM, which is critical for handling long, complex text prompts and producing high-quality images.
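The sequence-of-puzzle-pieces idea can be sketched as autoregressive next-token prediction over a small image-token vocabulary. This is a hypothetical toy, not Parti's real model: `next_token` derives a deterministic pseudo-random choice from the context, where the real system uses a large Transformer, and the detokenizer that turns tokens back into pixels is omitted.

```python
import random

# Toy vocabulary of "image tokens" — stand-ins for the quantized image
# patches ("puzzle pieces") an image tokenizer would produce.
VOCAB_SIZE = 256
GRID = 4  # a tiny 4x4 grid of tokens representing one image

def next_token(prompt_tokens, generated):
    """Hypothetical next-token predictor. A real model would score the
    whole vocabulary conditioned on the text prompt and the tokens
    generated so far; here we just hash that context into a choice."""
    seed = hash((tuple(prompt_tokens), tuple(generated)))
    return random.Random(seed).randrange(VOCAB_SIZE)

def generate_image_tokens(prompt):
    """Generate GRID*GRID image tokens one at a time, left to right —
    the language-model-style recipe described in the article."""
    prompt_tokens = [ord(c) for c in prompt]  # trivial text tokenizer
    tokens = []
    for _ in range(GRID * GRID):
        tokens.append(next_token(prompt_tokens, tokens))
    return tokens

tokens = generate_image_tokens("an apple wearing a hat")
print(len(tokens))  # prints: 16
```

Because each token is conditioned on both the prompt and all previously generated tokens, the same prompt always yields the same sequence here, and a different prompt yields a different one, mimicking how the text steers generation.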
Are Google’s AI models perfect?
Google mentioned some limitations of the text-to-image models it has developed. They cannot yet reliably produce images containing a specific number of objects, such as ten apples, nor can they accurately place objects according to detailed spatial descriptions, such as a red ball to the left of a blue block with a yellow triangle. These models falter as the text prompts become more complex, but Google confirmed that it is working to remedy these shortcomings.
Google also confirmed that it will continue to develop new ideas that combine the strengths of both Imagen and Parti, and to enhance features such as the ability to edit and create images interactively through text.
Dangers of text-to-image models
Google indicated that it is aware of the risks of AI- and ML-based text-to-image models, including risks related to misinformation, bias, and safety. It said that it holds ongoing discussions about responsible AI practices for developing and using these models safely, that it now adds watermarks that allow others to recognize images produced by the Imagen or Parti models, and that it is working to better understand the models' biases to ensure that all people and cultures are represented.