Google Imagen
Now that object detection is almost a solved problem, work on the next frontier, text-to-image generation, has begun to thrive.
Google Research’s most recent work on generative models, Imagen, uses text embeddings from a large language model called T5 (similar to GPT-3 and OPT-175B) to encode text for image synthesis. Interestingly, the study finds that increasing the size of the language model improves performance more than increasing the size of the image diffusion model. Imagen achieves exceptional similarity between real and synthetic images: measured by the distance metric FID (lower is better), it reaches a zero-shot score of 7.27 on the COCO dataset. Human raters confirm the performance of the model.
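For readers who have not met FID before, here is a rough sketch of the idea. It fits a Gaussian to the feature vectors of each image set and measures the Fréchet distance between the two Gaussians. This is not the paper's evaluation code: in practice the features come from a pretrained Inception network, while the random arrays below are only stand-ins, and the function name `frechet_distance` is made up for this example.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Stand-in "features" (assumption: real FID uses Inception activations).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake_close = rng.normal(size=(500, 8))          # same distribution as real
fake_far = rng.normal(loc=2.0, size=(500, 8))   # shifted distribution

print(frechet_distance(real, fake_close))  # small
print(frechet_distance(real, fake_far))    # much larger
```

The closer the synthetic features are to the real ones, the lower the score, which is why a low FID is read as "the generated images look like the real ones."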
The paper is nicely written, full of colorful images, and ends with a much-needed ethics discussion. Apparently, Imagen does not perform as well when generating images that portray humans.
Synthetic data generation and image restoration are two common use cases of GANs. I will post a link to one such study on medical images in the comments. Arts and crafts is an obvious one. I can also think of use cases in fashion and potentially in the personalization of products in retail. What are some other business use cases?