{"id":120,"date":"2022-06-06T21:20:29","date_gmt":"2022-06-06T21:20:29","guid":{"rendered":"https:\/\/ozer.gt\/log\/?p=120"},"modified":"2024-01-27T06:57:53","modified_gmt":"2024-01-27T06:57:53","slug":"google-imagen","status":"publish","type":"post","link":"https:\/\/ozer.gt\/log\/2022\/06\/06\/google-imagen\/","title":{"rendered":"Google Imagen"},"content":{"rendered":"<p>Now that object detection is almost a solved problem, work on the next frontier, text-to-image generation, began to thrive. <span aria-hidden=\"true\">#<\/span>Google Research&#8217;s most recent work on generative models, <span aria-hidden=\"true\">#<\/span>Imagen, uses text embeddings from a large language model called <span aria-hidden=\"true\">#<\/span>T5 (similar to <span aria-hidden=\"true\">#<\/span>GPT3 and <span aria-hidden=\"true\">#<\/span>OPT175B) to encode text for image synthesis.<\/p>\n<p>Interestingly, the study finds that increasing the size of the language model improves performance more than increasing the size of the image diffusion model. Imagen achieves exceptional similarity between real and synthetic images (measured by the distance metric FID, Imagen achieves a score of 7.27 on the COCO dataset). Human raters confirm the performance of the model.<\/p>\n<p>The paper is nicely written with a much-needed ethics discussion at the end, and full of colorful images. Apparently, Imagen does not perform as well when generating images that portray humans.<\/p>\n<p>Synthetic data generation and image restoration are two common use cases of <span aria-hidden=\"true\">#<\/span>GANs. I will post a link to one such study on medical images in the comments. <span aria-hidden=\"true\">#<\/span>Arts and <span aria-hidden=\"true\">#<\/span>crafts is obvious. I can also think of use cases for <span aria-hidden=\"true\">#<\/span>fashion and potentially <span aria-hidden=\"true\">#<\/span>personalization of products in <span aria-hidden=\"true\">#<\/span>retail. What are some other business use cases?<\/p>\n<p><a href=\"https:\/\/imagen.research.google\/\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now that object detection is almost a solved problem, work on the next frontier, text-to-image generation, began to thrive. #Google Research&#8217;s most recent work on generative models, #Imagen, uses text embeddings from a large language model called #T5 (similar to #GPT3 and #OPT175B) to encode text for image synthesis. Interestingly, the study finds that increasing [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"cybocfi_hide_featured_image":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-120","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/comments?post=120"}],"version-history":[{"count":2,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/120\/revisions"}],"predecessor-version":[{"id":124,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/120\/revisions\/124"}],"wp:attachment":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/media?parent=120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/categories?post=120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/tags?post=120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}