Text-to-image + text? Testing the accuracy of text-to-image AI generators when the desired image includes text.

Is inaccurate text rendering currently a limitation of text-to-image AI generators? I wanted to test this because of the disruption these generators could cause in industries such as advertising and comics, where text plays an important role in images. I tested three different AI generators (Night Café, Canva, Dall-E 2) with two different input prompts:

1) “A poster in a sports bar that says ‘Happy Hour from 4-6pm’”
2) “A girl saying ‘Hello’ in a speech bubble and a boy in a garden on a sunny day”

For 1), Night Café generated a sports bar with an illuminated wall panel. Overall, the text on it is blurry, with only one word, “Hour”, clearly legible. For 2), it generated an image of a girl and a boy smiling at each other in a park on a sunny day, but without any sign of “Hello” (see two images below).

Compared to Night Café, the AI generator on Canva delivered more accurate output. For 1), the words “Happy Hour” were generated in illuminated letters. Yet instead of “from 4-6pm”, a second set of illuminated letters reads “Happur”. For 2), the generated image is almost accurate, except that “HELLO!” appears above the boy’s head instead of the girl’s (see two images below).

Lastly, Dall-E 2 performed similarly to Night Café on 1) and best of the three on 2). For 1), it generated an illuminated panel with the word “Happy” along with random letters and numbers underneath. For 2), the picture was generated according to the request (see two images below).
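The comparison above is based on looking at the images. One way to make it a bit more repeatable would be to type out whatever text is legible in each image and score it against the requested text. The snippet below is only a minimal sketch of that idea, not the method used for this post: the function name `text_score` is made up for illustration, the choice of Python's `difflib` similarity ratio as the metric is an assumption, and the transcriptions are typed by hand from the descriptions above.

```python
from difflib import SequenceMatcher


def text_score(requested: str, rendered: str) -> float:
    """Return a similarity between 0.0 and 1.0 for two pieces of text.

    Hypothetical helper: both strings are lowercased and runs of
    whitespace are collapsed before they are compared.
    """
    a = " ".join(requested.lower().split())
    b = " ".join(rendered.lower().split())
    return SequenceMatcher(None, a, b).ratio()


requested = "Happy Hour from 4-6pm"

# Hand-typed transcriptions of the legible text described in the post.
# Dall-E 2 is left out because its extra letters and numbers are not recorded.
transcriptions = {
    "Night Café": "Hour",
    "Canva": "Happy Hour Happur",
}

for generator, rendered in transcriptions.items():
    print(f"{generator}: {text_score(requested, rendered):.2f}")
```

A score like this only captures whether the right characters appear. It says nothing about placement, so the “HELLO!” above the wrong head in the Canva image would still have to be judged by eye.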

To conclude, the technology is not yet at a point where text-to-image generation renders text reliably. Text elements may be missing from the image entirely, blurry, or incorrect, and are only occasionally correct. Thus AI cannot yet replace designers working in the advertising and comic industries. However, it can currently be seen as a “tool or collaborative assistant for creativity” (Anantrasirichai & Bull, 2022) that can increase the efficiency of the design process.

References

Anantrasirichai, N., & Bull, D. (2022). Artificial intelligence in the creative industries: A review. Artificial Intelligence Review, 1-68.

2 thoughts on “Text-to-image + text? Testing the accuracy of text-to-image AI generators when the desired image includes text.”

Thank you for your insightful post!
You make a clear point with your observation! I have wondered about this issue as well.
One explanation could be the vast amount of data on which the AI is trained. Much of it is low quality, which can lead to unusual word combinations or mistakes in the writing.
Another explanation, and in my opinion the one with the highest impact, is that text generation and image generation are built on different AI foundations.
However, I have also noticed improvements in this area, and most AI generators are at least slightly better now than they were a couple of months ago. Has anyone else noticed this?
