Is it difficult for GenAI to generate a picture with text on it?

7 October 2024


Pictures with text are everywhere: in comic books, advertisements and scientific articles, where the text clarifies the image and the image makes the text more vivid. However, GenAI appears to struggle to create a picture with readable, correct and complete text on it. I tried several GenAI tools to evaluate how well they handle this.

As a first step, I entered “Generates an image of a monster on the Erasmus Bridge” as the basic context. ChatGPT, AI Image Generator and ChatGLM all performed well at combining the real Erasmusbrug with monster figures. Next, I tried to add text to the scene by requesting “the monster is analyzing strategies to make the traffic on the bridge more reasonable and faster, expressed in words”. Some bubbles with unrecognizable words appeared, and further instructions like “making the text clearer and more specific” failed to make them legible. A lack of specific text in the prompt might explain this, so I then stated the exact words the text should contain. The results remained unsatisfying: ChatGPT and ChatGLM generated something resembling alien writing.

Output of ChatGPT (DALL·E 3)

Given how well AI generates text and creative pictures separately, what keeps it from creating informative pictures with text? One possible reason is that image-focused GenAI models were trained with most text-bearing pictures left out to avoid copyright infringement, leaving too few images of text in the training data (Growcoot, 2024). Another explanation is that AI has a limited ability to understand prompts: when the description of the image and the text to appear on it are given in a single sentence, the model can get confused and make mistakes. More fundamentally, generating images of text is inherently difficult. To a text-to-image model, text symbols are not meaningful words but simply combinations of lines and shapes that demand more precision. Because text comes in so many different styles, the model often fails to learn how to reproduce it reliably, and even minor imperfections in text are noticeable (Mirjalili, 2023).

Leading AI companies have been working on this problem by training GenAI models in more advanced and specialized ways. Research suggests that adding more parameters when training models can dramatically improve text rendering (Growcoot, 2024). In 2024, Stability AI released Stable Diffusion 3, which incorporates a diffusion transformer architecture and is said to render text in images better. While the effect of these advanced models remains to be seen, progress is being made. One day soon, GenAI might actually make ‘a picture worth a thousand readable words.’

References

Growcoot, M. (2024, March 6). Why AI image generators struggle to get text right. PetaPixel. https://petapixel.com/2024/03/06/why-ai-image-generators-struggle-to-get-text-right/

Mirjalili, S. (2023, July 5). If AI image generators are so smart, why do they struggle to write and count? TechXplore. https://techxplore.com/news/2023-07-ai-image-generators-smart-struggle.html

Silberling, A. (2024, March 21). Why is AI so bad at spelling? Because image generators aren’t actually reading text. TechCrunch. https://techcrunch.com/2024/03/21/why-is-ai-so-bad-at-spelling/

Appendix

1. Input in GenAI

1) Generates an image of a monster on the Erasmusbrug

2) The monster is analyzing strategies to make the traffic on the bridge more reasonable and faster, expressed in words. Make the text clearer and more specific

3) The text on the picture contains the following: Optimize traffic lights with algorithm, Divide lanes for different speeds, Set up tram tracks in different time periods

2. Output of ChatGPT (DALL·E 3)

3. Output of AI Image Generator (DeepAI)

4. Output of ChatGLM


What’s Next for Digital Platform Growth: The Example of Meituan in China

16 September 2024


Meituan, founded in 2010 in China, is a two-sided digital company that operates a food delivery platform. It has developed a user-friendly application and an efficient delivery system that connect customers with a vast restaurant network. Consumers place orders on their mobile phones, and delivery personnel pick them up from the restaurant and deliver them to consumers’ homes. Positive network effects have helped Meituan quickly gain strong influence in China. By the end of 2020, the number of active merchants and annual transacting users on Meituan had increased to 6.8 million and 510 million respectively.¹

Meituan has defeated its competitors by adopting an envelopment strategy, expanding its services into multiple areas such as travel booking, ride hailing, and e-commerce. In 2023, Meituan launched its supermarket brand ‘Xiaoxiang Supermarket’, marking an important step into the physical retail sector. Meituan has several advantages in entering the supermarket retail industry. Firstly, it has a huge customer base that is accustomed to the convenience of online shopping and delivery and has a growing demand for buying groceries and daily necessities online. Secondly, the large volume of data and the advanced algorithms the company has accumulated make its market analysis more accurate. By combining digital platforms with physical stores, Meituan aims to provide a seamless shopping experience for consumers.

Image: Employees of Xiaoxiang Supermarket sorting products²

Meituan’s strategic initiatives have raised questions about the future growth trajectory of digital platforms. Is entering physical retail a sign that digital platforms are reaching their growth limit, prompting companies to explore the real economy? Or is it a strategic diversification aimed at increasing customer engagement and loyalty? Both factors may be at play. Entering the retail sector is a move to further achieve supply-side economies of scale, while the comprehensive functionality of the platform may make Meituan “too big to fail”, increasing consumers’ switching costs and loyalty.

Meituan is not alone in making a strategic move towards the retail sector. Amazon has acquired Whole Foods and run its Amazon Go stores successfully, which demonstrates that digital giants can thrive in the supermarket sector. By contrast, the difficulties Google encountered in entering the smartphone market with its own Pixel phone indicate that not every digital enterprise’s expansion into the real economy will be smooth.

With the continuous development of digital platforms, the boundary between the digital and physical domains has become blurred. In the future, we may see more digital-native enterprises like Meituan exploring innovative ways to combine their digital advantages with the real economy.

  1. https://finance.sina.com.cn/tech/2021-03-28/doc-ikkntiam9834301.shtml
  2. https://tech.ifeng.com/c/8VBjfsYImQD
