I have been using DALL-E, OpenAI's text-to-image GenAI tool, for quite a while since its launch. Initially, I was amazed by how accurately and impressively it could create an image from a short prompt, depicting exactly what I had pictured in my mind. However, after using it a bit more, I started to realise that DALL-E has its limitations and is not yet ready to replace graphic designers or artists.
If you have been using DALL-E for a while, you have probably seen it produce garbled text in images. As shown in image 1 below (prompt: "Create an image of a tree in autumn in the Dutch city Utrecht, there are some stores with storefront names in the background"), the store names are gibberish, while the image itself looks amazing and is of high quality.
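For readers who want to reproduce this experiment programmatically rather than through the web UI, here is a minimal sketch that assembles a request to OpenAI's public Images REST endpoint. The model name `dall-e-3` and the use of an `OPENAI_API_KEY` environment variable are my assumptions, not details from this post; check the current API docs before relying on them.

```python
# Sketch: building a request to OpenAI's image-generation REST endpoint.
# The model name "dall-e-3" and the OPENAI_API_KEY env var are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"

def build_request(prompt: str, model: str = "dall-e-3",
                  size: str = "1024x1024") -> urllib.request.Request:
    """Assemble the HTTP request for a single image generation."""
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode("utf-8"), headers=headers
    )

req = build_request(
    "Create an image of a tree in autumn in the Dutch city Utrecht, "
    "there are some stores with storefront names in the background"
)
# urllib.request.urlopen(req) would return JSON containing an image URL;
# the call itself is omitted so this sketch runs without an API key.
```

Sending the request and downloading the resulting image are left out deliberately; the point is only that the same prompt can be scripted, which makes it easy to rerun and compare outputs.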
Why can DALL-E create such impressive visuals, yet fail to produce legible text within them, when that seems like the far easier task?
This is due to several technical issues. While the model is effective at understanding and generating visual elements based on prompts, it often lacks the capability to distinguish between visual and written content. This limitation is rooted in how the model is trained: when text is rendered as part of an image, it becomes a visual pattern rather than a linguistic one, so the model learns the shapes of letters without learning how to spell words. OpenAI has said that the next version, DALL-E 4, will be better at distinguishing linguistic from visual elements.
Another important issue to address is bias in DALL-E's results. When asked to create an image of a CEO at work with an assistant, see image 2 (prompt: "Create an image of a CEO at work with an assistant"), it shows a man as the CEO and a woman as the assistant. This is because the model is trained on data in which CEOs are far more often represented as men than as women, leading to biased and discriminatory results that ultimately reinforce outdated gender stereotypes. The same goes for culturally insensitive or inappropriate results, because the model lacks the cultural awareness of humans.
To conclude, these are just two of DALL-E's limitations; there are many more that I haven't discussed. It is clear that DALL-E is not perfect and that its model needs to improve before it can replace graphic designers or artists. Nonetheless, its future potential is immense, and for now, it is incredibly fun to experiment with.
Thanks for reading! If you have an opinion on this topic, please leave a comment below.
(Disclaimer: this blog is based on my personal experience.)