How to get the most out of Text-to-Image generation AI through prompt-engineering

19 October 2023


Text-to-image AI is a groundbreaking form of Artificial Intelligence (AI) model that can generate a wide variety of images from natural language descriptions, or “prompts”, provided by users. Text-to-image generation is quickly becoming one of the most useful forms of AI, as demonstrated by the technology’s ability to disrupt industries such as graphic design, content creation and the art/creative industry (Jones, 2022). Having used multiple prominent text-to-image generation models, I have noticed that the quality and usefulness of the outcome is often highly dependent on the quality of the prompt I provide. Enter: prompt engineering.

Prompts are brief texts that instruct the AI on what the user wants to create. For text-to-image models, such as OpenAI’s DALL-E, the details used in these descriptions often include objects, shapes, colors, sizes, positions, actions and backgrounds. Prompt engineering, otherwise referred to as prompt design, is the practice of optimizing an AI model’s output for a given task by carefully crafting its inputs, for example by adding particular phrases or specific instructions (Lawton, 2023).

In my experience, text-to-image generation AI can be less intuitive for users starting out with the technology than purely language-based models such as ChatGPT. As a beginner, I noticed that I needed to use more specific language over multiple iterations to even get close to the outcome I desired. Therefore, I have put together a list of tips aimed at helping beginners get the most out of these kinds of models through prompt engineering.

For the purpose of this blogpost, I will demonstrate the different outcomes from varying prompts using Bing Image Generator (powered by DALL-E 3) to generate an image of “a dog”. This is the result of the first prompt:

A first improvement would be to begin using keywords or phrases that are meaningful and relevant to the AI. For example, one could improve results by writing something like “a labrador”, yielding the following result:

Similarly, one should use modifiers and qualifiers, which are adjectives, measurements, colors or numbers that modify/qualify the main subject of the prompt. Putting this into practice, one could prompt “a black labrador puppy” instead of “a labrador”, which generated the following images:

Moreover, the use of conditions and constraints can be an effective prompt engineering technique. This involves the use of locations, emotions, actions, orientations, or spatial relations that condition or constrain the main subject of the prompt. For example, instead of “a black labrador puppy”, one could describe “a black labrador puppy playing poker with other animals while wearing sunglasses”.

Lastly, one can provide style preferences to ensure that the images fit perfectly into the context in which they are being used. For example, if one wanted to use the images for a playful cartoon, one could use the prompt: “a black labrador puppy playing poker with other animals while wearing sunglasses, cartoon”. That input yielded these images:

These examples show that, by simply improving the input given to an AI model, one can significantly increase the subjective usefulness and thus value of the AI model. Although it requires some time and experimentation to become proficient in prompt engineering, I believe everyone who uses any form of AI can gain massive utility from practicing this skill.
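As a rough illustration of the layering used in the examples above (main subject, then modifiers, then conditions, then style), the same progression can be sketched as a small Python helper. This is my own minimal sketch, not something the image generator requires: the function name `build_prompt` and its parameters are hypothetical, it only assembles the text of the prompt, and it does not call any image generation service. It also always uses the article “a”, which is a simplification.

```python
def build_prompt(subject, modifiers=(), conditions=(), style=None):
    """Assemble a text-to-image prompt from a subject plus optional layers.

    All names here are hypothetical and chosen for illustration only.
    """
    # Modifiers and qualifiers (colors, sizes, numbers) go before the subject.
    words = list(modifiers) + [subject]
    prompt = "a " + " ".join(words)

    # Conditions and constraints (actions, locations, relations) follow it.
    if conditions:
        prompt += " " + " ".join(conditions)

    # A style preference is appended last, separated by a comma.
    if style:
        prompt += ", " + style

    return prompt


# Rebuild the progression of prompts used in this post, one layer at a time.
steps = [
    build_prompt("dog"),
    build_prompt("labrador"),
    build_prompt("labrador puppy", modifiers=["black"]),
    build_prompt(
        "labrador puppy",
        modifiers=["black"],
        conditions=["playing poker with other animals", "while wearing sunglasses"],
    ),
    build_prompt(
        "labrador puppy",
        modifiers=["black"],
        conditions=["playing poker with other animals", "while wearing sunglasses"],
        style="cartoon",
    ),
]

for step in steps:
    print(step)
```

Keeping the layers as separate arguments makes it easy to change one of them, for example the style, while leaving the rest of the prompt untouched between iterations.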

References:

Lawton, G. (2023). What is Prompt Engineering? – TechTarget Definition. Enterprise AI; TechTarget. https://www.techtarget.com/searchEnterpriseAI/definition/prompt-engineering

Jones, E. (2022, October 15). AI Image Generation: The End Of Creative Jobs? Geek Culture. https://medium.com/geekculture/ai-image-generation-the-end-of-creative-jobs-ca120ab2f64a


Beyond the Hype: AI’s Creative Boundaries Explored

10 October 2023


OpenAI’s ChatGPT has undoubtedly garnered massive media attention through its intuitive user experience and wide application potential in various business and personal contexts. When first testing the technology, users are often impressed by its ability to swiftly and accurately respond to a wide array of prompts. Due to the comparatively low level of restrictions on its responses, the generative AI model is able to produce answers that go far beyond what is stated verbatim on websites (Roose, 2023). However, after many months of personally using the AI, I cannot help but question whether it is capable of expressing creativity in its purest sense.

So I ran an informal test to examine how well the algorithm can brainstorm new business names for a hypothetical digital marketing agency. I observed that ChatGPT often produced rather generic, repetitive and occasionally nonsensical suggestions. When given the prompt “Be super creative and think of 10 name suggestions for a digital marketing agency”, ChatGPT gave the following suggestions:

From the responses, it is clear that the algorithm interprets rather simple literary techniques, such as alliteration (the repetition of the initial consonant letter or sound; Merriam-Webster, 2018), as “creative”. When asked to be more innovative, the algorithm simply produced variations of the previously suggested names:

Regardless of the subjective appeal of these names, the suggestions from the subsequent response are highly similar to those given initially. Moreover, the revised response is not completely free of alliteration.

It could be argued that this is because creativity, in its purest sense, transcends the capability of today’s AI models. True creativity can be defined as going beyond what we know and generating new ideas (Kupers et al., 2018). Unlike current machine learning models, humans have the capacity to draw inspiration from numerous sources and make unique subjective interpretations, which aids in generating new connections that have not been made before (Kupers et al., 2018). By the same definition, usefulness and appropriateness are also essential facets of creativity (Kupers et al., 2018). In my opinion, ChatGPT did not live up to its creative reputation in this respect either, as I found the names to be rather generic and obvious.

Of course, this blog post only features a single, isolated example, using two prompts, in the context of business name generation. Therefore, no generalizations can be made. However, it does become clearer that ChatGPT is not, in any way, a silver bullet for all use cases. The algorithm can prove highly useful in specific contexts, but true creativity is still the missing X-factor. Although ChatGPT can appear to mimic this ability, I would argue that it does not, and possibly never will, embody the creative ability that humans possess.

References:

Kupers, E., Van Dijk, M., & Lehmann-Wermser, A. (2018). Creativity in the Here and Now: A Generic, Micro-Developmental Measure of Creativity. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.02095

Merriam-Webster. (2018). Definition of ALLITERATION. Merriam-Webster.com. https://www.merriam-webster.com/dictionary/alliteration

Roose, K. (2023, February 3). How ChatGPT Kicked Off an A.I. Arms Race. The New York Times. https://www.nytimes.com/2023/02/03/technology/chatgpt-openai-artificial-intelligence.html
