“Generate an image of a man holding a red helium balloon in his left hand”
This was the prompt I gave OpenAI’s image generator DALL-E 2 two years ago. The result? A perfect, shiny red helium balloon on a string, held by a man in a blue shirt with two… things.
On closer inspection, I could make out that it had, in fact, tried to draw two hands, but it had failed miserably. The fingers merged into each other, and his hands were fused together where they held the balloon’s string. Poor guy.
When I tried again, the same thing happened: a beautiful image of a man holding a shiny red balloon, but once again it had failed to generate realistic hands. So why do AI image generators so often struggle with hands and fingers?
Back in 2022, when these models were released, people were amazed by the images these generators were producing. The images were so convincing that in August 2022, an image created with Midjourney’s image generator won an art contest at a state fair.
But users quickly noticed a recurring flaw. Whenever a prompt included people, the AI tools couldn’t seem to draw hands: hands with seven fingers, hands that appeared to float unattached to the body or, in my case, hands that were fused together at the wrists. But why?
The simple answer? It’s hard to draw hands! Just like beginning (human) artists, AI struggles with hands because the hand is a very complex part of the body. It has many elements of varying shapes and sizes, and its structure is incredibly intricate. Hands are built from parts that fit together with perfect precision: fingers, palms, joints, and tendons all connect in fixed patterns. To draw them well, you must study how these parts move and align.
The problem with AI models is that they learn by finding patterns in data. They do not actually understand structure the way humans do. A human artist learns through observation and reasoning; the models learn through repetition in data. On top of that, to realistically capture how the hand deforms during different movements, an algorithm would need to understand how our joints function and the range of motion each one allows, which is incredibly hard and tedious to train.
However, as time passes, there are definitely improvements being made. Images with six fingers or hands fused into each other are less common now. That is because newer models use larger datasets with millions of examples of hands. They learn cleaner shapes, smoother joints, and more natural poses. The improvement isn’t magic. It comes from better training methods and higher-quality references. Still, the model doesn’t understand anatomy. It predicts what looks right, not what is right. You can see the progress, but the small mistakes remind you that pattern recognition is not comprehension.
And as for that image I mentioned earlier, the one of a man holding a red balloon, here it is:

Sources:
Avenga. (2025, June 18). Why Generative AI Models Fail at Creating Human Hands – Avenga. https://www.avenga.com/magazine/generative-ai-models-fail-at-creating-human-hands/#:~:text=So%2C%20Why%20Do%20Image%20Generators,may%20take%20it%20for%20granted.
Matthias, M. (2023, August 25). Why does AI art screw up hands and fingers? | Explanation, Tools, & Facts. Encyclopedia Britannica. https://www.britannica.com/topic/Why-does-AI-art-screw-up-hands-and-fingers-2230501