This might be a bit of a stretch, but to help the image generator learn what we’re aiming for, perhaps a base image could be set, or offered as an optional input, to give the generator an idea of what we’re looking for on some of the prompts?
I agree! An “image input” option would really be a plus for this generator!
Human Pose Estimation (HPE) is already an existing technology; search the web for “HPE”.
Also, check this awesome website: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one Upload the image you want the AI to describe, then in the prompt area ask “describe in detail the posture, composition and perspective”. It will give you a prompt as the result, which is somewhat helpful with the current model.