Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models. (arXiv:2211.12503v1 [cs.CL])

Natural language often contains ambiguities that can lead to
misinterpretation and miscommunication. While humans can handle ambiguities
effectively by asking clarifying questions and/or relying on contextual cues
and common-sense knowledge, resolving ambiguities can be notoriously hard for
machines. In this work, we study ambiguities that arise in text-to-image
generative models. We curate a benchmark dataset covering different types of
ambiguities that occur in these systems. We then propose a framework to
mitigate ambiguities in the prompts given to the systems by soliciting
clarifications from the user. Through automatic and human evaluations, we show
the effectiveness of our framework in generating more faithful images aligned
with human intention in the presence of ambiguities.
