Training-free Regional Prompting for Diffusion Transformers (imgur.com)

submitted 8 months ago by Even_Adder@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

Abstract

Diffusion models have demonstrated excellent capabilities in text-to-image generation. Their semantic understanding (i.e., prompt following) ability has also been greatly improved with large language models (e.g., T5, Llama). However, existing models cannot perfectly handle long and complex text prompts, especially when the text prompts contain various objects with numerous attributes and interrelated spatial relationships. While many regional prompting methods have been proposed for UNet-based models (SD1.5, SDXL), there are still no implementations based on the recent Diffusion Transformer (DiT) architecture, such as SD3 and FLUX.1. In this report, we propose and implement regional prompting for FLUX.1 based on attention manipulation, which gives DiT fine-grained compositional text-to-image generation capability in a training-free manner. Code is available at https://github.com/instantX-research/Regional-Prompting-FLUX.

Paper: https://arxiv.org/abs/2411.02395

Code: https://github.com/instantX-research/Regional-Prompting-FLUX
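
For anyone wondering what "attention manipulation" could look like in practice, here is a minimal sketch of one way to build a regional attention mask over a joint text + image token sequence. This is not taken from the paper or the linked repository. The function name `build_regional_mask`, the box format, the token layout (all text tokens first, then image tokens in row-major order), and the choice to leave image-to-image attention unrestricted are all assumptions made for illustration.

```python
import numpy as np


def build_regional_mask(height, width, regions, tokens_per_prompt):
    """Hypothetical helper: build a boolean attention mask (True = allowed).

    height, width     : size of the latent token grid (assumed layout).
    regions           : list of (x0, y0, x1, y1) boxes in token coordinates,
                        one box per regional prompt (end-exclusive).
    tokens_per_prompt : number of text tokens per regional prompt (assumed equal).

    Sequence layout (an assumption): [prompt 0 tokens, prompt 1 tokens, ...,
    image tokens in row-major order].
    """
    n_img = height * width
    n_txt = len(regions) * tokens_per_prompt
    total = n_txt + n_img
    mask = np.zeros((total, total), dtype=bool)

    # member[i, j] is True when image token j lies inside region i.
    member = np.zeros((len(regions), height, width), dtype=bool)
    for i, (x0, y0, x1, y1) in enumerate(regions):
        member[i, y0:y1, x0:x1] = True
    member = member.reshape(len(regions), n_img)

    for i in range(len(regions)):
        t0 = i * tokens_per_prompt
        t1 = t0 + tokens_per_prompt
        # Text tokens of one regional prompt attend to each other only.
        mask[t0:t1, t0:t1] = True
        # Text tokens attend to image tokens inside their own region.
        mask[t0:t1, n_txt:] = member[i]
        # Image tokens inside the region attend back to that prompt's text.
        mask[n_txt:, t0:t1] = member[i][:, None]

    # Simplification: image tokens may attend to every image token, so no
    # query row is ever fully masked.
    mask[n_txt:, n_txt:] = True
    return mask


# Two prompts, left and right halves of a 2x4 token grid.
example = build_regional_mask(2, 4, [(0, 0, 2, 2), (2, 0, 4, 2)], tokens_per_prompt=3)
```

A mask like this would then be passed to the attention call as a boolean mask, so that each region of the image is steered only by its own prompt. How the actual method restricts image-to-image attention or blends in a global prompt is not described in the abstract, so check the repository for the real details.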
