Qwen-Image-Layered: transparency- and layer-aware open diffusion model
Posted by dvrp 1 day ago
Comments
Comment by dvrp 1 day ago
This is the first model from a major AI research lab (the people behind Qwen Image, which is arguably the SOTA open image diffusion model) with these capabilities, afaik.
The difference in timing for this submission (16 hours ago) is because that's when the research/academic paper got released—as opposed to the inference code and model weights, which just got released 5 hours ago.
---
Technically there's another difference, but this mostly matters for people who are interested in AI research or AI training. From their abstract: “[we introduce] a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer.” That seems to imply you can adapt an existing (but different) image model to understand layers too. They also describe a pipeline for obtaining training data from Photoshop .PSD files.
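For intuition on what "multilayer image decomposition" inverts: flattening RGBA layers into a single image is just the standard "over" compositing operator applied bottom-to-top. A minimal pure-Python sketch (pixel values and function names are illustrative, not from the model's code):

```python
def over(fg, bg):
    """Composite one RGBA pixel over another (Porter-Duff "over").

    fg, bg: (r, g, b, a) tuples, all components in [0.0, 1.0],
    with non-premultiplied alpha.
    """
    fr, fg_, fb, fa = fg
    br, bg_, bb, ba = bg
    a = fa + ba * (1.0 - fa)               # resulting alpha
    if a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)        # fully transparent result
    r = (fr * fa + br * ba * (1.0 - fa)) / a
    g = (fg_ * fa + bg_ * ba * (1.0 - fa)) / a
    b = (fb * fa + bb * ba * (1.0 - fa)) / a
    return (r, g, b, a)

def flatten(layers):
    """Flatten a bottom-to-top list of RGBA pixels into one pixel."""
    out = layers[0]
    for layer in layers[1:]:
        out = over(layer, out)
    return out
```

The decomposer has to solve the inverse problem: given only the flattened result, recover a plausible set of layers, including content hidden behind opaque regions, which is why it needs a generative model rather than simple arithmetic.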
Comment by joshstrange 10 hours ago
I’ve often thought “I wish I could describe what I want in Pixelmator and have it create a whole document with multiple layers that I can go back in and tweak as needed”.
Comment by Bombthecat 3 hours ago
I think the future is something like: start with a draft, turn the draft into an image with AI, refine the boring layers, edit the important layer.
Comment by dvrp 1 day ago
- Paper page: https://huggingface.co/papers/2512.15603
- Model page: https://huggingface.co/Qwen/Qwen-Image-Layered
- Quantized model page: https://huggingface.co/QuantStack/Qwen-Image-Layered-GGUF
- Blog URL: https://qwenlm.github.io/blog/qwen-image-layered/ (404 at the time of writing this comment, but it'll probably go live soon)
- GitHub page: https://github.com/QwenLM/Qwen-Image-Layered
Comment by SV_BubbleTime 1 day ago
If you set the number of layers to 5, for example, will it determine what goes on each layer, or do I need to prompt that?
And I assume you need enough VRAM because each layer will effectively be a whole image in pixel or latent space… so if I have a 1MP image and 5 layers, I'd likely need to fit a 5MP image in VRAM?
Or can this be done in multiple steps, where I wouldn't need all 5 layers in active VRAM, and assembly is a separate step at the end after generating each layer?
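For a rough sense of scale: the latents themselves are small even with per-layer copies. A back-of-envelope sketch, assuming an 8x-downsampling VAE, a 16-channel latent, and fp16 storage (all illustrative defaults, not the model's confirmed configuration):

```python
def latent_bytes(width, height, layers,
                 downsample=8, channels=16, bytes_per_el=2):
    """Rough latent-tensor footprint for a multilayer generation.

    Assumes a VAE that downsamples by `downsample` per side and a
    `channels`-channel latent stored at fp16 (2 bytes/element).
    These are common values for modern diffusion models, used here
    purely for illustration.
    """
    lw, lh = width // downsample, height // downsample
    return lw * lh * channels * bytes_per_el * layers

# 1 MP image (1024x1024) with 5 layers:
mb = latent_bytes(1024, 1024, 5) / 2**20   # ≈ 2.5 MiB of latents
```

Under these assumptions the per-layer latents are a rounding error; the real VRAM pressure comes from model weights and attention/activation memory, which grow with the total token count across layers.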
Comment by CamperBob2 1 day ago
It'll drop a 600W RTX 6000 to its knees for about a minute, but it does work.
Comment by oefrha 23 hours ago
    with torch.inference_mode():
        output = pipeline(**inputs)

    output_image = output.images[0]
    for i, image in enumerate(output_image):
        image.save(f"{i}.png")
Unless it's a joke that went over my head, or you're talking about some other GitHub readme (there's only one GitHub link in TFA), posting an outright lie like this is not cool.
Comment by dragonwriter 21 hours ago
The word "powerpoint" is not there, however this text is:
“The following scripts will start a Gradio-based web interface where you can decompose an image and export the layers into a pptx file, where you can edit and move these layers flexibly.”