Support InstantStyle #7668
Conversation
thanks for your PR!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I'm currently testing it and comparing the results. In the meantime, I'm curious as to why you decided to create a new function instead of modifying the existing one.
I've tested it and it works, thank you for your work. I left a couple of questions.
I think the main use case for this is to be able to use one image for style and another one for composition; right now that's not possible, though.
IMO we should enable that in this PR.
| style | composition | result | expected |
|---|---|---|---|
| *(image)* | *(image)* | *(image)* | *(image)* |
I think you are right: the two functions essentially do the same thing and we should merge them. Sorry for the confusion.
Hi all, this PR is updated. Specifically:
```python
# To use style and layout from 2 reference images
scale_configs = [
    {"down": {"block_2": [0.0, 1.0]}},
    {"up": {"block_0": [0.0, 1.0, 0.0]}},
]
pipeline.set_ip_adapter_scale(scale_configs)
```
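For readers skimming the thread, the two configs above can also be produced with a small helper. This is a hypothetical sketch, not part of diffusers; the block names and list lengths follow the SDXL UNet layout used throughout this PR.

```python
def instantstyle_scale_configs(layout: float = 1.0, style: float = 1.0):
    """Build per-transformer-block scale configs for two loaded IP-Adapters.

    Mirrors the snippet above: the first config keeps only the second
    transformer block of ``down.block_2``, the second keeps only the middle
    transformer block of ``up.block_0``. A scale of 0.0 skips a block.
    """
    return [
        {"down": {"block_2": [0.0, layout]}},
        {"up": {"block_0": [0.0, style, 0.0]}},
    ]

scale_configs = instantstyle_scale_configs()
# pipeline.set_ip_adapter_scale(scale_configs)  # with two IP-Adapters loaded
```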
In the case of multiple reference images, I believe we can achieve this by loading the same IP-Adapter multiple times and setting each loaded IP-Adapter to style mode or layout mode. The following code works for me:
```python
...
pipe.load_ip_adapter(
    ip_adapter_path,
    subfolder="sdxl_models",
    weight_name=["ip-adapter_sdxl.bin", "ip-adapter_sdxl.bin"],
    image_encoder_path=image_encoder_path,
)
...
pipe.set_ip_adapter_scale(scale_configs)
images = pipe(
    prompt="a llama, masterpiece, best quality, high quality",
    ip_adapter_image=[style_img, composition_img],
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    scale=1.0,
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images
```

Please let me know if there is any other issue, thanks all :)
Great work, just one comment and the rest looks good to me. The expected result also works. You can mark my earlier comments as resolved.
@yiyixuxu this PR is ready for your final review
For this:
This time I really don't see a solution to prevent loading the same weights multiple times without changing how IP-Adapters are loaded in the pipelines.
All the mentioned issues are solved. Please let me know if there are any others.
IMO we want to load two IP-Adapters, one for style control and the other for layout. They happen to have the same pre-trained weights in this specific case, but they do not actually need to be the same. We believe this implementation requires minimal modification to the original IP-Adapter pipeline, but we are not sure if it is the best solution for @asomoza's case.
@yiyixuxu Should be ready to merge.
looking great! thanks!
cc @sayakpaul can you give a final review? Also, it seems like the quality checks changed files that are not supposed to be changed - I've seen this issue in multiple PRs now, what's going on?
Here's my hypothesis: #7314 made changes to the dependencies included in "quality". Refer to setup.py (line 209 in b5c8b55).
So with this dependency included, the formatting results can change whenever you run the quality checks. Contributors may not have updated the quality dependencies before running them.
Yup, can confirm my hypothesis with c45b1c7. See the number of file changes dropped from 17 to 4.
Looking great! Could we have some documentation with test code on how to use this feature as well?
When testing, prefer non-human objects.
@fabiorigano would be great to have your reviews on this too :-)
Thanks for your solution! This really confused us, as we passed the quality check locally. We will test with the latest dependencies and fix the quality check ASAP.
Absolutely, will update it soon.
Most of the issues are resolved except those related to the use case of
Hi all, I've just pushed some updates to this PR. Specifically:
```python
scale_0 = {"up": {"block_0": [[0.75, 0.75, 0.3]]}}
```

This will set the scales `[0.75, 0.75, 0.3]` for the corresponding 3 masked IP images across all 3 transformer blocks in `up.block_0`.
It seems like I still have some problems with my quality dependencies, can you help me with it again @sayakpaul? Thanks a lot!
you have to run
@fabiorigano @sayakpaul Formatted. |
very nice work! thank you all!! |
Hi @wodsoe,

```python
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
```

Please let me know if there is any further issue. Thanks a lot!
I am not able to get the InstantStyle implementation working with IP-Adapter. It is not directly related to @wodsoe, but it comes from attention_processors.py too. I'm using the exact example from the Diffusers IP-Adapter documentation.

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image
```

Using AutoPipelineForImage2Image I got an error because the ip_adapter_image parameter is not defined in the code:

```
ValueError: <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> has the config param `encoder_hid_dim_type` set to 'ip_image_proj' which requires the keyword argument `image_embeds` to be passed in `added_conditions`
```

Then, if I use AutoPipelineForText2Image giving 'style_image' as the 'ip_adapter_image' I get the next error. The same happens with AutoPipelineForImage2Image using 'style_image' for both 'image' and 'ip_adapter_image'.
Maybe I'm doing something wrong. IP-Adapter works fine when 'scale' is a float number for the traditional implementation, but it doesn't seem to work for this specific case. Thanks and sorry for the long reply :)
I can reproduce this problem. The code snippet is the same as what's provided in https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#style--layout-control.
Hi @emberMd, thanks for trying InstantStyle! It's a little bit weird because replacing `AutoPipelineForImage2Image` with `AutoPipelineForText2Image` works for me:

```python
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image
```

For the second error in your reply, I guess it's due to a version problem, as the dictionary-type scale config was only added recently.
Hi @DannHuang! Thanks for the fast reply. I guess it's weird; the code you passed still doesn't work for me. I imagined that it could be due to a version problem in Diffusers, but since I am already on 0.27.2 I wanted to confirm first that I was not doing anything strange in my code. I will install from the repo and see if that solves it. Thanks so much for the help!
@sayakpaul my bad, thanks a lot! @emberMd the code snippet is updated.
Thanks @sayakpaul @DannHuang! Really appreciate it. I was aware of the 'image=style_image' error, but I was using Diffusers 0.27.2; now with 0.28.0.dev0 it works fine. My bad!
@emberMd Good to know the errors were solved. Hope you can enjoy InstantStyle :) |
@DannHuang, @sayakpaul `AttributeError: 'tuple' object has no attribute 'shape'`
@DannHuang can we update the doc with the correct example?
@yiyixuxu Yeah sure, we will update the doc soon. |
@DannHuang
* enable control ip-adapter per-transformer block on-the-fly

---------

Co-authored-by: sayakpaul <[email protected]>
Co-authored-by: ResearcherXman <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
What does this PR do?
This PR is a follow-up to #7586, with modifications as suggested by this comment.
It is now possible to control the IP-Adapter scales per transformer block; setting a scale to 0 means the block is skipped. Example usage:
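The example usage was cut off in this extract; below is a minimal sketch of the scale format based on the snippets earlier in this thread. The `set_ip_adapter_scale` call is commented out because it requires a pipeline with a loaded SDXL IP-Adapter.

```python
# Per-transformer-block IP-Adapter scales for an SDXL UNet.
# Each list entry is the scale for one transformer block within the named
# UNet block; a value of 0.0 skips that block entirely.
scale = {
    "down": {"block_2": [0.0, 1.0]},      # keep only the second transformer block
    "up": {"block_0": [0.0, 1.0, 0.0]},   # keep only the middle transformer block
}
# pipeline.set_ip_adapter_scale(scale)
```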