Support InstantStyle #7668

Merged
merged 14 commits into from Apr 22, 2024

Conversation

@JY-Joy
Contributor

JY-Joy commented Apr 14, 2024

What does this PR do?

This PR is a follow-up to #7586, with modifications as suggested by this comment.

IP-Adapter scales can now be controlled per transformer block; setting a scale to 0 skips that block. Example usage (a sketch of how a config expands follows these examples):

1. To use the original IP-Adapter, simply set the scale to a float:

scale_config = 1.0
pipeline.activate_ip_adapter(scale_config)

2. To use the style block (up_blocks.0.attentions.1):

scale_config = {
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.activate_ip_adapter(scale_config)

3. To use style+layout blocks (up_blocks.0.attentions.1 and down_blocks.2.attentions.1):

scale_config = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.activate_ip_adapter(scale_config)
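
To make the mapping concrete, here is a minimal sketch of how such a nested config could expand into per-layer scales. This is illustrative only, not the diffusers implementation; the SDXL block/layer counts are assumptions inferred from the examples above, and unspecified layers fall back to a default of 0.0, which effectively skips them.

SDXL_ATTN_LAYERS = {  # attention layers per UNet block (assumed counts)
    ("down", "block_1"): 2,
    ("down", "block_2"): 2,
    ("mid", "block_0"): 1,
    ("up", "block_0"): 3,
    ("up", "block_1"): 3,
}

def expand_scale_config(config, default=0.0):
    # flatten {"up": {"block_0": [0.0, 1.0, 0.0]}} into one scale per layer
    expanded = {}
    for (part, block), n_layers in SDXL_ATTN_LAYERS.items():
        scales = config.get(part, {}).get(block, [default] * n_layers)
        for i, scale in enumerate(scales):
            expanded[f"{part}.{block}.attentions.{i}"] = scale
    return expanded

print(expand_scale_config({"up": {"block_0": [0.0, 1.0, 0.0]}}))
# only up.block_0.attentions.1 (the style layer) keeps a non-zero scale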

@haofanwang
Contributor

@yiyixuxu @asomoza Could you review this new PR? @DannHuang will follow it up.

@yiyixuxu
Collaborator

thanks for your PR!
@asomoza can you give this a first review and test it out?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza
Member

asomoza commented Apr 15, 2024

I’m currently testing it and comparing the results. In the meantime, I’m curious as to why you decided to create a new function instead of modifying set_ip_adapter_scale. Do you anticipate a use case where we’ll need to use both functions simultaneously?

Member

@asomoza asomoza left a comment


I've tested it and it works, thank you for your work. I left a couple of questions.

I think the main use case for this is being able to use one image for style and maybe another one for the composition; right now that's not possible, though.

IMO we should enable that in this PR.

[image grid: style / composition / result / expected]

@JY-Joy
Contributor Author

JY-Joy commented Apr 16, 2024

I’m currently testing it and comparing the results. In the meantime, I’m curious as to why you decided to create a new function instead of modifying set_ip_adapter_scale. Do you anticipate a use case where we’ll need to use both functions simultaneously?

I think you are right; the two functions essentially do the same thing, so we should merge them. Sorry for the confusion.

@JY-Joy
Contributor Author

JY-Joy commented Apr 17, 2024

Hi all, this PR is updated. Specifically:

  1. activate_ip_adapter() is now merged into set_ip_adapter_scale() and is fully compatible with the original usage. IP-Adapters can now be controlled by a float or a list of floats (the original usage), and also by a scale_config or a list of scale_configs; a further sketch of the merged interface follows this list. For example:
# To use style and layout from 2 reference images
scale_configs = [
    {"down": {"block_2": [0.0, 1.0]}},
    {"up": {"block_0": [0.0, 1.0, 0.0]}},
]
pipeline.set_ip_adapter_scale(scale_configs)
  2. _maybe_expand_lora_scales() now takes an additional default_scale argument with a default value of 1.0. I believe the behavior of the existing code is unchanged, and ip_adapter_utils.py is removed.
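
As a further sketch of the merged interface (assuming, per the description above, that each list entry independently configures one loaded adapter and may be either a float or a scale_config):

# hypothetical usage with two IP-Adapters loaded:
# adapter 0 keeps the original float behavior, adapter 1 is style-only
pipeline.set_ip_adapter_scale([
    0.6,
    {"up": {"block_0": [0.0, 1.0, 0.0]}},
])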

I've tested it and it works, thank you for your work. I left a couple of questions.

I think the main use case for this is being able to use one image for style and maybe another one for the composition; right now that's not possible, though.

In the case of multiple reference images, I believe we can achieve this by loading the same IP-Adapter multiple times and setting each IP-Adapter to style mode or layout mode. The following code works for me:

...
pipe.load_ip_adapter(
    ip_adapter_path,
    subfolder="sdxl_models",
    weight_name=["ip-adapter_sdxl.bin", "ip-adapter_sdxl.bin"],
    image_encoder_path=image_encoder_path,
)
...
pipe.set_ip_adapter_scale(scale_configs)
images = pipe(
    prompt="a llama, masterpiece, best quality, high quality",
    ip_adapter_image=[style_img, composition_img],
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images
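
For reference, a self-contained version of the sketch above. The image URLs are placeholders, and it assumes the i-th scale config applies to the i-th loaded adapter and the i-th ip_adapter_image:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
# load the same checkpoint twice: instance 0 for layout, instance 1 for style
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter_sdxl.bin", "ip-adapter_sdxl.bin"],
)
pipe.set_ip_adapter_scale([
    {"down": {"block_2": [0.0, 1.0]}},     # adapter 0: layout block
    {"up": {"block_0": [0.0, 1.0, 0.0]}},  # adapter 1: style block
])

layout_image = load_image("https://example.com/layout.png")  # placeholder URL
style_image = load_image("https://example.com/style.png")    # placeholder URL
image = pipe(
    prompt="a llama, masterpiece, best quality, high quality",
    ip_adapter_image=[layout_image, style_image],
    negative_prompt="text, watermark, lowres, low quality, worst quality",
    guidance_scale=5,
    num_inference_steps=30,
    generator=torch.Generator(device="cpu").manual_seed(42),
).images[0]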

Please let me know if there is any other issue, thanks all :)

@JY-Joy JY-Joy requested a review from UmerHA April 17, 2024 11:56
Member

@asomoza asomoza left a comment


Great work, just one comment and the rest looks good to me. The expected result also works. You can mark my earlier comments as resolved.

@yiyixuxu this PR is ready for your final review

@asomoza
Member

asomoza commented Apr 17, 2024

For this:

In the case of multiple reference images, I believe we can achieve this by loading the same IP-Adapter multiple times and setting each IP-Adapter to style mode or layout mode.

This time I really don't see a solution to prevent loading the same weights multiple times without changing how IP Adapters are loaded in the pipelines.

@JY-Joy
Contributor Author

JY-Joy commented Apr 18, 2024

All the mentioned issues are resolved. Please let me know if there are any others.

This time I really don't see a solution to prevent loading the same weights multiple times without changing how IP Adapters are loaded in the pipelines.

IMO we want to load two IP-Adapters, one for style control and the other for layout. They happen to share the same pre-trained weights in this specific case, but they do not actually need to be the same. We believe this implementation requires minimal modification to the original IP-Adapter pipeline, but we are not sure whether it is the best solution for @asomoza's case.

@haofanwang
Contributor

@yiyixuxu Should be ready to merge.

Collaborator

@yiyixuxu yiyixuxu left a comment


looking great! thanks!

@yiyixuxu yiyixuxu requested review from sayakpaul and removed request for UmerHA April 18, 2024 08:37
@yiyixuxu
Collaborator

cc @sayakpaul can you give a final review?

also, it seems like running the quality checks changed files that are not supposed to be changed. I've seen this issue in multiple PRs now; what's going on?

@sayakpaul
Member

also, it seems like running the quality checks changed files that are not supposed to be changed. I've seen this issue in multiple PRs now; what's going on?

Here's my hypothesis.

#7314 made changes to the dependencies included in "quality". Refer to the setup.py:

extras["quality"] = deps_list("urllib3", "isort", "ruff", "hf-doc-builder")

So with this dependency included, whenever you run make style it will re-format the docstring and documentation pages if needed.

Contributors may not have updated the quality dependencies before running make quality and make style, which is why there are likely changes in unexpected files. LMK if this is unclear.

@sayakpaul
Member

Yup, can confirm my hypothesis with c45b1c7. See the number of file changes dropped to 4 from 17.

Member

@sayakpaul sayakpaul left a comment


Looking great! Could we have some documentation with test code on how to use this feature as well?

When testing, prefer non-human objects.

@fabiorigano would be great to have your reviews on this too :-)

@JY-Joy
Contributor Author

JY-Joy commented Apr 19, 2024

Yup, can confirm my hypothesis with c45b1c7. See the number of file changes dropped to 4 from 17.

Thanks for your solution! This really confused us, as we passed the quality check locally. We will test with the latest dependencies and fix the quality check ASAP.

@JY-Joy
Contributor Author

JY-Joy commented Apr 19, 2024

Looking great! Could we have some documentation with test code on how to use this feature as well?

When testing, prefer non-human objects.

@fabiorigano would be great to have your reviews on this too :-)

Absolutely, will update it soon.

@JY-Joy
Contributor Author

JY-Joy commented Apr 19, 2024

Most of the issues are resolved, except those related to the use case of default_scale; I believe the current version behaves as expected. @sayakpaul, can I mark those comments as resolved? BTW, should I add documentation and tests to docs/source/en/using-diffusers/ip_adapter.md and tests/pipelines/ip_adapters?

@JY-Joy
Contributor Author

JY-Joy commented Apr 20, 2024

Hi all, I've just pushed some updates to this PR. Specifically:

  1. Multi-masked IP inputs are now supported. For the case @yiyixuxu provided:
     [image: multi_masks_org_out]
     With InstantStyle injecting ip_female_style and ip_male_style into only the style layers:
     [image: multi_mask]
  2. Different scales can now be set for masked IPs in set_ip_adapter_scale. For example, to set scales for 3 masked IPs in the up.blocks.0 layer, use this scale config (see the sketch after this list):

scale_0 = {"up": {"block_0": [[0.75, 0.75, 0.3]]}}

This sets the scales [0.75, 0.75, 0.3] for the corresponding 3 masked IPs across all 3 transformer blocks in up.blocks.0. My solution requires a length-1 list wrapping the list of scales, to avoid ambiguity with specifying different scales per transformer block. For details please refer to https://github.com/DannHuang/diffusers/blob/6fc9a3af2df947d99e80148cc7c40d4abb0ac86d/src/diffusers/loaders/unet_loader_utils.py#L143-L145
  3. A test case from @yiyixuxu is included.
  4. Conflicts are resolved.
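
A minimal sketch of that disambiguation rule (a hypothetical helper for illustration, not the actual unet_loader_utils code):

def interpret_block_scales(scales, num_transformer_blocks):
    # a length-1 list wrapping a list means per-mask scales, broadcast
    # to every transformer block in this UNet block
    if len(scales) == 1 and isinstance(scales[0], list):
        return [scales[0]] * num_transformer_blocks
    # a plain list of floats means one scale per transformer block
    assert len(scales) == num_transformer_blocks
    return scales

print(interpret_block_scales([[0.75, 0.75, 0.3]], 3))
# -> [[0.75, 0.75, 0.3], [0.75, 0.75, 0.3], [0.75, 0.75, 0.3]]
print(interpret_block_scales([0.0, 1.0, 0.0], 3))
# -> [0.0, 1.0, 0.0]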

@JY-Joy
Contributor Author

JY-Joy commented Apr 20, 2024

It seems like I still have some problems with my quality dependencies. Can you help me with it again, @sayakpaul? Thanks a lot!

@fabiorigano
Contributor

It seems like I still have some problems with my quality dependencies. Can you help me with it again, @sayakpaul? Thanks a lot!

you have to run make quality (see "How to open a PR" in the contributing guide)

@haofanwang
Contributor

haofanwang commented Apr 20, 2024

@fabiorigano @sayakpaul Formatted.

@yiyixuxu yiyixuxu merged commit 21c747f into huggingface:main Apr 22, 2024
@yiyixuxu
Collaborator

very nice work! thank you all!!

@wodsoe

wodsoe commented Apr 24, 2024


run the example:
attention_processor.py", line 1157, in __call__
    hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape
AttributeError: 'tuple' object has no attribute 'shape'

@DannHuang

@JY-Joy
Contributor Author

JY-Joy commented Apr 24, 2024

run the example: attention_processor.py", line 1157, in __call__ hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape AttributeError: 'tuple' object has no attribute 'shape' @DannHuang

Hi @wodsoe,
It seems like there is a mistake in the example, really sorry for the confusion.
The following code should work:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")

Please let me know if there is any further issue. Thanks a lot!

@emberMd

emberMd commented Apr 25, 2024

I am not able to get the InstantStyle implementation working with IP-Adapter. It is not directly related to @wodsoe's issue, but it also comes from attention_processor.py.

I'm using the exact example from the Diffusers IP-Adapter documentation.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")

generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

Using AutoPipelineForImage2Image, I got an error because the ip_adapter_image parameter is not defined in the code.

ValueError: <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> has the config param `encoder_hid_dim_type` set to 'ip_image_proj' which requires the keyword argument `image_embeds` to be passed in  `added_conditions`

Then, if I use AutoPipelineForText2Image, passing style_image as ip_adapter_image, I get the next error. The same happens with AutoPipelineForImage2Image when using style_image for both image and ip_adapter_image.

File ~/miniconda3/envs/if/lib/python3.10/site-packages/diffusers/models/attention_processor.py:2417, in IPAdapterAttnProcessor2_0.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, temb, scale, ip_adapter_masks)
   2413         mask_downsample = mask_downsample.to(dtype=query.dtype, device=query.device)
   2415         current_ip_hidden_states = current_ip_hidden_states * mask_downsample
-> 2417     hidden_states = hidden_states + scale * current_ip_hidden_states
   2419 # linear proj
   2420 hidden_states = attn.to_out[0](hidden_states)

TypeError: unsupported operand type(s) for *: 'dict' and 'Tensor'

Maybe I'm doing something wrong. IP-Adapter works fine when scale is a float for the traditional implementation, but it doesn't seem to work for this specific case. Thanks, and sorry for the long reply :)

@sayakpaul
Member

I can reproduce this problem. The code snippet is the same as what's provided in https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter#style--layout-control.

@JY-Joy
Contributor Author

JY-Joy commented Apr 25, 2024

Hi @emberMd, thanks for trying InstantStyle! It's a little bit weird, because replacing AutoPipelineForImage2Image with AutoPipelineForText2Image and passing style_image as ip_adapter_image to the pipeline works fine for me. Here is the complete code for your reference:

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

For the second error in your reply, I guess it's a version problem, as the dictionary-type scale was set on each attn_processor. Could you please try cloning the latest diffusers repo and installing directly from source?
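
If it helps, a quick sanity check (per this thread, dict-style scales need a version newer than 0.27.2, e.g. a source install of 0.28.0.dev0 at the time):

import diffusers

print(diffusers.__version__)  # 0.27.2 is too old for dict-style scales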

@emberMd

emberMd commented Apr 25, 2024

Hi @DannHuang! Thanks for the fast reply.

I guess it's weird; the code you posted still doesn't work for me. I imagined that it could be due to a version problem in Diffusers, but since I am already on 0.27.2, I wanted to confirm first that I was not doing anything strange in my code.

I will install from the repo and see if that solves it. Thanks so much for the help!

@sayakpaul
Member

image=style_image should be ip_adapter_image=style_image

@JY-Joy
Contributor Author

JY-Joy commented Apr 25, 2024

@sayakpaul my bad, thanks a lot! @emberMd the code snippet is updated.

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image

@emberMd

emberMd commented Apr 25, 2024

Thanks @sayakpaul @DannHuang! Really appreciate it

I was aware of the 'image=style_image' error, but I was using Diffusers 0.27.2; now with 0.28.0.dev0 it works fine. My bad.

@JY-Joy
Contributor Author

JY-Joy commented Apr 25, 2024

@emberMd Good to know the errors are solved. Hope you enjoy InstantStyle :)

@kadirnar
Contributor

@DannHuang , @sayakpaul
I get an error when I add xformers and flash-attention.

AttributeError: 'tuple' object has no attribute 'shape'

@yiyixuxu
Collaborator

@DannHuang can we update the doc with the correct example?

@JY-Joy
Contributor Author

JY-Joy commented Apr 26, 2024

@yiyixuxu Yeah sure, we will update the doc soon.

@JY-Joy
Contributor Author

JY-Joy commented Apr 28, 2024

@yiyixuxu Hi, the doc is updated in #7806. Please have a check!

@yiyixuxu
Collaborator

@DannHuang
merged! thank you:)

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* enable control ip-adapter per-transformer block on-the-fly

---------

Co-authored-by: sayakpaul <[email protected]>
Co-authored-by: ResearcherXman <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>