
Pass quantization_kwargs to CLIP builders #1994

Merged
joecummings merged 1 commit into pytorch:main from update-clip-with-quant-nums on Nov 13, 2024

Conversation

@joecummings (Contributor) commented Nov 12, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?
  • Pass quantization_kwargs through to the CLIP builders and, when quantize_base is set, use scaler_block_size=200 so the vision encoder weights can be quantized evenly across 1, 2, 4, 6, and 8 GPUs (see the sketch below).
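
Below is a minimal sketch of the pass-through pattern, with hypothetical names and signatures (lora_linear and lora_clip_vision_encoder here are stand-ins, not torchtune's actual builders): the vision builder forwards any quantization-related kwargs down to the layer constructors instead of dropping them, and pins the scaler block size only when the base weights are quantized.

from typing import Any, Optional


def lora_linear(
    in_dim: int,
    out_dim: int,
    quantize_base: bool,
    scaler_block_size: Optional[int],
    **quantization_kwargs: Any,
) -> dict:
    # Stand-in for the real LoRA linear constructor: just record what it
    # received so the example stays self-contained and runnable.
    return {
        "in_dim": in_dim,
        "out_dim": out_dim,
        "quantize_base": quantize_base,
        "scaler_block_size": scaler_block_size,
        **quantization_kwargs,
    }


def lora_clip_vision_encoder(
    clip_embed_dim: int,
    quantize_base: bool = False,
    **quantization_kwargs: Any,
) -> dict:
    # Hypothetical CLIP builder: every quantization kwarg the caller supplies
    # is forwarded to the layer constructor rather than silently ignored.
    return lora_linear(
        in_dim=clip_embed_dim,
        out_dim=clip_embed_dim,
        quantize_base=quantize_base,
        # Mirrors the diff below: chosen so weights quantize evenly across
        # 1, 2, 4, 6, 8 GPUs, and tied to clip_embed_dim.
        scaler_block_size=200 if quantize_base else None,
        **quantization_kwargs,
    )


if __name__ == "__main__":
    # block_size is just an example of a forwarded quantization kwarg;
    # 1280 is a placeholder embed dim, not taken from the real config.
    print(lora_clip_vision_encoder(1280, quantize_base=True, block_size=64))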

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)
(joe-torchtune) [[email protected] ~/projects/joe-torchtune (update-clip-with-quant-nums)]$ tune run --nproc-per-node 8 lora_finetune_distributed --config llama3_2_vision/90B_qlora max_steps_per_epoch=5 lr_scheduler.num_warmup_steps=0
Running with torchrun...
W1112 12:28:55.300000 2440397 site-packages/torch/distributed/run.py:793]
W1112 12:28:55.300000 2440397 site-packages/torch/distributed/run.py:793] *****************************************
W1112 12:28:55.300000 2440397 site-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1112 12:28:55.300000 2440397 site-packages/torch/distributed/run.py:793] *****************************************

  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 0
max_steps_per_epoch: 5
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /tmp/Llama-3.2-90B-Vision-Instruct/logs
model:
  _component_: torchtune.models.llama3_2_vision.qlora_llama3_2_vision_90b
  apply_lora_to_mlp: true
  apply_lora_to_output: false
  decoder_trainable: frozen
  encoder_trainable: lora
  fusion_trainable: lora
  image_size: 560
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  - output_proj
  lora_dropout: 0.0
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  fused: true
  lr: 0.0001
  weight_decay: 0.01
output_dir: /tmp/qlora-llama3.2-vision-finetune
profiler:
  _component_: torchtune.training.setup_torch_profiler
  active_steps: 2
  cpu: true
  cuda: true
  enabled: false
  num_cycles: 1
  output_dir: /tmp/qlora-llama3.2-vision-finetune/profiling_outputs
  profile_memory: false
  record_shapes: true
  wait_steps: 5
  warmup_steps: 3
  with_flops: false
  with_stack: false
resume_from_checkpoint: false
save_adapter_weights_only: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  image_size: 560
  max_seq_len: 8192
  path: /tmp/Llama-3.2-90B-Vision-Instruct/original/tokenizer.model

NCCL version 2.21.5+cuda12.4
DEBUG:torchtune.utils._logging:Setting manual seed to local seed 3509117360. Local seed is seed + rank = 3509117360 + 0
Writing logs to /tmp/Llama-3.2-90B-Vision-Instruct/logs/log_1731443356.txt
INFO:torchtune.utils._logging:FSDP is enabled. Instantiating model and loading checkpoint on Rank 0 ...
INFO:torchtune.utils._logging:Instantiating model and loading checkpoint took 127.72 secs
INFO:torchtune.utils._logging:Memory stats after model init:
        GPU peak memory allocation: 7.77 GiB
        GPU peak memory reserved: 9.25 GiB
        GPU peak memory active: 7.77 GiB
INFO:torchtune.utils._logging:Optimizer is initialized.
INFO:torchtune.utils._logging:Loss is initialized.
INFO:torchtune.utils._logging:Dataset and Sampler are initialized.
INFO:torchtune.utils._logging:Learning rate scheduler is initialized.
WARNING:torchtune.utils._logging: Profiling disabled.
INFO:torchtune.utils._logging: Profiler config after instantiation: {'enabled': False}
1|5|Loss: 0.90829998254776: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [06:31<00:00, 77.67s/it]
INFO:torchtune.utils._logging:Saving checkpoint. This may take some time. Retrieving full model state dict...
INFO:torchtune.utils._logging:Getting full model state dict took 185.25 secs
INFO:torchtune.utils._logging:Model checkpoint of size 4.60 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0001_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0002_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0003_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0004_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0005_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0006_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0007_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0008_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0009_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0010_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0011_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0012_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0013_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0014_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0015_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0016_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0017_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0018_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0019_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0020_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0021_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0022_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0023_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0024_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0025_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0026_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0027_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0028_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0029_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0030_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0031_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0032_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0033_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0034_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0035_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0036_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.88 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0037_0.pt
INFO:torchtune.utils._logging:Adapter checkpoint of size 0.26 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/adapter_0.pt
WARNING:torchtune.utils._logging:Saving Llama3.2 Vision adapter weights to PEFT format is not supported, saving to torchtune format instead
WARNING:torchtune.utils._logging:PEFT integration for Llama3.2 Vision is not supported, skipping adapter config save
INFO:torchtune.utils._logging:Saving final epoch checkpoint.
INFO:torchtune.utils._logging:The full model checkpoint, including all weights and configurations, has been saved successfully.You can now use this checkpoint for further training or inference.
INFO:torchtune.utils._logging:Saving checkpoint took 450.86 secs

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.

  • I did not change any public API
  • I have added an example to docs or docstrings

pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1994

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3cdfd92 with merge base 4df97ad:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 12, 2024
@@ -193,6 +193,9 @@ def lora_llama3_2_vision_11b(
lora_dropout=lora_dropout,
use_dora=use_dora,
quantize_base=quantize_base,
# Update scaler block size to ensure that weights can be quantized evenly across 1, 2, 4, 6, 8 GPUs.
# This is dependent on ``clip_embed_dim`` so if that is updated, this variable should be as well
scaler_block_size=200 if quantize_base else None,
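
To make the divisibility comment above concrete, here is a toy check under one plausible reading of the constraint (the exact rule that torchao's NF4 path enforces may differ, and the size below is made up rather than taken from the CLIP config): an evenly sharded weight should split into a whole number of scaler blocks on every rank, for every world size we care about.

def splits_into_whole_scaler_blocks(
    numel: int, scaler_block_size: int, world_sizes=(1, 2, 4, 6, 8)
) -> dict:
    # For each world size, True when an evenly sharded flat weight of `numel`
    # elements is a whole number of scaler blocks on every rank.
    return {
        ws: numel % ws == 0 and (numel // ws) % scaler_block_size == 0
        for ws in world_sizes
    }


if __name__ == "__main__":
    toy_numel = 200 * 24 * 1000  # placeholder size chosen to divide evenly
    print(splits_into_whole_scaler_blocks(toy_numel, scaler_block_size=200))
    # -> {1: True, 2: True, 4: True, 6: True, 8: True}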
Contributor commented:
So no negative perf impact to using this value on < 8 GPUs?

@joecummings (Contributor, Author) replied Nov 13, 2024:

With the scaler block size from this diff (200):
(joe-torchtune) [[email protected] ~/projects/joe-torchtune (update-clip-with-quant-nums)]$ tune run --nproc-per-node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora max_steps_per_epoch=5 lr_scheduler.num_warmup_steps=0
Running with torchrun...
W1112 18:35:18.602000 4067602 site-packages/torch/distributed/run.py:793]
W1112 18:35:18.602000 4067602 site-packages/torch/distributed/run.py:793] *****************************************
W1112 18:35:18.602000 4067602 site-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1112 18:35:18.602000 4067602 site-packages/torch/distributed/run.py:793] *****************************************

  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 0
max_steps_per_epoch: 5
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /tmp/Llama-3.2-90B-Vision-Instruct/logs
model:
  _component_: torchtune.models.llama3_2_vision.qlora_llama3_2_vision_90b
  apply_lora_to_mlp: true
  apply_lora_to_output: false
  decoder_trainable: frozen
  encoder_trainable: lora
  fusion_trainable: lora
  image_size: 560
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  - output_proj
  lora_dropout: 0.0
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  fused: true
  lr: 0.0001
  weight_decay: 0.01
output_dir: /tmp/qlora-llama3.2-vision-finetune
profiler:
  _component_: torchtune.training.setup_torch_profiler
  active_steps: 2
  cpu: true
  cuda: true
  enabled: false
  num_cycles: 1
  output_dir: /tmp/qlora-llama3.2-vision-finetune/profiling_outputs
  profile_memory: false
  record_shapes: true
  wait_steps: 5
  warmup_steps: 3
  with_flops: false
  with_stack: false
resume_from_checkpoint: false
save_adapter_weights_only: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  image_size: 560
  max_seq_len: 8192
  path: /tmp/Llama-3.2-90B-Vision-Instruct/original/tokenizer.model

NCCL version 2.21.5+cuda12.4
DEBUG:torchtune.utils._logging:Setting manual seed to local seed 3681342200. Local seed is seed + rank = 3681342200 + 0
Writing logs to /tmp/Llama-3.2-90B-Vision-Instruct/logs/log_1731465330.txt
INFO:torchtune.utils._logging:FSDP is enabled. Instantiating model and loading checkpoint on Rank 0 ...
INFO:torchtune.utils._logging:Instantiating model and loading checkpoint took 128.29 secs
INFO:torchtune.utils._logging:Memory stats after model init:
        GPU peak memory allocation: 13.40 GiB
        GPU peak memory reserved: 14.53 GiB
        GPU peak memory active: 13.40 GiB
INFO:torchtune.utils._logging:Optimizer is initialized.
INFO:torchtune.utils._logging:Loss is initialized.
INFO:torchtune.utils._logging:Dataset and Sampler are initialized.
INFO:torchtune.utils._logging:Learning rate scheduler is initialized.
WARNING:torchtune.utils._logging: Profiling disabled.
INFO:torchtune.utils._logging: Profiler config after instantiation: {'enabled': False}
1|5|Loss: 0.8903459906578064: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [13:04<00:00, 156.15s/it]
INFO:torchtune.utils._logging:Saving checkpoint. This may take some time. Retrieving full model state dict...
INFO:torchtune.utils._logging:Getting full model state dict took 207.90 secs
INFO:torchtune.utils._logging:Model checkpoint of size 4.60 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0001_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0002_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0003_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0004_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0005_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0006_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0007_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0008_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0009_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0010_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0011_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0012_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0013_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.97 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0014_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0015_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0016_0.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.66 GB saved to /tmp/Llama-3.2-90B-Vision-Instruct/hf_model_0017_0.pt
With the default scaler block size:
(joe-torchtune) [[email protected] ~/projects/joe-torchtune (update-clip-with-quant-nums)]$ tune run --nproc-per-node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora max_steps_per_epoch=5 lr_scheduler.num_warmup_steps=0
Running with torchrun...
W1112 19:01:59.371000 237246 site-packages/torch/distributed/run.py:793]
W1112 19:01:59.371000 237246 site-packages/torch/distributed/run.py:793] *****************************************
W1112 19:01:59.371000 237246 site-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1112 19:01:59.371000 237246 site-packages/torch/distributed/run.py:793] *****************************************

  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 0
max_steps_per_epoch: 5
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /tmp/Llama-3.2-90B-Vision-Instruct/logs
model:
  _component_: torchtune.models.llama3_2_vision.qlora_llama3_2_vision_90b
  apply_lora_to_mlp: true
  apply_lora_to_output: false
  decoder_trainable: frozen
  encoder_trainable: lora
  fusion_trainable: lora
  image_size: 560
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  - output_proj
  lora_dropout: 0.0
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  fused: true
  lr: 0.0001
  weight_decay: 0.01
output_dir: /tmp/qlora-llama3.2-vision-finetune
profiler:
  _component_: torchtune.training.setup_torch_profiler
  active_steps: 2
  cpu: true
  cuda: true
  enabled: false
  num_cycles: 1
  output_dir: /tmp/qlora-llama3.2-vision-finetune/profiling_outputs
  profile_memory: false
  record_shapes: true
  wait_steps: 5
  warmup_steps: 3
  with_flops: false
  with_stack: false
resume_from_checkpoint: false
save_adapter_weights_only: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  image_size: 560
  max_seq_len: 8192
  path: /tmp/Llama-3.2-90B-Vision-Instruct/original/tokenizer.model

NCCL version 2.21.5+cuda12.4
DEBUG:torchtune.utils._logging:Setting manual seed to local seed 1189370465. Local seed is seed + rank = 1189370465 + 0
Writing logs to /tmp/Llama-3.2-90B-Vision-Instruct/logs/log_1731466932.txt
INFO:torchtune.utils._logging:FSDP is enabled. Instantiating model and loading checkpoint on Rank 0 ...
INFO:torchtune.utils._logging:Instantiating model and loading checkpoint took 127.39 secs
INFO:torchtune.utils._logging:Memory stats after model init:
        GPU peak memory allocation: 13.40 GiB
        GPU peak memory reserved: 14.53 GiB
        GPU peak memory active: 13.40 GiB
INFO:torchtune.utils._logging:Optimizer is initialized.
INFO:torchtune.utils._logging:Loss is initialized.
INFO:torchtune.utils._logging:Dataset and Sampler are initialized.
INFO:torchtune.utils._logging:Learning rate scheduler is initialized.
WARNING:torchtune.utils._logging: Profiling disabled.
INFO:torchtune.utils._logging: Profiler config after instantiation: {'enabled': False}
1|5|Loss: 0.8870511651039124: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [13:37<00:00, 162.95s/it]
INFO:torchtune.utils._logging:Saving checkpoint. This may take some time. Retrieving full model state dict...
INFO:torchtune.utils._logging:Getting full model state dict took 201.79 secs

Looks to be no meaningful difference in speed (156.15 s/it with the new scaler block size vs. 162.95 s/it with the default) or memory (13.40 GiB peak allocation in both runs).

@joecummings merged commit 51b31c8 into pytorch:main Nov 13, 2024
17 checks passed
@joecummings deleted the update-clip-with-quant-nums branch November 13, 2024 03:26