
Optuna Integration for automated Hyperparameter search #315


Merged
merged 55 commits into neulab:main on Oct 30, 2023

Conversation

Anindyadeep
Contributor

Description

This PR adds a new feature: automated hyperparameter search using Optuna. It also introduces a new spec for performing hyperparameter search in three ways:

  • Hyperparameter search using Optuna
  • Hyperparameter search using the GPT API with the TaskType and complex prompting
  • Hybrid search using GPT and Optuna

This PR solves issue #313

This is how the train_model() function changes from the client's side:

from prompt2model.model_trainer import GenerationModelTrainer
from pathlib import Path

trainer = GenerationModelTrainer(
    pre_train_model_name,
    has_encoder=True,
    executor_batch_size=8,
    tokenizer_max_length=1024,
    sequence_max_length=1280,
    device="CPU"
)

args_output_root = Path("result/training_output")
args_output_root.mkdir(parents=True, exist_ok=True)

trained_model, trained_tokenizer = trainer.train_model(
    training_datasets=train_datasets,
    validation_datasets=val_datasets,
    hyperparameter_search_mode="optuna"
)

Some additional changes, such as supporting default hyperparameters as an option, are also provided. However, that still needs to be discussed.

This commit adds a new feature for hyperparameter search using Optuna.
This commit integrates the newly added automated hyperparameter search
through Optuna into the prompt2model train_model function.

This commit also changes some parts of the training pipeline and introduces
a new argument to the train_model() function called `hyperparameter_search_mode`,
where the user can either skip hyperparameter search or run it through
Optuna, GPT, or a hybrid approach. This commit targets only Optuna.
@neubig neubig requested review from viswavi and neubig September 2, 2023 16:30
@neubig
Collaborator

neubig commented Sep 2, 2023

Wow, thanks for the contribution @Anindyadeep!

First, a few initial comments:

  1. You listed three different things that could be done for hyperparameter search. I would definitely suggest that we split those into three separate PRs (for ease of reviewing). So we can just review the optuna hyperparameter search in this PR.
  2. Just to clarify, would you like us to start reviewing this now, or is it still WIP?
  3. It seems that this is not passing formatting checks. I would suggest that you run pre-commit checks, as detailed here.

@Anindyadeep
Contributor Author

Anindyadeep commented Sep 2, 2023

> Wow, thanks for the contribution @Anindyadeep!
>
> First, a few initial comments:
>
> 1. You listed three different things that could be done for hyperparameter search. I would definitely suggest that we split those into three separate PRs (for ease of reviewing). So we can just review the optuna hyperparameter search in this PR.
> 2. Just to clarify, would you like us to start reviewing this now, or is it still WIP?
> 3. It seems that this is not passing formatting checks. I would suggest that you run pre-commit checks, as detailed here.
  • Yes, I intend to split that into three different PRs. I just mentioned it here and will remove that TODO eventually.

  • I also asked @viswavi (as he has been mentoring me on this) to start a quick review, and I would appreciate it if you also provided a high-level review. (However, the work is still in progress; I still have to test the hyperparameter search since my device does not have that much GPU.)

  • As I mentioned, I currently have a gap between making changes and testing them, so I just wanted to open a PR to get your views as well as run the branch inside Colab. We can make this a draft PR for now. However, please do provide your thoughts on the changes I am making eventually.

@neubig
Collaborator

neubig commented Sep 2, 2023

Sounds great, I'll take a look when I have a chance.

# - Dynamic initialization of hyperparamter range from task type and complexity
# - Using LLM to suggest hyperparameter

class AutomamatedParamSelector:
Collaborator

Typo in name

Suggested change
class AutomamatedParamSelector:
class AutomatedParamSelector:

@@ -0,0 +1,39 @@
"""This module provides a dummy trainer for testing purposes."""
Collaborator

Please update the docstring to be more accurate. Moreover, I don't think the file name prompt2model/param_selector/generate.py is descriptive. Maybe instead call it prompt2model/param_selector/optuna_selector.py?

If you can be more specific about the algorithm used underneath Optuna (e.g. Bayesian optimization), then we can be even more specific in the file name.

@viswavi
Collaborator

viswavi commented Sep 3, 2023

Hi @Anindyadeep, I made a quick pass through this and generally it looks very good. Thank you for the quick work! I've left 2 minor comments in the PR. After addressing those, can you potentially clean up the code a little bit?

From the repo root directory, run `pre-commit run --all-files`,
and also run `pytest` to make sure this change has not broken any other tests.

After doing this, I will make an in-depth review of the PR.

@viswavi
Collaborator

viswavi commented Sep 3, 2023

Also, it looks like there may be merge conflicts with neulab:main

@viswavi
Collaborator

viswavi commented Sep 3, 2023

Comment I made to Anindyadeep over DM (copying here for visibility):

"""
I think that this pattern makes sense, but the way I was originally thinking of this was a little different; have a ParamSelector class that wraps the model trainer (rather than being embedded in the model trainer)
so you would pass the model trainer into the ParamSelector class, and then the parameter selector will run this trainer on a bunch of different configurations before ultimately returning a single trained model

I feel that this provides a little more modularity, but I'm open to changing my mind if you can convince me that the other pattern is better 🙂
"""

@Anindyadeep
Contributor Author

Yeah @viswavi, that makes sense. I will look into that and push another PR with that and pre-commit all done.

In this commit, the main changes include:
- Name change from AutomatedParamSelector to OptunaParamSelector
- Adding more functionality so that we do not have to add it inside
train_model

Additionally, all lint checks are passing.
…iner

However, we need to discuss this commit. This method proved useful in
the param_selector code for accessing the trainer class and doing the
hyperparameter search.
This commit adds the following:
- Removed extra args of train_model from previous commit
- Added a new function called search_best_hyperparameters
- Changed the schema of hyperparameters dict
Current tests are not using the new hyperparameters schema that I proposed
in the previous commit, so I tried to change those in this commit.
However, doing that, several tests are failing and I am currently blocked.
Currently this contains the values for the static hyperparameters
and the hyperparameter search space. This should be useful when there
are a lot of tweakable default parameters.
@Anindyadeep
Contributor Author

Hey @viswavi, I added some more changes from our previous discussion. Currently the pre-commit checks are passing. However, due to certain changes, tests are breaking. We might need to discuss this, and I can iterate on the commits.

Collaborator

@viswavi viswavi left a comment

Made a pass. Let's chat more about this implementation, but overall I think this is very nice progress towards having a great parameter selector!

One major thing missing from this, right now, is unit tests for the new functionality you're proposing. These will take time to write, but they will increase the chance that your code does what you want right away, and they will likely help you find and fix bugs in your implementation.

@@ -32,6 +34,7 @@ def __init__(
executor_batch_size: int = 10,
tokenizer_max_length: int = 512,
sequence_max_length: int = 1024,
device: Optional[str] = None,
Collaborator

We prefer to use union types, which would set this as:

Suggested change
device: Optional[str] = None,
device: str | None = None,

To support this, you will also have to add
from __future__ import annotations # noqa FI58

In the first line of this file.
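
As a minimal, hypothetical snippet illustrating why the import is needed (this is not the actual contents of the file):

```python
from __future__ import annotations  # noqa FI58


class GenerationModelTrainer:
    def __init__(self, device: str | None = None):
        # Without the __future__ import, evaluating `str | None` here raises a
        # TypeError on Python versions older than 3.10; with it, annotations are
        # stored as strings and the union syntax works everywhere.
        self.device = device
```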

Comment on lines 18 to 21
# TODO:
# - User tweaking hyperparameter
# - Dynamic initialization of hyperparamter range from task type and complexity
# - Using LLM to suggest hyperparameter
Collaborator

Let's not keep TODOs in the code. Instead, please create issues to track these suggestions 😄

Contributor Author

Sure, I will be filing issues starting from the next set of commits.

Comment on lines 81 to 83
# TODO:
# - Find the industry standards for default values spec
# - More asserts for other keys. Example checking the min or max values
Collaborator

Delete the TODOs and add as issues. Also, we/others can help choose these default values over time.

Comment on lines 134 to 137
# prepare the training args
# we are assuming here that the user will also provide these args with the additional
# args for range. Or we can provide an another argument of search_args (dict) that will
# tell the user to provide the arguments for doing hyperparamerter search
Collaborator

Comments should almost always be complete, grammatical sentences (with proper capitalization and punctuation). I also feel that most of this comment is unnecessarily detailed/confusing.

Contributor Author

Yeah, I am currently struggling with this part, but I will improve it.

trainer.args = training_args

best_run = trainer.hyperparameter_search(
n_trials=5, # FIXME: Discussion needed, where to put this arg and visibility for the user
Collaborator

I think this should be set in the init function for this class.

Contributor Author

In the init function we only have the trainer param, so can I then change the definition?
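
For illustration, the change being discussed might look roughly like this (a sketch only, reusing the class names already in this PR):

```python
class OptunaParamSelector(ParamSelector):
    """Uses Optuna for searching for hyperparameters."""

    def __init__(self, trainer: BaseTrainer, n_trials: int = 5):
        """Initialize with a prompt2model trainer and the number of Optuna trials."""
        self.trainer = trainer
        # Exposing n_trials here avoids the hard-coded n_trials=5 in the search call.
        self.n_trials = n_trials
```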

# because the more the max value, the more it will take time to get the
# max batch size

MAX_SUPPORTED_BATCH_SIZE = 128
Collaborator

This is probably too big. I think something like 12 or 16 makes more sense (since we're assuming that folks will be training BERT-sized models on small GPUs).

Contributor Author

I am also adding a function, which I will include in the next commit, to automatically determine the max batch size. It goes something like this:

import torch
from torch import nn

# Upper bound for the batch-size search; halved on CUDA OOM until a batch fits.
MAX_SUPPORTED_BATCH_SIZE = 128


def get_max_batch_size(model: nn.Module, input_dim: int, device: str):
    batch_size = MAX_SUPPORTED_BATCH_SIZE

    while True:
        try:
            dummy_tensor = torch.rand(batch_size, input_dim).to(device)
            _ = model(dummy_tensor)
            break
        except RuntimeError as e:
            if "CUDA out of memory" in str(e):
                batch_size //= 2
            else:
                raise e
    return batch_size

Collaborator

I think we can add this in a follow-up PR? For now we can just use a small batch size by default for simplicity.

Contributor Author

@Anindyadeep Anindyadeep Oct 13, 2023

Yes, I already added it.


def __init__(self, trainer: BaseTrainer):
"""Initialize with train/val datasets and a prompt specification"""
self.trainer = trainer
Collaborator

Suggestion: make this

Suggested change
self.trainer = trainer
self.hf_trainer = promptmodel_trainer.trainer

class OptunaParamSelector(ParamSelector):
"""Uses optuna for searching for hyperparameters"""

def __init__(self, trainer: BaseTrainer):
Collaborator

Suggestion: rename this object to promptmodel_trainer to avoid clashing with the huggingface trainer member variable name within BaseTrainer.

Suggested change
def __init__(self, trainer: BaseTrainer):
def __init__(self, promptmodel_trainer: BaseTrainer):

@zhaochenyang20
Collaborator

@Anindyadeep Thanks so much for your contribution. I am pondering how you can control the training device. I searched through the whole Trainer, but I found that self.device is never used? 🤔

I hope that I am wrong, and I guess once we can assign the training device, #317 will be fixed.

Anindyadeep and others added 5 commits September 5, 2023 10:10
fix typo

Co-authored-by: Vijay Viswanathan <[email protected]>
fix: remove additional comments

Co-authored-by: Vijay Viswanathan <[email protected]>
Fix: Clean docstrings and make them more understandable

Co-authored-by: Vijay Viswanathan <[email protected]>
fix: small typo checks

Co-authored-by: Vijay Viswanathan <[email protected]>
Fix: Shortened docstrings and removed unnecessary comments

Co-authored-by: Vijay Viswanathan <[email protected]>
Comment on lines 261 to 291

```python
hyperparameter_choice = {
"static_hyperparameters": {
"output_dir": "./result",
"logging_steps": 1,
"save_strategy": "no",
"num_train_epochs": 10,
"per_device_train_batch_size": 100,
"warmup_steps": 0,
"weight_decay": 0.01,
"logging_dir": "./result",
"learning_rate": 1e-4,
"evaluation_strategy": "epoch",
"test_size": 0.15,
},

"optuna": {
"min_num_train_epochs": 5,
"max_num_train_epochs": 10,
"save_strategy": ["epoch", "steps", "no"],
"evaluation_strategy": ["epoch", "no"],
"per_device_train_batch_size": [4, 8, 16, 32],
"min_weight_decay": 1e-5,
"max_weight_decay": 1e-1,
"min_learning_rate": 1e-5,
"max_learning_rate": 1e-1,
},
}
```
Here all of the keys are optional. Here are the criterions laid out:
Contributor Author

> I don't think this is entirely accurate, since we're also adding the optuna parameters, which are otherwise not given to the train_model method. Can you clarify this in the comment?

I have mentioned the schema here. Users have two choices: they can either go for hyperparameter optimization or not. Now here are some of my notes:

static_hyperparameters are those which are only meant for model training via trainer.train_model().

optuna: when this key is given, the user has two choices: use the default values (meaning the hyperparameter space will be chosen from the default set), or provide a dict with all of the parameters or only a selected subset. These will be used while optimizing the hyperparameters, and the best hyperparameter values will override the existing static_hyperparameters.
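
For illustration, the override behavior described above could be sketched like this (the variable names are only a sketch, not the merged implementation; best_run comes from trainer.hyperparameter_search):

```python
# Static hyperparameters form the base; the best values found by Optuna
# (best_run.hyperparameters) take precedence for any overlapping keys.
final_hyperparameters = {
    **hyperparameter_choice["static_hyperparameters"],
    **best_run.hyperparameters,
}
```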

fix: docstring typo fixed.

Co-authored-by: Vijay Viswanathan <[email protected]>
@neubig
Collaborator

neubig commented Sep 20, 2023

Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

@Anindyadeep
Contributor Author

> Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

Yes @neubig, I am working on this right now. However, I am blocked on some cases and hence paused the work. But I am going to roll out the first iteration soon.

Removed the hard-coded initialization by replacing it with a for loop.
This includes:

- Added Path("result/trained_model") as the path to save weights
- Removed the save strategy for train and eval.
@viswavi viswavi self-requested a review October 20, 2023 16:02
@viswavi viswavi self-requested a review October 20, 2023 16:17
Collaborator

@viswavi viswavi left a comment

Requested one other functional change regarding how you're configuring the hyperparameter space for Optuna


hp_space = {}
for key, default_value in DEFAULT_HYPERPARAMETERS_SPACE.items():
hp_space[key] = hyperparameter_space.get(key, default_value)
Collaborator

What about the situation where the user provides a key in hyperparameter_space that is not found in DEFAULT_HYPERPARAMETERS_SPACE (e.g. max_grad_norm)?

In the current implementation, this parameter will be ignored. I think there are two ways we can handle this case:

  1. Add this hyperparameter to the hp_space dictionary anyway (I think this makes the most sense)
  2. Log a warning that an unexpected hyperparameter has been provided, then ignore the parameter. This is probably the safest option but will require users to modify the default hyperparameter object if they want to pass in any new hyperparameter.

Contributor Author

I mean, I am iterating over DEFAULT_HYPERPARAMETERS_SPACE, and in order to use a new key in hyperparameter_space, the user has to change DEFAULT_HYPERPARAMETERS_SPACE. So, isn't the edge case automatically handled that way?

Collaborator

Yes, but this new key in hyperparameter_space will be silently ignored in the current implementation. That may be confusing for a user, who gets no indication that the supplied hyperparameter will not be used.

Contributor Author

Ah, got it. So for any new key the user provides, it will just log a warning, right?

Collaborator

Yes, I think that would be sufficient.

Basically, find all the keys in hyperparameter_space which are not found in DEFAULT_HYPERPARAMETERS_SPACE. Tell the user these keys are being ignored currently, and that they can expose these keys to the trainer by adding them to DEFAULT_HYPERPARAMETERS_SPACE
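
A minimal sketch of that warning path, assuming a standard module-level logger (the actual merged code may differ):

```python
import logging

logger = logging.getLogger(__name__)

hp_space = {}
for key, default_value in DEFAULT_HYPERPARAMETERS_SPACE.items():
    hp_space[key] = hyperparameter_space.get(key, default_value)

# Warn about user-supplied keys that the default space does not cover.
for key in set(hyperparameter_space) - set(DEFAULT_HYPERPARAMETERS_SPACE):
    logger.warning(
        f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence it will be ignored. "
        "However, you can expose the key to the Trainer by adding it to "
        "DEFAULT_HYPERPARAMETERS_SPACE."
    )
```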

Contributor Author

Yes, I can add this.

@Anindyadeep
Contributor Author

@viswavi, I had to remove the select_from_base function from the base; otherwise it was failing the tests.

@Anindyadeep Anindyadeep requested a review from viswavi October 26, 2023 14:22
Comment on lines 181 to 182
f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence will be ignored", # noqa
"However, you can expose the key to the Trainer by adding it to DEFAULT_HYPERPARAMETERS_SPACE.", # noqa
Collaborator

@viswavi viswavi Oct 30, 2023

Don't use bare noqa calls, please use noqa E501 if you're suppressing a "line too long" warning.

Suggestion:

                    f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence it will be ignored",  # noqa E501
                    "However, you can expose the key to the Trainer by adding it to DEFAULT_HYPERPARAMETERS_SPACE.",  # noqa E501

I've also fixed a grammatical issue in the first line in this suggestion^

Collaborator

@viswavi viswavi left a comment

LGTM!

@viswavi viswavi merged commit cc4995a into neulab:main Oct 30, 2023
@Anindyadeep
Contributor Author

Thank you so much @viswavi and Professor @neubig for mentoring me throughout the project. I learned a lot in this process, specifically about perfection and structured approaches. Looking forward to making some more PRs on this amazing project.

Next, I would like to go for the CLI issue and try to make it better :)

Thanks once again.
