
Optuna Integration for automated Hyperparameter search #315


Merged
merged 55 commits into neulab:main on Oct 30, 2023

Conversation

Anindyadeep
Contributor

Description

This PR adds a new feature: automated hyperparameter search using Optuna. It also introduces a new spec for performing hyperparameter search in three ways:

  • Hyperparameter search using Optuna
  • Hyperparameter search using the GPT API with the TaskType and complex prompting
  • Hybrid search using GPT and Optuna

This PR solves issue #313

This is how the train_model() function changes from the client's side:

from prompt2model.model_trainer import GenerationModelTrainer
from pathlib import Path

trainer = GenerationModelTrainer(
    pre_train_model_name,
    has_encoder=True,
    executor_batch_size=8,
    tokenizer_max_length=1024,
    sequence_max_length=1280,
    device="CPU"
)

args_output_root = Path("result/training_output")
args_output_root.mkdir(parents=True, exist_ok=True)

trained_model, trained_tokenizer = trainer.train_model(
    training_datasets=train_datasets,
    validation_datasets=val_datasets,
    hyperparameter_search_mode="optuna"
)

Some additional changes, such as supporting default hyperparameters as an option, are also provided. However, that still needs to be discussed.

This commit adds a new feature for hyperparameter search using Optuna.
This commit integrates the newly added automated hyperparameter search
through Optuna into the prompt2model train_model function.

This commit also changes some parts of the training pipeline and introduces
a new argument to the train_model() function called `hyperparameter_search_mode`,
where the user can either skip hyperparameter search or run it through
Optuna, GPT, or a hybrid approach. This commit targets only Optuna.
@neubig neubig requested review from viswavi and neubig September 2, 2023 16:30
@neubig
Collaborator

neubig commented Sep 2, 2023

Wow, thanks for the contribution @Anindyadeep!

First, a few initial comments:

  1. You listed three different things that could be done for hyperparameter search. I would definitely suggest that we split those into three separate PRs (for ease of reviewing). So we can just review the optuna hyperparameter search in this PR.
  2. Just to clarify, would you like us to start reviewing this now, or is it still WIP?
  3. It seems that this is not passing formatting checks. I would suggest that you run pre-commit checks, as detailed here.

@Anindyadeep
Contributor Author

Anindyadeep commented Sep 2, 2023

> Wow, thanks for the contribution @Anindyadeep!
>
> First, a few initial comments:
>
> 1. You listed three different things that could be done for hyperparameter search. I would definitely suggest that we split those into three separate PRs (for ease of reviewing). So we can just review the optuna hyperparameter search in this PR.
> 2. Just to clarify, would you like us to start reviewing this now, or is it still WIP?
> 3. It seems that this is not passing formatting checks. I would suggest that you run pre-commit checks, as detailed here.
  • Yes, I intend to split that into three different PRs. I just mentioned it here and will remove that TODO eventually.

  • I also asked @viswavi (as he has been mentoring me on this) to start a quick review, and I would appreciate it if you also provided a high-level review. (However, the work is still in progress; I still have to test the hyperparameter search since my device does not have that much GPU.)

  • As I mentioned, I currently have a gap between making changes and testing them, so I just wanted to open a PR to get your views as well as run the branch inside Colab. We can make this a draft PR for now. However, please do provide your thoughts on the changes I am making eventually.

@neubig
Collaborator

neubig commented Sep 2, 2023

Sounds great, I'll take a look when I have a chance.

# - Dynamic initialization of hyperparamter range from task type and complexity
# - Using LLM to suggest hyperparameter

class AutomamatedParamSelector:
Collaborator

Typo in name

Suggested change
class AutomamatedParamSelector:
class AutomatedParamSelector:

@@ -0,0 +1,39 @@
"""This module provides a dummy trainer for testing purposes."""
Collaborator

Please update the docstring to be more accurate. Moreover, I don't think the file name prompt2model/param_selector/generate.py is descriptive. Maybe instead call it prompt2model/param_selector/optuna_selector.py?

If you can be more specific about the algorithm used underneath Optuna (e.g. Bayesian optimization), then we can be even more specific in the file name.

@viswavi
Collaborator

viswavi commented Sep 3, 2023

Hi @Anindyadeep, I made a quick pass through this and generally it looks very good. Thank you for the quick work! I've left 2 minor comments in the PR. After addressing those, can you potentially clean up the code a little bit?

From the repo root directory, run `pre-commit run --all-files`,
and also run `pytest` to make sure this change has not broken any other tests.

After doing this, I will make an in-depth review of the PR.

@viswavi
Collaborator

viswavi commented Sep 3, 2023

Also, it looks like there may be merge conflicts with neulab:main

@viswavi
Collaborator

viswavi commented Sep 3, 2023

Comment I made to Anindyadeep over DM (copying here for visibility):

"""
I think that this pattern makes sense, but the way I was originally thinking of this was a little different; have a ParamSelector class that wraps the model trainer (rather than being embedded in the model trainer)
so you would pass the model trainer into the ParamSelector class, and then the parameter selector will run this trainer on a bunch of different configurations before ultimately returning a single trained model

I feel that this provides a little more modularity, but I'm open to changing my mind if you can convince me that the other pattern is better 🙂
"""

@Anindyadeep
Contributor Author

Yeah @viswavi, that makes sense. I will look into that and push another PR with that and pre-commit all done.

In this commit, the main changes include:
- Name change from AutomatedParamSelector to OptunaParamSelector
- Adding more functionality so that we do not have to add it inside
train_model

Additionally, all lint checks are passing.
…iner

However, we need to discuss this commit. This method proved useful in
the param_selector code for accessing the trainer class and doing the
hyperparameter search.
This commit adds the following:
- Removed extra args of train_model from previous commit
- Added a new function called search_best_hyperparameters
- Changed the schema of hyperparameters dict
Current tests are not using the new hyperparameters schema that I proposed
in the previous commit, so I tried to change those in this commit.
However, doing that, several tests are failing and I am currently blocked.
Currently this contains the values for the static hyperparameters
and the hyperparameter search space. This should be useful when there
are a lot of tweakable default parameters.
@Anindyadeep
Contributor Author

Hey @viswavi, I added some more changes from our previous discussion. Currently the pre-commit checks are passing. However, due to certain changes, tests are breaking. We might need to discuss this, and I can iterate on the commits.

Collaborator

@viswavi viswavi left a comment

Made a pass. Let's chat more about this implementation, but overall I think this is very nice progress towards having a great parameter selector!

One major thing missing from this, right now, is unit tests for the new functionality you're proposing. These will take time to write, but they will increase the chance that your code does what you want right away, and they will likely help you find and fix bugs in your implementation.

@@ -32,6 +34,7 @@ def __init__(
executor_batch_size: int = 10,
tokenizer_max_length: int = 512,
sequence_max_length: int = 1024,
device: Optional[str] = None,
Collaborator

We prefer to use union types, which would set this as:

Suggested change
device: Optional[str] = None,
device: str | None = None,

To support this, you will also have to add
from __future__ import annotations # noqa FI58

In the first line of this file.
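
As a minimal, hypothetical snippet illustrating why the import is needed (this is not the actual contents of the file):

```python
from __future__ import annotations  # noqa FI58


class GenerationModelTrainer:
    def __init__(self, device: str | None = None):
        # Without the __future__ import, evaluating `str | None` here raises a
        # TypeError on Python versions older than 3.10; with it, annotations are
        # stored as strings and the union syntax works everywhere.
        self.device = device
```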

Comment on lines 18 to 21
# TODO:
# - User tweaking hyperparameter
# - Dynamic initialization of hyperparamter range from task type and complexity
# - Using LLM to suggest hyperparameter
Collaborator

Let's not keep TODOs in the code. Instead, please create issues to track these suggestions 😄

Contributor Author

Sure, I will be filing issues starting from the next set of commits.

Comment on lines 81 to 83
# TODO:
# - Find the industry standards for default values spec
# - More asserts for other keys. Example checking the min or max values
Collaborator

Delete the TODOs and add as issues. Also, we/others can help choose these default values over time.

Comment on lines 134 to 137
# prepare the training args
# we are assuming here that the user will also provide these args with the additional
# args for range. Or we can provide an another argument of search_args (dict) that will
# tell the user to provide the arguments for doing hyperparamerter search
Collaborator

Comments should almost always be complete, grammatical sentences (with proper capitalization and punctuation). I also feel that most of this comment is unnecessarily detailed/confusing.

Contributor Author

Yeah, I am currently struggling with this part, but I will improve it.

trainer.args = training_args

best_run = trainer.hyperparameter_search(
n_trials=5, # FIXME: Discussion needed, where to put this arg and visibility for the user
Collaborator

I think this should be set in the init function for this class.

Contributor Author

In the init function we only have the trainer param, so can I then change the definition?
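
For illustration, the change being discussed might look roughly like this (a sketch only, reusing the class names already in this PR):

```python
class OptunaParamSelector(ParamSelector):
    """Uses Optuna for searching for hyperparameters."""

    def __init__(self, trainer: BaseTrainer, n_trials: int = 5):
        """Initialize with a prompt2model trainer and the number of Optuna trials."""
        self.trainer = trainer
        # Exposing n_trials here avoids the hard-coded n_trials=5 in the search call.
        self.n_trials = n_trials
```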

# because the more the max value, the more it will take time to get the
# max batch size

MAX_SUPPORTED_BATCH_SIZE = 128
Collaborator

This is probably too big. I think something like 12 or 16 makes more sense (since we're assuming that folks will be training BERT-sized models on small GPUs).

Contributor Author

I am also adding a function, which I will include in the next commit, to automatically determine the max batch size. It goes something like this:

import torch
from torch import nn

# Upper bound for the batch-size search; halved on CUDA OOM until a batch fits.
MAX_SUPPORTED_BATCH_SIZE = 128


def get_max_batch_size(model: nn.Module, input_dim: int, device: str):
    batch_size = MAX_SUPPORTED_BATCH_SIZE

    while True:
        try:
            dummy_tensor = torch.rand(batch_size, input_dim).to(device)
            _ = model(dummy_tensor)
            break
        except RuntimeError as e:
            if "CUDA out of memory" in str(e):
                batch_size //= 2
            else:
                raise e
    return batch_size

Collaborator

I think we can add this in a follow-up PR? For now we can just use a small batch size by default for simplicity.

Contributor Author

@Anindyadeep Anindyadeep Oct 13, 2023

Yes, I already added it.


def __init__(self, trainer: BaseTrainer):
"""Initialize with train/val datasets and a prompt specification"""
self.trainer = trainer
Collaborator

Suggestion: make this

Suggested change
self.trainer = trainer
self.hf_trainer = promptmodel_trainer.trainer

class OptunaParamSelector(ParamSelector):
"""Uses optuna for searching for hyperparameters"""

def __init__(self, trainer: BaseTrainer):
Collaborator

Suggestion: rename this object to promptmodel_trainer to avoid clashing with the huggingface trainer member variable name within BaseTrainer.

Suggested change
def __init__(self, trainer: BaseTrainer):
def __init__(self, promptmodel_trainer: BaseTrainer):

@zhaochenyang20
Collaborator

@Anindyadeep Thanks so much for your contribution. I am pondering how you can control the training device. I searched through the whole Trainer, but I found that self.device is never used? 🤔

I hope that I am wrong, and I guess once we can assign the training device, #317 will be fixed.

Anindyadeep and others added 5 commits September 5, 2023 10:10
fix typo

Co-authored-by: Vijay Viswanathan <[email protected]>
fix: remove additional comments

Co-authored-by: Vijay Viswanathan <[email protected]>
Fix: Clean docstrings and make them more understandable

Co-authored-by: Vijay Viswanathan <[email protected]>
fix: small typo checks

Co-authored-by: Vijay Viswanathan <[email protected]>
Fix: Shortened docstrings and removed unnecessary comments

Co-authored-by: Vijay Viswanathan <[email protected]>
Comment on lines 261 to 291

```python
hyperparameter_choice = {
"static_hyperparameters": {
"output_dir": "./result",
"logging_steps": 1,
"save_strategy": "no",
"num_train_epochs": 10,
"per_device_train_batch_size": 100,
"warmup_steps": 0,
"weight_decay": 0.01,
"logging_dir": "./result",
"learning_rate": 1e-4,
"evaluation_strategy": "epoch",
"test_size": 0.15,
},

"optuna": {
"min_num_train_epochs": 5,
"max_num_train_epochs": 10,
"save_strategy": ["epoch", "steps", "no"],
"evaluation_strategy": ["epoch", "no"],
"per_device_train_batch_size": [4, 8, 16, 32],
"min_weight_decay": 1e-5,
"max_weight_decay": 1e-1,
"min_learning_rate": 1e-5,
"max_learning_rate": 1e-1,
},
}
```
Here all of the keys are optional. Here are the criterions laid out:
Contributor Author

> I don't think this is entirely accurate, since we're also adding the optuna parameters, which are otherwise not given to the train_model method. Can you clarify this in the comment?

I have mentioned the schema here. Users have two choices: they can either go for hyperparameter optimization or not. Now here are some of my notes:

static_hyperparameters are those which are only meant for model training via trainer.train_model().

optuna: when this key is given, the user has two choices: use the default values (meaning the hyperparameter space will be chosen from the default set), or provide a dict with all of the parameters or only a selected subset. These will be used while optimizing the hyperparameters, and the best hyperparameter values will override the existing static_hyperparameters.
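
For illustration, the override behavior described above could be sketched like this (the variable names are only a sketch, not the merged implementation; best_run comes from trainer.hyperparameter_search):

```python
# Static hyperparameters form the base; the best values found by Optuna
# (best_run.hyperparameters) take precedence for any overlapping keys.
final_hyperparameters = {
    **hyperparameter_choice["static_hyperparameters"],
    **best_run.hyperparameters,
}
```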

fix: docstring typo fixed.

Co-authored-by: Vijay Viswanathan <[email protected]>
@neubig
Collaborator

neubig commented Sep 20, 2023

Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

@Anindyadeep
Contributor Author

> Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

Yes @neubig, I am working on this right now. However, I am blocked on some cases and hence paused the work. But I am going to roll out the first iteration soon.

Removed the hard-coded initialization by replacing it with a for loop.
This includes:

- Added Path("result/trained_model") as the path to save weights
- Removed the save strategy for train and eval.
@viswavi viswavi self-requested a review October 20, 2023 16:02
@viswavi viswavi self-requested a review October 20, 2023 16:17
Collaborator

@viswavi viswavi left a comment

Requested one other functional change regarding how you're configuring the hyperparameter space for Optuna


hp_space = {}
for key, default_value in DEFAULT_HYPERPARAMETERS_SPACE.items():
hp_space[key] = hyperparameter_space.get(key, default_value)
Collaborator

What about the situation where the user provides a key in hyperparameter_space that is not found in DEFAULT_HYPERPARAMETERS_SPACE (e.g. max_grad_norm)?

In the current implementation, this parameter will be ignored. I think there are two ways we can handle this case:

  1. Add this hyperparameter to the hp_space dictionary anyway (I think this makes the most sense)
  2. Log a warning that an unexpected hyperparameter has been provided, then ignore the parameter. This is probably the safest option but will require users to modify the default hyperparameter object if they want to pass in any new hyperparameter.

Contributor Author

I mean, I am iterating over DEFAULT_HYPERPARAMETERS_SPACE, and in order to use a new key in hyperparameter_space, the user has to change DEFAULT_HYPERPARAMETERS_SPACE. So, isn't the edge case automatically handled that way?

Collaborator

Yes, but this new key in hyperparameter_space will be silently ignored in the current implementation. That may be confusing for a user, who gets no indication that the supplied hyperparameter will not be used.

Contributor Author

Ah, got it. So for any new key the user provides, it will just log a warning, right?

Collaborator

Yes, I think that would be sufficient.

Basically, find all the keys in hyperparameter_space which are not found in DEFAULT_HYPERPARAMETERS_SPACE. Tell the user these keys are being ignored currently, and that they can expose these keys to the trainer by adding them to DEFAULT_HYPERPARAMETERS_SPACE
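
A minimal sketch of that warning path, assuming a standard module-level logger (the actual merged code may differ):

```python
import logging

logger = logging.getLogger(__name__)

hp_space = {}
for key, default_value in DEFAULT_HYPERPARAMETERS_SPACE.items():
    hp_space[key] = hyperparameter_space.get(key, default_value)

# Warn about user-supplied keys that the default space does not cover.
for key in set(hyperparameter_space) - set(DEFAULT_HYPERPARAMETERS_SPACE):
    logger.warning(
        f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence it will be ignored. "
        "However, you can expose the key to the Trainer by adding it to "
        "DEFAULT_HYPERPARAMETERS_SPACE."
    )
```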

Contributor Author

Yes, I can add this.

@Anindyadeep
Contributor Author

@viswavi, I had to remove the select_from_base function from the base; otherwise it was failing the tests.

@Anindyadeep Anindyadeep requested a review from viswavi October 26, 2023 14:22
Comment on lines 181 to 182
f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence will be ignored", # noqa
"However, you can expose the key to the Trainer by adding it to DEFAULT_HYPERPARAMETERS_SPACE.", # noqa
Collaborator

@viswavi viswavi Oct 30, 2023

Don't use bare noqa calls, please use noqa E501 if you're suppressing a "line too long" warning.

Suggestion:

                    f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence it will be ignored",  # noqa E501
                    "However, you can expose the key to the Trainer by adding it to DEFAULT_HYPERPARAMETERS_SPACE.",  # noqa E501

I've also fixed a grammatical issue in the first line in this suggestion^

Collaborator

@viswavi viswavi left a comment

LGTM!

@viswavi viswavi merged commit cc4995a into neulab:main Oct 30, 2023
@Anindyadeep
Contributor Author

Thank you so much @viswavi and Professor @neubig for mentoring me throughout the project. I learned a lot in this process, specifically about perfection and structured approaches. Looking forward to making some more PRs on this amazing project.

Next, I would like to go for the CLI issue and try to make it better :)

Thanks once again.
