Optuna Integration for Automated Hyperparameter Search #315
Conversation
This commit adds a new feature for hyperparameter search using Optuna
This commit integrates the newly added automated hyperparameter search through Optuna into the prompt2model `train_model` function. It also changes some parts of the training pipeline and introduces a new argument in the `train_model()` function called `hyperparameter_search_mode`, which lets the user either skip hyperparameter search or run it through Optuna, GPT, or a hybrid approach. This commit only targets Optuna.
Wow, thanks for the contribution @Anindyadeep! First, a few initial comments:
Sounds great, I'll take a look when I have a chance.
# - Dynamic initialization of hyperparamter range from task type and complexity
# - Using LLM to suggest hyperparameter

class AutomamatedParamSelector:
Typo in name
`class AutomamatedParamSelector:` → `class AutomatedParamSelector:`
@@ -0,0 +1,39 @@
"""This module provides a dummy trainer for testing purposes."""
Please update the docstring to be more accurate. Moreover, I don't think the file name `prompt2model/param_selector/generate.py` is descriptive. Maybe instead call it `prompt2model/param_selector/optuna_selector.py`?
If you can be more specific about the algorithm used underneath Optuna (e.g. Bayesian optimization), then we can be even more specific in the file name.
Hi @Anindyadeep, I made a quick pass through this and generally it looks very good. Thank you for the quick work! I've left 2 minor comments in the PR. After addressing those, can you potentially clean up the code a little bit? From the repo root directory, run the pre-commit checks. After doing this, I will make an in-depth review of the PR.
Also, it looks like there may be merge conflicts with the base branch that need to be resolved.
Comment I made to @Anindyadeep over DM (copying here for visibility): I feel that this provides a little more modularity, but I'm open to changing my mind if you can convince me that the other pattern is better 🙂
Yeah @viswavi, that makes sense. I will look into that and push another update with that and the pre-commit checks all done.
In this commit, the main changes include:
- Renamed AutomatedParamSelector to OptunaParamSelector
- Added more functionality so that we do not have to add it inside train_model

Additionally, all lint checks are passing.
…iner However, we need to discuss this commit. This method proved useful in the param_selector code for accessing the trainer class and doing the hyperparameter search.
This commit adds the following:
- Removed extra args of train_model from the previous commit
- Added a new function called search_best_hyperparameters
- Changed the schema of the hyperparameters dict
Current tests are not using the new hyperparameters schema that I proposed in the previous commit, so I tried to change those in this commit. However, after doing that, several tests are failing and I am currently blocked.
Currently this contains the values for the static hyperparameters and the hyperparameter search space. This should be useful when there are a lot of tweakable default parameters.
Hey @viswavi, I added some more changes from our previous discussion. Currently the pre-commit checks are passing. However, because of certain changes, tests are breaking. We might need to discuss this, and then I can iterate on the commits.
Made a pass. Let's chat more about this implementation, but overall I think this is very nice progress towards having a great parameter selector!
One major thing missing from this right now is unit tests for the new functionality you're proposing. These will take time to write, but they will increase the chance that your code will do what you want right away, and they will likely help you find and fix bugs in your implementation.
@@ -32,6 +34,7 @@ def __init__(
    executor_batch_size: int = 10,
    tokenizer_max_length: int = 512,
    sequence_max_length: int = 1024,
    device: Optional[str] = None,
We prefer to use union types, which would set this as:
`device: Optional[str] = None,` → `device: str | None = None,`
To support this, you will also have to add `from __future__ import annotations  # noqa FI58` as the first line of this file.
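For instance, a minimal sketch of the pattern being suggested (the class and attribute below are placeholders, not code from this PR):

```python
from __future__ import annotations  # noqa FI58


class ExampleExecutor:
    """Placeholder class illustrating the `str | None` union-type style."""

    def __init__(self, device: str | None = None) -> None:
        # With postponed evaluation of annotations (enabled by the import above),
        # the `str | None` syntax also works on Python versions older than 3.10.
        self.device = device if device is not None else "cpu"
```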
# TODO:
# - User tweaking hyperparameter
# - Dynamic initialization of hyperparamter range from task type and complexity
# - Using LLM to suggest hyperparameter
Let's not keep TODOs in the code. Instead, please create issues to track these suggestions 😄
Sure, I will create issues starting from the next set of commits.
# TODO:
# - Find the industry standards for default values spec
# - More asserts for other keys. Example checking the min or max values
Delete the TODOs and add as issues. Also, we/others can help choose these default values over time.
# prepare the training args
# we are assuming here that the user will also provide these args with the additional
# args for range. Or we can provide an another argument of search_args (dict) that will
# tell the user to provide the arguments for doing hyperparamerter search
Comments should almost always be complete, grammatical sentences (with proper capitalization and punctuation). I also feel that most of this comment is unnecessarily detailed/confusing.
Yeah, I am currently struggling with this part, but I will improve on it.
trainer.args = training_args

best_run = trainer.hyperparameter_search(
    n_trials=5,  # FIXME: Discussion needed, where to put this arg and visibility for the user
I think this should be set in the init function for this class.
In the init function we only have the `trainer` param, so can I then change the definition?
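One way the revised definition could look (purely a sketch of the reviewer's suggestion; `ParamSelector` and `BaseTrainer` are the base classes used elsewhere in this PR, imports omitted, and the final signature is still under discussion):

```python
class OptunaParamSelector(ParamSelector):
    """Uses Optuna to search for hyperparameters."""

    def __init__(self, trainer: BaseTrainer, n_trials: int = 5):
        """Store the prompt2model trainer and the number of Optuna trials to run."""
        self.trainer = trainer
        self.n_trials = n_trials
```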
`prompt2model/utils/config.py` (outdated)
# because the more the max value, the more it will take time to get the
# max batch size

MAX_SUPPORTED_BATCH_SIZE = 128
This is probably too big. I think something like 12 or 16 makes more sense (since we're assuming that folks will be training BERT-sized models on small GPUs).
I am also adding a function, which I will include in the next commit, to automatically determine the max batch size. It goes something like this:
import torch
from torch import nn

# Upper bound for the search; the batch size is halved on CUDA OOM until a batch fits.
MAX_SUPPORTED_BATCH_SIZE = 128

def get_max_batch_size(model: nn.Module, input_dim: int, device: str):
    batch_size = MAX_SUPPORTED_BATCH_SIZE
    while True:
        try:
            dummy_tensor = torch.rand(batch_size, input_dim).to(device)
            _ = model(dummy_tensor)
            break
        except RuntimeError as e:
            if "CUDA out of memory" in str(e):
                batch_size //= 2
            else:
                raise e
    return batch_size
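For example, a rough usage of this helper might look like the following (the model and input dimension here are placeholders for illustration only):

```python
import torch
from torch import nn

# Placeholder BERT-sized classification head, purely for illustration.
model = nn.Linear(768, 2).to("cuda")
max_batch_size = get_max_batch_size(model, input_dim=768, device="cuda")
print(f"Largest batch size that fits in memory: {max_batch_size}")
```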
I think we can add this in a follow-up PR? For now we can just use a small batch size by default for simplicity.
Yes, I already added it.
def __init__(self, trainer: BaseTrainer):
    """Initialize with train/val datasets and a prompt specification"""
    self.trainer = trainer
Suggestion: make this change:
`self.trainer = trainer` → `self.hf_trainer = promptmodel_trainer.trainer`
class OptunaParamSelector(ParamSelector):
    """Uses optuna for searching for hyperparameters"""

    def __init__(self, trainer: BaseTrainer):
Suggestion: rename this object to `promptmodel_trainer` to avoid clashing with the Hugging Face trainer member variable name within BaseTrainer:
`def __init__(self, trainer: BaseTrainer):` → `def __init__(self, promptmodel_trainer: BaseTrainer):`
@Anindyadeep Thanks so much for your contribution. I am pondering how you can control the training device. I searched the whole Trainer, but I did not find a way to set it. I hope that I was wrong, and I guess we can assign the training device once #317 is fixed.
fix typo Co-authored-by: Vijay Viswanathan <[email protected]>
fix: remove additional comments Co-authored-by: Vijay Viswanathan <[email protected]>
Fix: Clean docstrings and make them more understandable Co-authored-by: Vijay Viswanathan <[email protected]>
fix: small typo checks Co-authored-by: Vijay Viswanathan <[email protected]>
Fix: Shortened docstrings and removed unnecessary comments Co-authored-by: Vijay Viswanathan <[email protected]>
```python
hyperparameter_choice = {
    "static_hyperparameters": {
        "output_dir": "./result",
        "logging_steps": 1,
        "save_strategy": "no",
        "num_train_epochs": 10,
        "per_device_train_batch_size": 100,
        "warmup_steps": 0,
        "weight_decay": 0.01,
        "logging_dir": "./result",
        "learning_rate": 1e-4,
        "evaluation_strategy": "epoch",
        "test_size": 0.15,
    },

    "optuna": {
        "min_num_train_epochs": 5,
        "max_num_train_epochs": 10,
        "save_strategy": ["epoch", "steps", "no"],
        "evaluation_strategy": ["epoch", "no"],
        "per_device_train_batch_size": [4, 8, 16, 32],
        "min_weight_decay": 1e-5,
        "max_weight_decay": 1e-1,
        "min_learning_rate": 1e-5,
        "max_learning_rate": 1e-1,
    },
}
```
Here all of the keys are optional. Here are the criteria laid out:
I don't think this is entirely accurate, since we're also adding the `optuna` parameters, which are otherwise not given to the `train_model` method. Can you clarify this in the comment?
I have mentioned the schema here. The user has two choices: they can either go for hyperparameter optimization or not. Now, here are some of my questions:
- `static_hyperparameters` are those which are only meant for model training via `trainer.train_model()`.
- `optuna`: when this key is given, the user has two choices. They can pass `default` (meaning the hyperparameter space will be chosen from the default set), or they can provide a dict with all of the parameters or only some selected ones. These will be used while optimizing the hyperparameters, and the best hyperparameter values will override the existing `static_hyperparameters`.
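A small sketch of the override behaviour described above (the dictionary names follow the schema in this thread, but the merge itself is an assumed implementation detail, not code from this PR):

```python
# Assume `best_run_hyperparameters` holds the best values returned by the Optuna search.
best_run_hyperparameters = {"learning_rate": 3e-5, "num_train_epochs": 7}

final_hyperparameters = {
    **hyperparameter_choice["static_hyperparameters"],
    **best_run_hyperparameters,  # values found by Optuna override the static defaults
}
```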
fix: docstring typo fixed. Co-authored-by: Vijay Viswanathan <[email protected]>
Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?
Yes @neubig, I am working on this right now. However, I am blocked on some cases, hence I paused the work. But I am going to roll out the first iteration soon.
Removed the hard-coded initialization by replacing it with a for loop.
This includes:
- Added Path("result/trained_model") as the path to save weights
- Removed save strategy for train and eval.
Requested one other functional change regarding how you're configuring the hyperparameter space for Optuna
hp_space = {}
for key, default_value in DEFAULT_HYPERPARAMETERS_SPACE.items():
    hp_space[key] = hyperparameter_space.get(key, default_value)
What about the situation where the user provides a key in `hyperparameter_space` that is not found in `DEFAULT_HYPERPARAMETERS_SPACE` (e.g. `max_grad_norm`)?
In the current implementation, this parameter will be ignored. I think there are two ways we can handle this case:
- Add this hyperparameter to the `hp_space` dictionary anyway (I think this makes the most sense).
- Log a warning that an unexpected hyperparameter has been provided, then ignore the parameter. This is probably the safest option, but it will require users to modify the default hyperparameter object if they want to pass in any new hyperparameter.
I mean, I am iterating over `DEFAULT_HYPERPARAMETERS_SPACE`, and in order to use a new key in `hyperparameter_space`, the user has to change `DEFAULT_HYPERPARAMETERS_SPACE`. So, isn't the edge case automatically handled that way?
Yes, but this new key in `hyperparameter_space` will be silently ignored in the current implementation. That may be confusing for a user, who gets no indication that the supplied hyperparameter will not be used.
Ahh, got it. So for any new key the user provides, it will just log a warning, right?
Yes, I think that would be sufficient.
Basically, find all the keys in `hyperparameter_space` which are not found in `DEFAULT_HYPERPARAMETERS_SPACE`. Tell the user these keys are currently being ignored, and that they can expose these keys to the trainer by adding them to `DEFAULT_HYPERPARAMETERS_SPACE`.
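A minimal sketch of that check (the merge loop and warning text follow this thread; the function name and the exact contents of `DEFAULT_HYPERPARAMETERS_SPACE` here are assumptions):

```python
import logging

logger = logging.getLogger(__name__)

# Assumed default search space, mirroring keys discussed in this thread.
DEFAULT_HYPERPARAMETERS_SPACE = {
    "min_num_train_epochs": 5,
    "max_num_train_epochs": 10,
    "per_device_train_batch_size": [4, 8, 16, 32],
    "min_learning_rate": 1e-5,
    "max_learning_rate": 1e-1,
}


def build_hp_space(hyperparameter_space: dict) -> dict:
    """Merge user-provided values over the defaults, warning about unknown keys."""
    for key in set(hyperparameter_space) - set(DEFAULT_HYPERPARAMETERS_SPACE):
        logger.warning(
            "Key %s is not present in DEFAULT_HYPERPARAMETERS_SPACE, so it will be "
            "ignored. You can expose it to the Trainer by adding it to "
            "DEFAULT_HYPERPARAMETERS_SPACE.",
            key,
        )
    return {
        key: hyperparameter_space.get(key, default_value)
        for key, default_value in DEFAULT_HYPERPARAMETERS_SPACE.items()
    }
```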
Yes, I can add this.
@viswavi I had to remove the …
f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence will be ignored", # noqa | ||
"However, you can expose the key to the Trainer by adding it to DEFAULT_HYPERPARAMETERS_SPACE.", # noqa |
Don't use bare `noqa` calls; please use `noqa E501` if you're suppressing a "line too long" warning.
Suggestion:
f"Key {key} is not present in DEFAULT_HYPERPARAMETERS_SPACE. Hence it will be ignored", # noqa E501
"However, you can expose the key to the Trainer by adding it to DEFAULT_HYPERPARAMETERS_SPACE.", # noqa E501
I've also fixed a grammatical issue in the first line in this suggestion^
LGTM!
Thank you so much @viswavi and Professor @neubig for mentoring me throughout the project. I learned a lot in this process, specifically about perfection and structured approaches. Looking forward to making some more PRs on this amazing project. Next, I would like to go for the CLI issue and try to make it better :) Thanks once again.
Description
This PR aims to add a new feature: automated hyperparameter search using Optuna. Additionally, it introduces a new spec for doing hyperparameter search in three ways.
This PR solves issue #313
This is how the `train_model()` function changes from the client's side.
Some additional changes, like supporting default hyperparameters as an option, are also provided. However, that is something that needs to be discussed.
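As a rough illustration of the client-side flow under the schema discussed in this PR (the exact call signatures, argument names, and return values here are assumptions based on this thread, not the final API):

```python
# Hypothetical client-side sketch; `OptunaParamSelector`, `search_best_hyperparameters`,
# and `train_model` are the names discussed in this PR, but their signatures are assumed.
hyperparameter_choice = {
    "static_hyperparameters": {
        "num_train_epochs": 10,
        "learning_rate": 1e-4,
        "per_device_train_batch_size": 16,
    },
    "optuna": {
        "min_num_train_epochs": 5,
        "max_num_train_epochs": 10,
        "min_learning_rate": 1e-5,
        "max_learning_rate": 1e-1,
        "per_device_train_batch_size": [4, 8, 16, 32],
    },
}

selector = OptunaParamSelector(trainer, n_trials=5)
best_hyperparameters = selector.search_best_hyperparameters(hyperparameter_choice["optuna"])

# The best values found by Optuna override the static defaults before training;
# other required arguments (e.g. the training datasets) are omitted in this sketch.
trained_model = trainer.train_model(
    hyperparameters={
        **hyperparameter_choice["static_hyperparameters"],
        **best_hyperparameters,
    },
)
```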