Update automate-eks-cluster-creation.sh#529

Merged
gmgtgh merged 1 commit into main from hp-eks-auto-patch on Jan 22, 2025

@amanshanbhag
Contributor

Minor bug fix

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@amanshanbhag amanshanbhag requested a review from gmgtgh January 22, 2025 18:34
Contributor

@gmgtgh gmgtgh left a comment

LGTM

@gmgtgh gmgtgh merged commit 4783ba7 into main Jan 22, 2025
@gmgtgh gmgtgh deleted the hp-eks-auto-patch branch January 22, 2025 18:36
jianyinglangaws pushed a commit that referenced this pull request Jan 22, 2025
mhuguesaws pushed a commit that referenced this pull request Feb 19, 2025
…script (#530)

* Update the Neuron SDK to 2.21.0

* Update the Llama3-70B pretraining with the Neuron SDK 2.21

* Fix a typo

* Add --hw_backend trn1 in the convert_checkpoint command

* More update

* Update the update_neuron_sdk.sh by removing the neuron-top check

* Keep enable_update_neuron_sdk as False by default

* Update automate-eks-cluster-creation.sh (#529)

Minor bug fix

* Update according to the review comments.

* minor updates in doc

---------

Co-authored-by: Aman Shanbhag <55571601+amanshanbhag@users.noreply.github.com>
Co-authored-by: Keita Watanabe <mlkeita@amazon.com>
dongjin-ml pushed a commit to dongjin-ml/awsome-distributed-training that referenced this pull request Feb 20, 2025
dongjin-ml pushed a commit to dongjin-ml/awsome-distributed-training that referenced this pull request Feb 20, 2025
…script (awslabs#530)

* Update the Neuron SDK to 2.21.0

* Update the Llama3-70B pretraining with the Neuron SDK 2.21

* Fix a typo

* Add --hw_backend trn1 in the convert_checkpoint command

* More update

* Update the update_neuron_sdk.sh by removing the neuron-top check

* Keep enable_update_neuron_sdk as False by default

* Update automate-eks-cluster-creation.sh (awslabs#529)

Minor bug fix

* Update according to the review comments.

* minor updates in doc

---------

Co-authored-by: Aman Shanbhag <55571601+amanshanbhag@users.noreply.github.com>
Co-authored-by: Keita Watanabe <mlkeita@amazon.com>
KeitaW pushed a commit that referenced this pull request Feb 17, 2026
KeitaW added a commit that referenced this pull request Feb 17, 2026
…script (#530)

* Update the Neuron SDK to 2.21.0

* Update the Llama3-70B pretraining with the Neuron SDK 2.21

* Fix a typo

* Add --hw_backend trn1 in the convert_checkpoint command

* More update

* Update the update_neuron_sdk.sh by removing the neuron-top check

* Keep enable_update_neuron_sdk as False by default

* Update automate-eks-cluster-creation.sh (#529)

Minor bug fix

* Update according to the review comments.

* minor updates in doc

---------

Co-authored-by: Aman Shanbhag <55571601+amanshanbhag@users.noreply.github.com>
Co-authored-by: Keita Watanabe <mlkeita@amazon.com>

2 participants