
Add configs and adapt exporter for RSL-RL distillation #2182


Merged
29 commits merged into main on Apr 10, 2025

Conversation

@ClemensSchwarke (Collaborator) commented Mar 28, 2025

Description

This PR adds configuration classes for Student-Teacher Distillation and adapts the policy exporters to be able to export student policies.
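For context, a minimal sketch of what exporting a distilled student policy for deployment can look like. This is illustrative only, not the exporter code changed by this PR; the StudentPolicy module, its dimensions, and the output filename are assumptions:

import torch
import torch.nn as nn

class StudentPolicy(nn.Module):
    """Stand-in for a distilled student network (hypothetical architecture)."""

    def __init__(self, num_obs: int = 48, num_actions: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_obs, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = StudentPolicy().eval()
# trace with a dummy observation and save as TorchScript for deployment
example_obs = torch.zeros(1, 48)
torch.jit.trace(policy, example_obs).save("student_policy.pt")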

Type of change

  • Non-breaking change

Checklist

- [x] I have run the pre-commit checks with ./isaaclab.sh --format
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
- [x] I have added my name to the CONTRIBUTORS.md or my name already exists there

Base automatically changed from feature/rsl-rl-multi-gpu to main March 31, 2025 12:59
@Mayankm96 Mayankm96 changed the title Add configs and adapt exporter for distillation Add configs and adapt exporter for RSl-RL distillation Mar 31, 2025
@Mayankm96 Mayankm96 changed the title Add configs and adapt exporter for RSl-RL distillation Add configs and adapt exporter for RSL-RL distillation Mar 31, 2025
@alessandroassirelli98 (Contributor) commented:

Hey! Thanks a lot for this feature — I actually needed something like this, and you already had it covered. Much appreciated!

I had a quick question: how do you plan to integrate the teacher-student setup within the broader IsaacLab framework?

At the moment, I’ve defined two separate environments — one for the teacher and one for the student. The only real difference between them is the observation space.
For the agents, I’ve set up two configurations:

  • The teacher uses RslRlPpoActorCriticCfg for the policy and RslRlPpoAlgorithmCfg for the algorithm.

  • The student uses RslRlDistillationStudentTeacherCfg for the policy and RslRlDistillationAlgorithmCfg for the algorithm.

Does this align with your intended setup?

When using the student, though, I always need to initialize the teacher first. So I was thinking of adding a teacher_experiment_name attribute to the RslRlOnPolicyRunnerCfg class. That way, in the training script, we could check if the algorithm is a distillation type and load the corresponding teacher checkpoint accordingly.

This would be different from --resume, as resume would try to load the checkpoint for the current experiment, which would be just the student.

env config:

@configclass
class G1FlatGaitRewardTeacherCfg(G1FlatCfg):
    observations: TeacherObservationsCfg = TeacherObservationsCfg()
    rewards: G1GaitRewards = G1GaitRewards()
    commands: CommandsCfg = CommandsCfg()

    def __post_init__(self):
        # post init of parent
        super().__post_init__()
        self.rewards.feet_air_time = None
        self.rewards.track_lin_vel_xy_exp.params["command_name"] = (
            "gaited_base_velocity"
        )
        self.rewards.track_ang_vel_z_exp.params["command_name"] = "gaited_base_velocity"
        # observation terms (order preserved)

@configclass
class G1FlatGaitRewardStudentCfg(G1FlatGaitRewardTeacherCfg):
    observations: StudentObservationsCfg = StudentObservationsCfg()

    def __post_init__(self):
        # post init of parent
        super().__post_init__()

agent config:

@configclass
class G1FlatStudentGaitRewardPPORunnerCfg(G1RoughPPORunnerCfg):
    def __post_init__(self):
        super().__post_init__()
        self.num_steps_per_env = 64
        self.max_iterations = 300
        self.experiment_name = "g1_flat_gait_student"
        self.teacher_experiment_name = "g1_flat_gait_teacher"
        self.policy = RslRlDistillationStudentTeacherCfg(
            init_noise_std=0.001,
            student_hidden_dims=[256, 128, 128],
            teacher_hidden_dims=[256, 128, 128],
            activation="elu",
        )
        self.algorithm = RslRlDistillationAlgorithmCfg(
            num_learning_epochs=5,
            learning_rate=1e-03,
            gradient_length=2,  # int per the config's type annotation, not a float
        )

@configclass
class G1FlatTeacherGaitRewardPPORunnerCfg(G1RoughPPORunnerCfg):
    def __post_init__(self):
        super().__post_init__()

        self.max_iterations = 30000
        self.experiment_name = "g1_flat_gait_teacher"
        self.policy.actor_hidden_dims = [256, 128, 128]
        self.policy.critic_hidden_dims = [256, 128, 128]

Then, in the main script, I would add something like:

    if isinstance(agent_cfg.policy, RslRlDistillationStudentTeacherCfg):
        teacher_root_path = os.path.join("logs", "rsl_rl", agent_cfg.teacher_experiment_name)
        teacher_root_path = os.path.abspath(teacher_root_path)
        trained_teacher_path = get_checkpoint_path(teacher_root_path, agent_cfg.load_run_teacher, agent_cfg.load_checkpoint_teacher)

Let me know what you think!

@ClemensSchwarke (Collaborator, Author) replied:

> Hey! Thanks a lot for this feature — I actually needed something like this, and you already had it covered. Much appreciated!
> ...
>
> Does this align with your intended setup?
>
> When using the student, though, I always need to initialize the teacher first. So I was thinking of adding a teacher_experiment_name attribute to the RslRlOnPolicyRunnerCfg class. That way, in the training script, we could check if the algorithm is a distillation type and load the corresponding teacher checkpoint accordingly.
> ...
>
> Let me know what you think!

Hey! Yes, your setup looks good :) However, I don't think you need an attribute for the teacher. Just pass the directory with the load_run flag when starting the training. PS: You also don't need to pass the resume flag for training your student.
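For concreteness, a hedged sketch of resolving a checkpoint with the existing flags, mirroring the snippet above; the import location is an assumption, and resolve_checkpoint is a hypothetical helper:

import os

from isaaclab_tasks.utils import get_checkpoint_path  # assumed import location


def resolve_checkpoint(agent_cfg, experiment_name: str) -> str:
    """Resolve a checkpoint under logs/rsl_rl/<experiment_name> from load_run/load_checkpoint."""
    log_root_path = os.path.abspath(os.path.join("logs", "rsl_rl", experiment_name))
    return get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)


# e.g. point this at the teacher's experiment when training the student:
# teacher_ckpt = resolve_checkpoint(agent_cfg, "g1_flat_gait_teacher")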

"""The learning rate for the student policy."""

gradient_length: int = MISSING
"""The number of environment steps the gradient flows back."""
A reviewer (Contributor) commented:
@ClemensSchwarke should the parameter be 1 by default?

Suggested change
"""The number of environment steps the gradient flows back."""
"""The number of rollout steps for gradient propagation.
This is useful for the sequential training of recurrent student networks.
"""

@Mayankm96 (Contributor) left a comment:

Can you please also update the extension.toml and CHANGELOG?

Thank you! Looks good otherwise.

@Mayankm96 Mayankm96 merged commit 477b6a9 into main Apr 10, 2025
3 of 4 checks passed
@Mayankm96 Mayankm96 deleted the feature/rsl_rl_2_3_0_adaptation branch April 10, 2025 10:11
yrh012 pushed a commit to aica-technology/isaac-lab that referenced this pull request Jun 16, 2025
Sanjay1911 pushed a commit to Sanjay1911/IsaacLab that referenced this pull request Jun 19, 2025