
Add configs and adapt exporter for RSL-RL distillation #2182


Merged
29 commits merged into main on Apr 10, 2025

Conversation

@ClemensSchwarke (Collaborator) commented Mar 28, 2025

Description

This PR adds configuration classes for Student-Teacher Distillation and adapts the policy exporters to be able to export student policies.
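For context, a minimal sketch of what exporting a distilled student policy for deployment can look like. This is illustrative only, not the exporter code changed by this PR; the StudentPolicy module, its dimensions, and the output filename are assumptions:

import torch
import torch.nn as nn

class StudentPolicy(nn.Module):
    """Stand-in for a distilled student network (hypothetical architecture)."""

    def __init__(self, num_obs: int = 48, num_actions: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_obs, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = StudentPolicy().eval()
# trace with a dummy observation and save as TorchScript for deployment
example_obs = torch.zeros(1, 48)
torch.jit.trace(policy, example_obs).save("student_policy.pt")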

Type of change

  • Non-breaking change

Checklist

- [x] I have run the pre-commit checks with ./isaaclab.sh --format
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
- [x] I have added my name to the CONTRIBUTORS.md or my name already exists there

Base automatically changed from feature/rsl-rl-multi-gpu to main March 31, 2025 12:59
@Mayankm96 Mayankm96 changed the title Add configs and adapt exporter for distillation Add configs and adapt exporter for RSl-RL distillation Mar 31, 2025
@Mayankm96 Mayankm96 changed the title Add configs and adapt exporter for RSl-RL distillation Add configs and adapt exporter for RSL-RL distillation Mar 31, 2025
@alessandroassirelli98 (Contributor) commented:

Hey! Thanks a lot for this feature — I actually needed something like this, and you already had it covered. Much appreciated!

I had a quick question: how do you plan to integrate the teacher-student setup within the broader IsaacLab framework?

At the moment, I’ve defined two separate environments — one for the teacher and one for the student. The only real difference between them is the observation space.
For the agents, I’ve set up two configurations:

  • The teacher uses RslRlPpoActorCriticCfg for the policy and RslRlPpoAlgorithmCfg for the algorithm.

  • The student uses RslRlDistillationStudentTeacherCfg for the policy and RslRlDistillationAlgorithmCfg for the algorithm.

Does this align with your intended setup?

When using the student, though, I always need to initialize the teacher first. So I was thinking of adding a teacher_experiment_name attribute to the RslRlOnPolicyRunnerCfg class. That way, in the training script, we could check if the algorithm is a distillation type and load the corresponding teacher checkpoint accordingly.

This would be different from --resume, as resume would try to load the checkpoint for the current experiment, which would be just the student.

env config:

@configclass
class G1FlatGaitRewardTeacherCfg(G1FlatCfg):
    observations: TeacherObservationsCfg = TeacherObservationsCfg()
    rewards: G1GaitRewards = G1GaitRewards()
    commands: CommandsCfg = CommandsCfg()

    def __post_init__(self):
        # post init of parent
        super().__post_init__()
        self.rewards.feet_air_time = None
        self.rewards.track_lin_vel_xy_exp.params["command_name"] = (
            "gaited_base_velocity"
        )
        self.rewards.track_ang_vel_z_exp.params["command_name"] = "gaited_base_velocity"
        # observation terms (order preserved)

@configclass
class G1FlatGaitRewardStudentCfg(G1FlatGaitRewardTeacherCfg):
    observations: StudentObservationsCfg = StudentObservationsCfg()

    def __post_init__(self):
        # post init of parent
        super().__post_init__()

agent config:

@configclass
class G1FlatStudentGaitRewardPPORunnerCfg(G1RoughPPORunnerCfg):
    def __post_init__(self):
        super().__post_init__()
        self.num_steps_per_env = 64
        self.max_iterations = 300
        self.experiment_name = "g1_flat_gait_student"
        self.teacher_experiment_name = "g1_flat_gait_teacher"
        self.policy = RslRlDistillationStudentTeacherCfg(
            init_noise_std=0.001,
            student_hidden_dims=[256, 128, 128],
            teacher_hidden_dims=[256, 128, 128],
            activation="elu",
        )
        self.algorithm = RslRlDistillationAlgorithmCfg(
            num_learning_epochs=5,
            learning_rate=1e-03,
            gradient_length=2,  # int per the config's type annotation, not a float
        )

@configclass
class G1FlatTeacherGaitRewardPPORunnerCfg(G1RoughPPORunnerCfg):
    def __post_init__(self):
        super().__post_init__()

        self.max_iterations = 30000
        self.experiment_name = "g1_flat_gait_teacher"
        self.policy.actor_hidden_dims = [256, 128, 128]
        self.policy.critic_hidden_dims = [256, 128, 128]

Then, in the main script, I would add something like:

    if isinstance(agent_cfg.policy, RslRlDistillationStudentTeacherCfg):
        teacher_root_path = os.path.join("logs", "rsl_rl", agent_cfg.teacher_experiment_name)
        teacher_root_path = os.path.abspath(teacher_root_path)
        trained_teacher_path = get_checkpoint_path(teacher_root_path, agent_cfg.load_run_teacher, agent_cfg.load_checkpoint_teacher)

Let me know what you think!

@ClemensSchwarke (Collaborator, Author) replied:

> Hey! Thanks a lot for this feature — I actually needed something like this, and you already had it covered. Much appreciated!
> ...
>
> Does this align with your intended setup?
>
> When using the student, though, I always need to initialize the teacher first. So I was thinking of adding a teacher_experiment_name attribute to the RslRlOnPolicyRunnerCfg class. That way, in the training script, we could check if the algorithm is a distillation type and load the corresponding teacher checkpoint accordingly.
> ...
>
> Let me know what you think!

Hey! Yes, your setup looks good :) However, I don't think you need an attribute for the teacher. Just pass the directory with the load_run flag when starting the training. PS: You also don't need to pass the resume flag for training your student.
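For concreteness, a hedged sketch of resolving a checkpoint with the existing flags, mirroring the snippet above; the import location is an assumption, and resolve_checkpoint is a hypothetical helper:

import os

from isaaclab_tasks.utils import get_checkpoint_path  # assumed import location


def resolve_checkpoint(agent_cfg, experiment_name: str) -> str:
    """Resolve a checkpoint under logs/rsl_rl/<experiment_name> from load_run/load_checkpoint."""
    log_root_path = os.path.abspath(os.path.join("logs", "rsl_rl", experiment_name))
    return get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)


# e.g. point this at the teacher's experiment when training the student:
# teacher_ckpt = resolve_checkpoint(agent_cfg, "g1_flat_gait_teacher")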

"""The learning rate for the student policy."""

gradient_length: int = MISSING
"""The number of environment steps the gradient flows back."""
A reviewer (Contributor) commented:
@ClemensSchwarke should the parameter be 1 by default?

Suggested change
"""The number of environment steps the gradient flows back."""
"""The number of rollout steps for gradient propagation.
This is useful for the sequential training of recurrent student networks.
"""

@Mayankm96 (Contributor) left a comment:

Can you please also update the extension.toml and CHANGELOG?

Thank you! Looks good otherwise.

@Mayankm96 Mayankm96 merged commit 477b6a9 into main Apr 10, 2025
3 of 4 checks passed
@Mayankm96 Mayankm96 deleted the feature/rsl_rl_2_3_0_adaptation branch April 10, 2025 10:11
yrh012 pushed a commit to aica-technology/isaac-lab that referenced this pull request Jun 16, 2025
Sanjay1911 pushed a commit to Sanjay1911/IsaacLab that referenced this pull request Jun 19, 2025