Description
Describe the bug
The recent commit c984604 (#2118), which optimises data getters inside asset classes, appears to break functionality in my custom training workflows. Learning outcomes now differ significantly (and mostly degrade). After investigation I traced the issue to `articulation_data.py`: reverting only the changes introduced by c984604 in this file restores the previous training-workflow performance.
Are the changes in c984604 intended to affect anything beyond efficiency? If not, those modifications may introduce problems for other users as well.
While it is hard to isolate a single culprit (my setup uses numerous custom MDP elements for events, observations, and curriculum), the likely pattern is that certain data buffers receive new timestamps before their underlying data are actually fresh, so stale values are fed into the learning pipeline.
Example: `body_state_w` depends on `body_link_pose_w` and `body_com_vel_w`. If any function directly modifies the buffers of `_body_link_pose_w.data` or `_body_com_vel_w.data` after `body_state_w` has already been accessed, `body_state_w` will not capture this fresh information when it is called a second time. Any MDP element relying on `body_state_w` might then use out-of-date data for that timestep (see the sketch below). I suspect such a case can slip through unnoticed, especially in event elements that randomise object or robot poses. In my environment I randomise the robot pose so that its end-effector starts near an object, whose pose is also randomised. With the recent changes, the function that randomises the end-effector may now read the object's old pose, leading to incorrect initialisation.
To mitigate this in my own elements I invalidate the timestamp whenever stale data is possible (e.g. `robot.data._body_link_pose_w.timestamp = -1`). This alters the learning curve slightly but still fails to recover the performance seen before c984604. Do any built-in IsaacLab components (e.g. `FrameTransformers`, standard MDP elements, etc.) call these getters in a way that could also surface stale values?
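For reference, here is how that work-around maps onto the toy sketch above (again, only an illustration of the pattern, not the real IsaacLab code); whether the built-in getters for derived quantities behave the same way is exactly what I am unsure about.

```python
# Continuing the toy example above: the work-around marks the cached pose buffer as stale,
# so the next access of `body_link_pose_w` takes the refresh branch again (in the real class
# that branch would re-read the poses from the simulator).
data._body_link_pose_w.timestamp = -1
_ = data.body_link_pose_w
# A derived buffer such as `_body_state_w` keeps its own timestamp, so in this toy version it
# still returns the old cached values unless it is invalidated as well, which may be one
# reason the work-around only partially restores the learning behaviour.
print(data.body_state_w[0, :7])  # still the stale zeros from the first access
```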
Steps to reproduce
Revert the changes from c984604 (at least in `articulation_data.py`), then re-run a training workflow that previously converged correctly. You should observe markedly different learning behaviour between the reverted and current versions.
System Info
- Commit: ba2a7dc
- Isaac Sim Version: 4.5
- OS: Ubuntu 22.04
- GPU: RTX 4090
- CUDA: 12.2
- GPU Driver: 535.129.03
Additional context
The issue might still relate to how I use the getters in my custom scripts, but ideally the data flow should be robust to such usage patterns.
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Acceptance Criteria
- Data getters never return stale values, even when dependent buffers are modified directly within the same simulation step.