We propose an end-to-end differentiable model that predicts robot trajectories on rough off-road terrain from camera images and/or lidar point clouds. The model integrates a learnable component that predicts robot-terrain interaction forces with a neural-symbolic layer that enforces the laws of classical mechanics and consequently improves generalization on out-of-distribution data. The neural-symbolic layer includes a differentiable physics engine that computes the robot's trajectory by querying these forces at the points of contact with the terrain. Since the proposed architecture comprises substantial geometric and physics priors, the resulting model can also be seen as a learnable physics engine conditioned on real sensor data. The pipeline consists of the following steps:
- The FusionForce Predictor estimates rich features in the image and voxel domains.
- Both camera and lidar features are vertically projected onto the ground plane.
- A multi-headed terrain encoder-decoder predicts terrain properties.
- The neuro-symbolic Physics Engine estimates forces at the robot-terrain contacts.
- The resulting robot trajectory is integrated from these forces.
Learning employs three losses (combined as sketched after the list):
- trajectory loss $L_{\tau}$, which measures the distance between the predicted and real trajectory;
- geometrical loss $L_{g}$, which measures the distance between the predicted geometrical heightmap and the lidar-estimated heightmap;
- terrain loss $L_{t}$, which enforces rigid terrain on rigid semantic classes revealed by the image foundation model SEEM.
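This section does not spell out how the three terms are combined; a standard choice, and an assumption on our part, is a weighted sum with scalar coefficients:

$$
L = \lambda_{\tau} L_{\tau} + \lambda_{g} L_{g} + \lambda_{t} L_{t}
$$

where $\lambda_{\tau}$, $\lambda_{g}$, $\lambda_{t}$ are hyperparameters balancing the terms (the $\lambda$ symbols are illustrative, not from the source).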
- A depth distribution for each camera frustum ray is created by "lifting" the calibrated camera features into 3D (LSS).
- The depth distribution is then projected onto the ground plane to create 2D image BEV features.
- Voxelized features are computed from the 3D point cloud (VoxelNet).
- The voxelized features are projected onto the ground plane to create 2D lidar BEV features.
- The camera and lidar BEV maps are fused into single BEV features using an encoder-decoder architecture (see the sketch after this list).
- The fused BEV features are then used to predict terrain properties such as elevation and friction.
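As a rough illustration of the projection-and-fusion step (not the repository's actual code; the tensor shapes and module choices are assumptions), lidar voxel features can be collapsed onto the ground plane by pooling over the height axis and concatenated with the camera BEV map:

```python
import torch
import torch.nn as nn

# Assumed shapes (illustrative): voxel features (B, C, Z, Y, X) from a
# VoxelNet-style backbone, camera BEV features (B, C, Y, X) from LSS.
B, C, Z, Y, X = 1, 64, 16, 128, 128
voxel_feats = torch.randn(B, C, Z, Y, X)
cam_bev = torch.randn(B, C, Y, X)

# Vertical projection: collapse the height (Z) axis, e.g. by max-pooling,
# yielding a 2D lidar BEV map aligned with the ground plane.
lidar_bev = voxel_feats.max(dim=2).values  # (B, C, Y, X)

# Fuse the two BEV maps by channel concatenation followed by an
# encoder-decoder (a single conv stands in for it here).
fuse = nn.Conv2d(2 * C, C, kernel_size=3, padding=1)
fused_bev = fuse(torch.cat([cam_bev, lidar_bev], dim=1))  # (B, C, Y, X)

# Multi-headed decoder: one head per terrain property.
elevation_head = nn.Conv2d(C, 1, kernel_size=1)
friction_head = nn.Conv2d(C, 1, kernel_size=1)
elevation, friction = elevation_head(fused_bev), friction_head(fused_bev)
```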
The differentiability of the developed physics engine allows for end-to-end training of the model: for example, it makes it possible to learn terrain properties from the robot's trajectories, as sketched below.
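Conceptually, such end-to-end training might look as follows. This is a minimal toy sketch, not the repository's API: the predictor and engine below are stand-in modules, and only the trajectory loss $L_{\tau}$ is shown. Because the rollout is differentiable, supervision on trajectories alone propagates back to the predicted terrain properties.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real modules (not the repository's API).
predictor = nn.Sequential(nn.Flatten(), nn.Linear(100, 64))  # toy terrain predictor
physics_engine = nn.Linear(64, 30)                           # toy differentiable rollout

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

# Toy batch: sensor input and a recorded ground-truth trajectory.
sensor_input = torch.randn(8, 1, 10, 10)
real_trajectory = torch.randn(8, 30)

for _ in range(10):
    terrain = predictor(sensor_input)            # terrain properties
    pred_trajectory = physics_engine(terrain)    # differentiable rollout

    # Trajectory loss L_tau: distance between predicted and real trajectory.
    loss = nn.functional.mse_loss(pred_trajectory, real_trajectory)

    optimizer.zero_grad()
    loss.backward()   # gradients flow through the engine into the terrain
    optimizer.step()
```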
- The GPU parallelization of the physics engine allows the module to be used for real-time trajectory shooting (see the sketch after this list).
- The cost of each trajectory is computed from the predicted robot-terrain interaction forces.
- The path to follow is selected based on the trajectory costs and the distance to a waypoint.
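A minimal sketch of the selection logic, assuming the engine returns a batch of candidate trajectories with per-trajectory costs (the names, shapes, and the distance weighting are illustrative):

```python
import torch

def select_path(trajectories, costs, waypoint, w_dist=1.0):
    """Pick the candidate minimizing cost plus distance to the waypoint.

    trajectories: (N, T, 2) batch of simulated xy paths (illustrative shape),
    costs: (N,) per-trajectory cost from the interaction forces,
    waypoint: (2,) goal position in the same frame.
    """
    # Distance from each trajectory's endpoint to the waypoint.
    end_dist = torch.linalg.norm(trajectories[:, -1, :] - waypoint, dim=-1)
    total = costs + w_dist * end_dist
    return trajectories[torch.argmin(total)]

# Toy usage with random candidates.
paths = torch.randn(32, 50, 2)          # 32 trajectories, 50 steps each
costs = torch.rand(32)
best = select_path(paths, costs, waypoint=torch.tensor([5.0, 0.0]))
```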
The package is organized as a ROS 2 package and can be installed using the following commands:

```bash
mkdir -p ~/ros2_ws/src
cd ~/ros2_ws/src
git clone [email protected]:ctu-vras/fusionforce.git
pip install -r fusionforce/config/singularity/requirements.txt
cd ~/ros2_ws/
rosdep install --from-paths src --ignore-src -r
colcon build --symlink-install --packages-select fusionforce
```
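After building, source the workspace overlay (a standard ROS 2 step) so that the package is discoverable:

```bash
source ~/ros2_ws/install/setup.bash
```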
To test the Physics Engine, one can run the following command:

```bash
python fusionforce/scripts/run_physics_engine.py
```
The ROS 2 node can be launched with the following command:

```bash
ros2 launch fusionforce physics_engine.launch.py
```
Topics:
- input: `/terrain/grid_map` - the terrain GridMap message with the terrain properties (elevation, friction, etc.).
- output: `/sampled_paths` - the predicted robot trajectories as a MarkerArray message.
- output: `/path_costs` - the costs of the predicted trajectories as a Float32MultiArray message.
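The node's outputs can be inspected with the standard ROS 2 CLI, for example:

```bash
ros2 topic echo /path_costs
ros2 topic echo /sampled_paths --no-arr   # suppress large array fields
```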
Parameters:
- `num_robots`: number of robots (trajectories) to simulate,
- `traj_sim_time`: simulation time for each trajectory in seconds,
- `gridmap_layer`: name of the elevation layer in the GridMap message,
- `max_age`: maximum age of the terrain map in seconds to be processed.
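Assuming the launch file exposes these parameters as launch arguments (not verified against the repository), they could be overridden at launch time, e.g.:

```bash
ros2 launch fusionforce physics_engine.launch.py num_robots:=64 traj_sim_time:=5.0
```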
The ROS 2 node for the FusionForce Predictor can be launched with the following command:

```bash
ros2 launch fusionforce terrain_encoder.launch.py terrain_encoder:=<model> img_topics:=[<img_topic1>, .. , <img_topicN>] camera_info_topics:=[<info_topic1>, .. , <info_topicN>] cloud_topic:=<cloud_topic>
```

where `<model>` is one of the available models: `voxelnet`, `lss`, or `bevfusion`.
Topics:
- input: a list of `img_topics` as sensor_msgs/CompressedImage messages (required for `lss` and `bevfusion`),
- input: a list of `camera_info_topics` as sensor_msgs/CameraInfo messages (required for `lss` and `bevfusion`),
- input: `cloud_topic` as a sensor_msgs/PointCloud2 message (required for `voxelnet` and `bevfusion`),
- output: `/terrain/grid_map` as a GridMap message with the estimated terrain properties.
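A minimal rclpy subscriber for the estimated terrain might look as follows (a sketch; it assumes only the `grid_map_msgs` package, which provides the GridMap message type):

```python
import rclpy
from rclpy.node import Node
from grid_map_msgs.msg import GridMap


class TerrainListener(Node):
    """Prints the layers of each incoming terrain estimate."""

    def __init__(self):
        super().__init__('terrain_listener')
        self.create_subscription(GridMap, '/terrain/grid_map', self.on_map, 1)

    def on_map(self, msg: GridMap):
        # Each layer (e.g. elevation, friction) is a flattened float array.
        self.get_logger().info(f'terrain layers: {list(msg.layers)}')


def main():
    rclpy.init()
    rclpy.spin(TerrainListener())


if __name__ == '__main__':
    main()
```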
Parameters:
- `model`: the model to use for the terrain encoder, one of `voxelnet`, `lss`, or `bevfusion`,
- `robot_frame`: the frame of the robot, e.g., `base_link`,
- `fixed_frame`: the fixed frame for the gravity alignment, e.g., `map`,
- `max_msgs_delay`: maximum delay in seconds for the input messages to be processed,
- `max_age`: maximum age of the sensor input in seconds to be processed.
The ROS 2 node for the FusionForce Predictor with the Physics Engine can be launched with the following command:

```bash
ros2 launch fusionforce fusionforce.launch.py terrain_encoder:=<model> img_topics:=[<img_topic1>, .. , <img_topicN>] camera_info_topics:=[<info_topic1>, .. , <info_topicN>] cloud_topic:=<cloud_topic>
```

This module combines the FusionForce Predictor and the Physics Engine into a single node. It takes the same input as the FusionForce Predictor, but additionally outputs the predicted robot trajectories and their costs, as the Physics Engine does.
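For example, with one camera and a lidar (the topic names below are illustrative placeholders, not defaults from the repository):

```bash
ros2 launch fusionforce fusionforce.launch.py \
    terrain_encoder:=bevfusion \
    img_topics:=[/camera_front/image/compressed] \
    camera_info_topics:=[/camera_front/camera_info] \
    cloud_topic:=/lidar/points
```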
FusionForce prediction example: supporting terrain elevation projected onto the robot's camera frames.
Consider citing the paper if you find the work relevant to your research:

```bibtex
@article{agishev2025fusionforce,
  title={FusionForce: End-to-end Differentiable Neural-Symbolic Layer for Trajectory Prediction},
  author={Agishev, Ruslan and Zimmermann, Karel},
  journal={arXiv preprint arXiv:2502.10156},
  year={2025},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2502.10156},
}
```