High CPU usage due to short bond heartbeat period (0.1s) in nav2_util::LifecycleNode #5784

@zz990099

Bug report

Required Info:

  • Operating System:
    • Ubuntu-24.04
  • Computer:
    • 12th Gen Intel(R) Core(TM) i5-12600KF
  • ROS2 Version:
    • rolling
  • Version or commit hash:
  • DDS implementation:
    • Fast-RTPS

Description

We encountered this issue while deploying Navigation2 on an RK3588 platform. The navigation node group used a large amount of CPU even when idle, which affected the overall stability of the system. After investigating, we found that the default bond heartbeat period in nav2_util::LifecycleNode (0.1s) is too short, which leads to the excessive CPU usage.

  1. With the heartbeat period at its current value of 0.1s, the nav2_container node group uses about 140% CPU while idle.
  2. With the heartbeat period increased to 1.0s, the idle CPU usage of the nav2_container node group drops to about 75%.
  3. With composition disabled and each node running in its own process, each process uses about 15% CPU at a 0.1s period and about 7% at a 1.0s period.

This issue may partly explain why Nav2 performs poorly on hardware with limited computing resources.
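
To put the reported figures side by side, the relative savings can be computed directly from the numbers above. This is only arithmetic on the measurements in this report; the helper name `relativeReduction` is made up for illustration and is not part of Nav2.

```cpp
#include <stdexcept>

// Fractional reduction when going from `before` to `after`
// (0.5 would mean usage was halved). Hypothetical helper, not part of Nav2.
double relativeReduction(double before, double after)
{
  if (before <= 0.0) {
    throw std::invalid_argument("before must be positive");
  }
  return (before - after) / before;
}

// Idle CPU figures reported above (percent), 0.1s period -> 1.0s period:
//   composed nav2_container: 140 -> 75
//   single node process:      15 -> 7
double containerReduction() {return relativeReduction(140.0, 75.0);}
double perProcessReduction() {return relativeReduction(15.0, 7.0);}
```

`containerReduction()` comes out at about 0.46 and `perProcessReduction()` at about 0.53, so the tenfold longer period roughly halves idle CPU usage in both setups.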

CPU Usage Screenshots

  • nav2_container CPU usage with heartbeat period 0.1s.
Image
  • Single-node process CPU usage with heartbeat period 0.1s.
Image
  • nav2_container CPU usage with heartbeat period 1.0s.
Image
  • Single-node process CPU usage with heartbeat period 1.0s.
Image

Suggested Modification Strategy

  1. Increase the default bond_heartbeat_period in nav2_util::LifecycleNode from 0.1s to 1.0s. The current declaration is:

    bond_heartbeat_period = this->declare_or_get_parameter<double>("bond_heartbeat_period", 0.1);

  2. Parameterize the hard-coded bond_heartbeat_period in nav2_lifecycle_manager::LifecycleManager, and update its default value to 1.0s.

  3. Remove the connection establishment check (the waitUntilFormed call) in the createBondConnection function of nav2_lifecycle_manager::LifecycleManager to avoid prolonged startup times. The current implementation is:

    bool
    LifecycleManager::createBondConnection(const std::string & node_name)
    {
      const double timeout_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(bond_timeout_).count();
      const double timeout_s = timeout_ns / 1e9;

      // Only create a bond if none exists for this node and bonding is enabled
      if (bond_map_.find(node_name) == bond_map_.end() && bond_timeout_.count() > 0.0) {
        bond_map_[node_name] =
          std::make_shared<bond::Bond>("bond", node_name, shared_from_this());
        bond_map_[node_name]->setHeartbeatTimeout(timeout_s);
        // Hard-coded heartbeat period (suggestion 2 would parameterize this)
        bond_map_[node_name]->setHeartbeatPeriod(0.10);
        bond_map_[node_name]->start();
        // Blocking connection check (suggestion 3 would remove this)
        if (
          !bond_map_[node_name]->waitUntilFormed(
            rclcpp::Duration(rclcpp::Duration::from_nanoseconds(timeout_ns / 2))))
        {
          RCLCPP_ERROR(
            get_logger(),
            "Server %s was unable to be reached after %0.2fs by bond. "
            "This server may be misconfigured.",
            node_name.c_str(), timeout_s);
          return false;
        }
        RCLCPP_INFO(get_logger(), "Server %s connected with bond.", node_name.c_str());
      }

      return true;
    }
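
Suggestions 2 and 3 could look roughly like the sketch below. It is not the Nav2 implementation: `FakeBond`, `ParamOverrides` and `BondMap` are stand-ins invented here so the example compiles without ROS, and `resolveHeartbeatPeriod` is a hypothetical helper. The parameter name `bond_heartbeat_period` and the 1.0s default come from this proposal. Rejecting a period that is not positive or not shorter than the timeout is an added assumption, not something the current code does.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for bond::Bond so the sketch builds without ROS (hypothetical).
struct FakeBond
{
  double heartbeat_timeout_s{0.0};
  double heartbeat_period_s{0.0};
  bool started{false};
  int blocking_waits{0};  // counts calls to waitUntilFormed()

  void setHeartbeatTimeout(double s) {heartbeat_timeout_s = s;}
  void setHeartbeatPeriod(double s) {heartbeat_period_s = s;}
  void start() {started = true;}
  bool waitUntilFormed() {++blocking_waits; return true;}
};

// Stand-ins for user parameter overrides and bond_map_ (hypothetical).
using ParamOverrides = std::map<std::string, double>;
using BondMap = std::map<std::string, std::shared_ptr<FakeBond>>;

constexpr double kDefaultHeartbeatPeriodS = 1.0;  // default proposed in this issue

// Read "bond_heartbeat_period", falling back to the proposed default.
// Assumption: the period must be positive and shorter than the timeout,
// otherwise the bond could time out between two heartbeats.
double resolveHeartbeatPeriod(const ParamOverrides & params, double timeout_s)
{
  const auto it = params.find("bond_heartbeat_period");
  const double period = (it != params.end()) ? it->second : kDefaultHeartbeatPeriodS;
  if (period <= 0.0 || period >= timeout_s) {
    throw std::invalid_argument("bond_heartbeat_period must be in (0, bond timeout)");
  }
  return period;
}

// Create and start the bond with a configurable period, without blocking
// on waitUntilFormed(). Mirrors the original return convention; there is
// no failure path left once the blocking check is gone.
bool createBondConnection(
  BondMap & bond_map, const std::string & node_name,
  double timeout_s, const ParamOverrides & params)
{
  if (bond_map.find(node_name) == bond_map.end() && timeout_s > 0.0) {
    auto bond = std::make_shared<FakeBond>();
    bond->setHeartbeatTimeout(timeout_s);
    bond->setHeartbeatPeriod(resolveHeartbeatPeriod(params, timeout_s));
    bond->start();
    bond_map[node_name] = bond;
  }
  return true;
}
```

With the blocking check removed, a server that never connects is no longer reported at creation time. Detecting it would have to rely on the bond timing out later, which is an assumption about how the rest of the manager behaves.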

Thanks

Thank you very much for your hard work and dedication to maintaining and improving Navigation2. We are submitting this issue to share our recent experience and observations, and hope that it can help make Nav2 better. We greatly appreciate your time and any suggestions.
