Add OSMO AMR Navigation test case #1018

Open
KeitaW wants to merge 1 commit into main from feature/osmo-amr-navigation-test-case

KeitaW (Collaborator) commented Mar 12, 2026

Summary

  • Adds a new NVIDIA OSMO test case under 3.test_cases/osmo/AMRNavigation/ for warehouse AMR (Autonomous Mobile Robot) navigation synthetic data generation and training
  • 6-stage pipeline orchestrated as an OSMO DAG on Amazon EKS: scene setup → occupancy mapping → trajectory generation → multi-modal rendering (parallel RGB/depth/segmentation) → domain augmentation → X-Mobility foundation model training
  • Uses heterogeneous compute (G-series for rendering, P-series for training), KAI Scheduler, and S3-based inter-stage data passing via IRSA
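
The stage ordering and S3 hand-off described above can be pictured with a small sketch. It is illustrative only: the bucket name, the run ID, the stage identifiers, and the `s3://<bucket>/<run>/<stage>/` prefix layout are assumptions made for this example, not the layout used by the scripts in this PR.

```python
from typing import NamedTuple, Optional

# Stage identifiers are hypothetical; they mirror the six stages in the summary.
STAGES = [
    "scene-setup",
    "occupancy-mapping",
    "trajectory-generation",
    "multi-modal-rendering",
    "domain-augmentation",
    "xmobility-training",
]


class StageIO(NamedTuple):
    stage: str
    input_uri: Optional[str]  # None for the first stage, which has no upstream data
    output_uri: str


def stage_uri(bucket: str, run_id: str, stage: str) -> str:
    """Assumed layout: one S3 prefix per stage per run."""
    return f"s3://{bucket}/{run_id}/{stage}/"


def plan_handoffs(bucket: str, run_id: str) -> list[StageIO]:
    """Pair every stage with the previous stage's output prefix as its input."""
    plan = []
    previous = None
    for stage in STAGES:
        out = stage_uri(bucket, run_id, stage)
        plan.append(StageIO(stage, previous, out))
        previous = out
    return plan


if __name__ == "__main__":
    for io in plan_handoffs("example-bucket", "run-001"):
        print(f"{io.stage}: {io.input_uri} -> {io.output_uri}")
```

Keeping one prefix per stage means a failed stage can be rerun against its predecessor's output without touching earlier data.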

Contents

  • 3 Dockerfiles: isaac-sim (stages 1-4), cosmos-transfer (stage 5), xmobility (stage 6)
  • 6 pipeline scripts (src/): Full Python implementation for each stage
  • 4 OSMO workflow YAMLs: data pipeline (combination), training (single-task), full sequential (reference), smoke test (CPU-only)
  • 4 numbered shell scripts (kubernetes/): setup, build, verify, submit
  • 2 config files: default pipeline settings + per-stage parameters
  • 2 READMEs: top-level overview + detailed Kubernetes instructions

Key OSMO features demonstrated

  • DAG task dependencies declared via each task's inputs: field
  • Combination workflows with parallel task groups (Group 4: 3 render passes)
  • KAI Scheduler integration
  • Priority scheduling and checkpoint/reschedule semantics
  • Heterogeneous NodePool targeting (rendering vs training)
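
To make the DAG and parallel-group idea concrete, here is a minimal sketch using Python's standard `graphlib`. It is not OSMO's scheduler and does not use the OSMO workflow schema; the task names and the dependency map are assumptions that mirror the six stages, with the three render passes left independent of each other so they land in the same group.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on (the role `inputs:` plays in a workflow).
# Names are hypothetical; the three render tasks share one upstream dependency.
DEPENDENCIES = {
    "scene-setup": set(),
    "occupancy-mapping": {"scene-setup"},
    "trajectory-generation": {"occupancy-mapping"},
    "render-rgb": {"trajectory-generation"},
    "render-depth": {"trajectory-generation"},
    "render-segmentation": {"trajectory-generation"},
    "domain-augmentation": {"render-rgb", "render-depth", "render-segmentation"},
    "xmobility-training": {"domain-augmentation"},
}


def parallel_waves(deps):
    """Group tasks into waves; tasks in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(parallel_waves(DEPENDENCIES), start=1):
        print(f"group {index}: {', '.join(wave)}")
```

In this toy graph the fourth wave contains the three render tasks, which corresponds to the parallel Group 4 mentioned above.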

Test plan

  • E2E validated on EKS cluster with OSMO v6.2-rc6 (smoke test passed, full AMR data pipeline submitted and ran)
  • Verify README instructions are reproducible on a fresh cluster
  • Validate all 6 stages complete with GPU capacity available

Adds a complete NVIDIA OSMO test case for warehouse AMR (Autonomous Mobile Robot)
navigation using a MobilityGen-style pipeline on Amazon EKS. The pipeline includes
6 stages: scene setup, occupancy mapping, trajectory generation, multi-modal rendering
(parallel RGB/depth/segmentation), domain augmentation, and X-Mobility foundation
model training. Uses OSMO DAG orchestration, KAI Scheduler, and heterogeneous
compute (G-series for rendering, P-series for training).

bluecrayon52 self-requested a review March 12, 2026 16:58

bluecrayon52 (Contributor) left a comment

The NVIDIA OSMO installation guide link in AMRNavigation/kubernetes/README.md returns a page-not-found error for me.

It looks like NVIDIA provides an "AWS Infra with Terraform" guide. Did you use something like this to deploy the EKS cluster?

I'm OK with the Terraform infrastructure being outside the scope of the test case, with the README pointing to an external stack that NVIDIA maintains.

Does it make sense to also include:

  • Links or Helm commands for installing the in-cluster prerequisites (GPU operator, KAI scheduler, Karpenter, OSMO)
  • YAML manifests for the required Karpenter NodePools that the 4.verify-osmo.sh script checks for (osmo-rendering, osmo-gpu-training, osmo-cpu-batch, osmo-cpu-system), with instructions on how to apply them
  • Instructions for loading the nvidia/X-Mobility dataset from HuggingFace to S3
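
For the NodePool item, the check itself is simple enough to sketch. This is only an illustration of the idea, not what 4.verify-osmo.sh actually does: it assumes the Karpenter NodePool CRD is installed so that `kubectl get nodepools -o json` returns a standard Kubernetes list object, and the function name is hypothetical.

```python
import json

# The four NodePool names listed above.
REQUIRED_NODE_POOLS = (
    "osmo-rendering",
    "osmo-gpu-training",
    "osmo-cpu-batch",
    "osmo-cpu-system",
)


def missing_node_pools(list_json: str) -> list[str]:
    """Return the required NodePool names absent from a Kubernetes list object."""
    items = json.loads(list_json).get("items", [])
    present = {item["metadata"]["name"] for item in items}
    return [name for name in REQUIRED_NODE_POOLS if name not in present]


if __name__ == "__main__":
    import sys

    # Example: kubectl get nodepools -o json | python check_nodepools.py
    missing = missing_node_pools(sys.stdin.read())
    if missing:
        print("Missing NodePools: " + ", ".join(missing))
        sys.exit(1)
    print("All required NodePools are present.")
```

Shipping the manifests alongside a check like this would let a reader go from a failed verification straight to the fix.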

If you plan to cut another PR to provide architecture setup automations, then perhaps we can link to those resources in the prerequisites section of this test case?

2 participants