Robotics AI Suite

Description

Robotics AI Suite is a preview collection of robotics applications, libraries, samples, and benchmarking tools to help you build solutions faster. It includes models and pipelines optimized with the OpenVINO™ toolkit for accelerated performance on Intel® CPUs, integrated GPUs, and NPUs. Refer to the detailed user guide and documentation.

Collections

Collections organize workflows and capabilities for three robot categories: Autonomous Mobile Robots (AMRs), Humanoid Imitation Learning, and Stationary Robot Vision & Control. Each collection brings together libraries for core robotics workloads, robotics control recipes, and virtualization or application management, along with Robot Operating System 2 (ROS 2) integration points, supported sensor profiles, and repeatable benchmarking. Each collection also includes OpenVINO™ toolkit–optimized models spanning computer vision, large language models (LLMs), and vision-language-action (VLA), accelerating inference on Intel® CPUs, integrated GPUs, and NPUs and helping teams evaluate, assemble, and scale solutions faster.
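The repeatable benchmarking mentioned above typically reduces to measuring per-inference latency and the throughput derived from it on a given device. A rough, library-agnostic sketch of that pattern (this is an illustration, not the suite's actual benchmarking tool; the callable passed in is a stand-in for one model invocation):

```python
import time
import statistics

def benchmark(fn, warmup=5, iters=50):
    """Time a callable and report latency percentiles and throughput.

    `fn` stands in for a single inference call (for example, one
    invocation of a compiled OpenVINO model); here it is a placeholder.
    """
    for _ in range(warmup):          # warm caches before timing
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples) * 1e3,
        "p90_ms": samples[int(0.9 * (len(samples) - 1))] * 1e3,
        "fps": 1.0 / statistics.mean(samples),
    }

# Dummy workload standing in for model inference.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Reporting a tail percentile alongside the mean matters for robotics, where control loops care about worst-case latency, not just average throughput.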

Humanoid - Imitation Learning:

| Application | Description |
| --- | --- |
| Diffusion Policy (OpenVINO Toolkit) | Diffusion Policy implementation optimized with the OpenVINO toolkit |
| Imitation Learning - ACT | Imitation learning pipeline using the Action Chunking with Transformers (ACT) algorithm to train and evaluate in simulated or real robot environments with Intel® optimizations |
| Improved 3D Diffusion Policy (OpenVINO Toolkit) | Improved 3D Diffusion Policy implementation optimized with the OpenVINO toolkit |
| LLM Robotics Demo | Step-by-step guide to setting up a real-time system that controls a JAKA robot arm with movement commands generated by an LLM |
| Robotics Diffusion Transformer (OpenVINO Toolkit) | Robotics Diffusion Transformer implementation optimized with the OpenVINO toolkit |
| VSLAM: ORB-SLAM3 | A popular real-time feature-based SLAM library that performs Visual, Visual-Inertial, and Multi-Map SLAM with monocular, stereo, and RGB-D cameras, using pin-hole and fish-eye lens models |
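ACT, listed above, predicts a chunk of future actions at every control step, so several overlapping chunks cover the same timestep; ACT blends them with exponentially decaying weights (temporal ensembling). A simplified one-dimensional sketch of that blending, with toy action values and an arbitrarily chosen decay constant `k`:

```python
import math

def temporal_ensemble(chunks, t, k=0.01):
    """Blend all chunk predictions that cover timestep t.

    chunks: list of (start_step, [a_0, a_1, ...]) action chunks, where a
    chunk predicted at start_step covers steps
    start_step .. start_step + len(actions) - 1.
    Predictions made longer ago receive weight exp(-k * age), in the
    spirit of ACT's temporal ensembling (k is a tunable constant).
    """
    weights, actions = [], []
    for start, acts in chunks:
        offset = t - start
        if 0 <= offset < len(acts):
            weights.append(math.exp(-k * offset))  # older prediction -> smaller weight
            actions.append(acts[offset])
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, actions)) / total

# Two overlapping chunks both predict an action for t=2.
chunks = [(0, [0.0, 0.1, 0.2, 0.3]), (2, [0.25, 0.35])]
a = temporal_ensemble(chunks, t=2)  # weighted blend of 0.2 and 0.25
```

With `k=0` this degenerates to a plain average of the overlapping predictions; larger `k` trusts the freshest chunk more.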

Autonomous Mobile Robot:

| Algorithm | Description |
| --- | --- |
| ADBScan | ADBSCAN (Adaptive DBSCAN) is an Intel-patented, highly adaptive and scalable object detection and localization (clustering) algorithm, tested successfully to detect objects at all ranges with 2D Lidar, 3D Lidar, and Intel® RealSense™ depth cameras |
| Collaborative-SLAM | A collaborative visual SLAM example compiled natively for both Intel® Core™ and Intel® Atom® processor-based systems; GPU acceleration can additionally be enabled on selected Intel® Core™ processor-based systems |
| Fastmapping | |
| GroundFloor Segmentation | Showcases an Intel® algorithm for segmenting depth sensor data, compatible with 3D LiDAR or Intel® RealSense™ camera inputs |
| ITS-Planner | The Intelligent Sampling and Two-Way Search (ITS) global path planner is an Intel-patented algorithm based on two-way path planning and intelligent sampling, reducing compute time by roughly 20x-30x on a 1000-node map compared with the A* search algorithm |
| Multi-Camera-Demo | Demonstrates a multi-camera use case using an Axiomtek ROBOX500 ROS 2 AMR controller and four Intel® RealSense™ D457 depth cameras |
| Object Detection | An example of using the ROS 2 node with the OpenVINO toolkit, outlining the steps for installing the node and running the object detection model |
| Simulations | Tutorials on using ROS 2 simulations with the Autonomous Mobile Robot; you can test robot sensing and navigation in these simulated environments |
| Wandering | A ROS 2 sample application that can be combined with different SLAM algorithms and the ROS 2 navigation stack to move a robot around an unknown environment, with the goal of building a navigational map of that environment |
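ADBSCAN builds on DBSCAN's density-based clustering, adapting its `eps`/`min_pts` parameters to sensor range because Lidar point density falls off with distance. The fixed-parameter core it extends can be sketched in a few lines over 2-D points (an intuition aid only, not the patented adaptive variant):

```python
import math

def dbscan(points, eps=0.5, min_pts=3):
    """Minimal DBSCAN over 2-D points; returns one label per point
    (-1 = noise). ADBSCAN's key change, not shown here, is making
    eps and min_pts functions of range so that distant, sparser
    Lidar returns still form clusters."""
    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if math.hypot(px - qx, py - qy) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:                       # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:     # j is itself a core point
                seeds.extend(j_nbrs)
    return labels

# Two well-separated blobs plus one outlier.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = dbscan(pts, eps=0.3, min_pts=3)  # two clusters, outlier labeled -1
```

The brute-force neighbor search is O(n²); real implementations use a spatial index, and an adaptive variant would recompute `eps` per point from its distance to the sensor.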

Stationary Robot Vision & Control:

| Application | Description |
| --- | --- |
| Stationary Robot Vision & Control | A robotic software framework aimed at tackling pick-and-place and track-and-place industrial problems; it is under active development and therefore released at pre-release quality |

OpenVINO™ Toolkit-Optimized Model Algorithms:

| Algorithm | Description |
| --- | --- |
| YOLOv8 | CNN-based object detection |
| YOLOv12 | CNN-based object detection |
| MobileNetV2 | CNN-based object detection |
| SAM | Transformer-based segmentation |
| SAM2 | Extends SAM to video segmentation and object tracking with cross-attention to memory |
| FastSAM | Lightweight substitute for SAM |
| MobileSAM | Lightweight substitute for SAM (same model architecture as SAM; refer to the OpenVINO toolkit and Segment Anything Model (SAM) tutorials for model export and application) |
| U-Net | CNN-based segmentation network, also used as a diffusion-model backbone |
| DETR | Transformer-based object detection |
| Grounding DINO | Transformer-based (DETR-style) object detection |
| CLIP | Transformer-based image-text model used for zero-shot image classification |
| Qwen2.5-VL | Multimodal large language model |
| Whisper | Automatic speech recognition |
| FunASR | Automatic speech recognition |
| Action Chunking with Transformers (ACT) | An end-to-end imitation learning model designed for fine manipulation tasks in robotics |
| Visual Servoing (CNS) | A technique that uses feedback extracted from a vision sensor to control robot motion |
| Diffusion Policy | Learns the score (gradient of the log action distribution) and optimizes actions at inference through stochastic Langevin dynamics steps, providing a stable and efficient way to find optimal actions |
| Improved 3D Diffusion Policy (iDP3) | Builds on the original Diffusion Policy framework, enhancing its capabilities for 3D robotic manipulation tasks |
| Robotics Diffusion Transformer (RDT-1B) | Robotics Diffusion Transformer with 1.2B parameters; a diffusion-based foundation model for robotic manipulation |
| Feature Extraction Model: SuperPoint | A self-supervised framework for interest-point detection and description in images, suitable for many multiple-view geometry problems in computer vision |
| Feature Tracking Model: LightGlue | A model designed for efficient and accurate feature matching in computer vision tasks |
| Bird's Eye View Perception: Fast-BEV | Produces a Bird's Eye View (BEV) representation of a scene for a comprehensive understanding of spatial layout and relationships between objects |
| Monocular Depth Estimation: Depth Anything V2 | Leverages deep learning to infer 3D depth information from 2D images |
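For intuition on the Diffusion Policy entry above: Langevin dynamics repeatedly nudges a noisy action sample along the learned score (the gradient of the log action density) while injecting noise, so samples concentrate on high-probability actions. A toy one-dimensional sketch using an analytic score for a standard Gaussian in place of a trained network:

```python
import math
import random

def langevin_sample(score, x0, step=0.01, n_steps=500, rng=None):
    """Unadjusted Langevin dynamics:
        x <- x + step * score(x) + sqrt(2 * step) * noise
    `score` is the gradient of log p(x); in Diffusion Policy it would
    be a trained network over actions, here an analytic stand-in."""
    rng = rng or random.Random(0)
    x = x0
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2 * step) * rng.gauss(0, 1)
    return x

# Score of a standard normal N(0, 1): d/dx log p(x) = -x.
# Starting far away at x0=5, samples drift toward the distribution.
samples = [langevin_sample(lambda x: -x, x0=5.0, rng=random.Random(i))
           for i in range(200)]
mean = sum(samples) / len(samples)  # close to 0, the target mean
```

The injected noise is what keeps this a sampler rather than plain gradient ascent; dropping it would collapse every run to the density's mode.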