Robotics AI Suite is a preview collection of robotics applications, libraries, samples, and benchmarking tools to help you build solutions faster. It includes models and pipelines optimized with the OpenVINO™ toolkit for accelerated performance on Intel® CPUs, integrated GPUs, and NPUs. Refer to the detailed user guide and documentation.
Collections organize workflows and capabilities for three robot categories: Autonomous Mobile Robots (AMRs), Humanoid Imitation Learning, and Stationary Robot Vision & Control. Each collection brings together libraries for core robotics workloads, robotics control recipes, and virtualization or application management, along with Robot Operating System 2 (ROS 2) integration points, supported sensor profiles, and repeatable benchmarking. Each collection also includes OpenVINO™ toolkit–optimized models spanning computer vision, large language models (LLMs), and vision-language-action (VLA) models to accelerate inference on Intel® CPUs, integrated GPUs, and NPUs, helping teams evaluate, assemble, and scale solutions faster.
Humanoid - Imitation Learning:
| Application | Documentation | Description |
|---|---|---|
| Diffusion Policy (OpenVINO Toolkit) | Diffusion Policy (OpenVINO Toolkit) | Diffusion Policy implementation optimized with OpenVINO toolkit |
| Imitation Learning - ACT | Imitation Learning - ACT | Imitation learning pipeline using the Action Chunking with Transformers (ACT) algorithm to train and evaluate in simulated or real robot environments with Intel® optimization |
| Improved 3D Diffusion Policy (OpenVINO Toolkit) | Improved 3D Diffusion Policy (OpenVINO Toolkit) | Improved 3D Diffusion Policy implementation optimized with OpenVINO toolkit |
| LLM Robotics Demo | LLM Robotics Demo | Step-by-step guide for setting up a real-time system to control a JAKA robot arm with movement commands generated using an LLM |
| Robotics Diffusion Transformer (OpenVINO Toolkit) | Robotics Diffusion Transformer (OpenVINO Toolkit) | Robotics Diffusion Transformer implementation optimized with OpenVINO toolkit |
| VSLAM: ORB-SLAM3 | VSLAM: ORB-SLAM3 | A popular real-time feature-based SLAM library that can perform Visual, Visual-Inertial, and Multi-Map SLAM with monocular, stereo, and RGB-D cameras, using pinhole and fisheye lens models |
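The ACT pipeline above executes "chunks" of predicted future actions; at run time, overlapping chunks that cover the same timestep are typically fused with an exponentially weighted average (temporal ensembling, as described in the ACT paper). A minimal NumPy sketch of that ensembling step, with illustrative function and variable names rather than the pipeline's actual API:

```python
import numpy as np

def temporal_ensemble(chunks, t, k=0.01):
    """Fuse overlapping action-chunk predictions for timestep t.

    chunks: list of (start_step, actions) pairs, where actions is a
            (chunk_len, action_dim) array predicted at start_step.
    k:      decay rate; older predictions get weight w_i = exp(-k * i),
            following ACT's ensembling scheme (i = 0 is the oldest chunk).
    """
    preds, weights = [], []
    for i, (start, actions) in enumerate(chunks):
        offset = t - start
        if 0 <= offset < len(actions):       # this chunk covers timestep t
            preds.append(actions[offset])
            weights.append(np.exp(-k * i))
    w = np.array(weights) / np.sum(weights)  # normalize the weights
    return np.average(preds, axis=0, weights=w)

# Two overlapping chunks (chunk length 3, 2-DoF action) both predicting step 2.
chunks = [
    (0, np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]])),
    (2, np.array([[0.4, 0.4], [0.5, 0.5], [0.6, 0.6]])),
]
fused = temporal_ensemble(chunks, t=2)  # near the midpoint of 0.2 and 0.4
```

The exponential weights keep execution smooth: early predictions dominate slightly, so a single fresh chunk cannot cause an abrupt jump in the commanded action.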
Autonomous Mobile Robot:
| Algorithm | Documentation | Description |
|---|---|---|
| ADBScan | ADBScan | ADBSCAN (Adaptive DBSCAN) is an Intel-patented, highly adaptive and scalable object detection and localization (clustering) algorithm, successfully tested to detect objects at all ranges with 2D Lidar, 3D Lidar, and Intel® RealSense™ depth cameras. |
| Collaborative-SLAM | Collaborative-SLAM | A Collaborative Visual SLAM example that is compiled natively for both Intel® Core™ and Intel® Atom® processor-based systems. In addition, GPU acceleration may be enabled on selected Intel® Core™ processor-based systems. |
| Fastmapping | Fastmapping | |
| GroundFloor Segmentation | GroundFloor Segmentation | Showcases an Intel® algorithm designed for the segmentation of depth sensor data, compatible with 3D LiDAR or Intel® RealSense™ camera inputs |
| ITS-Planner | ITS-Planner | The Intelligent Sampling and Two-Way Search (ITS) global path planner is an Intel-patented algorithm. ITS is a new search approach based on two-way path planning and intelligent sampling, which reduces compute time by roughly 20x-30x on a 1000-node map compared with the A* search algorithm. |
| Multi-Camera-Demo | Multi-Camera-Demo | Demonstrates the multi-camera use case using an Axiomtek ROBOX500 ROS 2 AMR controller and four Intel® RealSense™ D457 depth cameras |
| Object Detection | Object Detection | An example of using the ROS 2 node with the OpenVINO toolkit. It outlines the steps for installing the node and running the object detection model. |
| Simulations | Simulations | Tutorials on using the ROS 2 simulations with the Autonomous Mobile Robot. You can test robot sensing and navigation in these simulated environments. |
| Wandering | Wandering | A ROS 2 sample application for a wandering mobile robot. It can be used with different SLAM algorithms in combination with the ROS 2 navigation stack to move the robot around an unknown environment, with the goal of creating a navigational map of that environment. |
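ADBSCAN in the table above is Intel-patented, so its range-adaptive parameter selection is not reproduced here. For intuition, the following is a minimal sketch of plain DBSCAN, the density-clustering algorithm it adapts, applied to 2D sensor returns (function names and parameter values are illustrative, not the suite's API):

```python
import numpy as np

def dbscan(points, eps=0.5, min_pts=3):
    """Minimal DBSCAN: label each 2D point with a cluster id (-1 = noise).

    ADBSCAN (not shown; Intel-patented) adapts eps/min_pts with sensor range
    so sparse far-field Lidar returns still cluster; this fixed-parameter
    version only conveys the underlying density-clustering idea.
    """
    n = len(points)
    labels = np.full(n, -1)
    dists = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neighbors[i]) < min_pts:
            continue                       # not an unvisited core point
        visited[i] = True
        labels[i] = cluster
        stack = [i]                        # expand a new cluster from i
        while stack:
            j = stack.pop()
            for q in neighbors[j]:
                if labels[q] == -1:
                    labels[q] = cluster
                if not visited[q]:
                    visited[q] = True
                    if len(neighbors[q]) >= min_pts:
                        stack.append(q)    # q is also core: keep expanding
        cluster += 1
    return labels

# Two dense blobs (two "objects") plus one isolated return (noise).
pts = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [9, 0]])
labels = dbscan(pts, eps=0.3, min_pts=3)
```

With a fixed `eps`, far-away returns would be misclassified as noise because Lidar points thin out with distance; that is exactly the failure mode the adaptive variant addresses.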
Stationary Robot Vision & Control:
| Application | Documentation | Description |
|---|---|---|
| Stationary Robot Vision & Control | Stationary Robot Vision & Control | Stationary Robot Vision & Control is a robotic software framework aimed at tackling pick-and-place and track-and-place industrial problems. It is under active development and is therefore released at pre-release quality. |
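Pick-and-place applications of this kind are commonly structured as a small state machine (detect → approach → grasp → transfer → place). A generic sketch of that control flow, using hypothetical state and action names rather than the framework's actual API:

```python
from enum import Enum, auto

class State(Enum):
    DETECT = auto()
    APPROACH = auto()
    GRASP = auto()
    TRANSFER = auto()
    PLACE = auto()
    DONE = auto()

# Hypothetical transition table: each state maps to (action_name, next_state).
# Real frameworks would branch on action outcomes (e.g. grasp failure -> retry).
TRANSITIONS = {
    State.DETECT:   ("locate_object", State.APPROACH),
    State.APPROACH: ("move_to_pregrasp", State.GRASP),
    State.GRASP:    ("close_gripper", State.TRANSFER),
    State.TRANSFER: ("move_to_place_pose", State.PLACE),
    State.PLACE:    ("open_gripper", State.DONE),
}

def run_pick_and_place(execute):
    """Drive the state machine; `execute(action)` is the robot-side hook."""
    state, trace = State.DETECT, []
    while state is not State.DONE:
        action, next_state = TRANSITIONS[state]
        execute(action)        # e.g. publish a command, call a motion planner
        trace.append(action)
        state = next_state
    return trace

trace = run_pick_and_place(lambda action: None)  # no-op executor for the sketch
```

Keeping perception, motion, and gripper actions behind a single `execute` hook is what lets the same sequencing logic run against a simulator or real hardware.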
OpenVINO™ Toolkit-Optimized Model Algorithms:
| Algorithm | Description |
|---|---|
| YOLOv8 | CNN-based object detection |
| YOLOv12 | CNN-based object detection |
| MobileNetV2 | CNN-based image classification network, commonly used as an object detection backbone |
| SAM | Transformer-based segmentation |
| SAM2 | Extends SAM for video segmentation and object tracking with cross attention to memory |
| FastSAM | Lightweight alternative to SAM |
| MobileSAM | Lightweight alternative to SAM (same model architecture as SAM; refer to the OpenVINO toolkit and Segment Anything Model (SAM) tutorials for model export and application) |
| U-NET | CNN-based segmentation and diffusion model |
| DETR | Transformer-based object detection |
| DETR GroundingDino | Transformer-based open-vocabulary object detection |
| CLIP | Transformer-based image-text model used for zero-shot image classification |
| Qwen2.5VL | Multimodal large language model |
| Whisper | Automatic speech recognition |
| FunASR | Automatic speech recognition |
| Action Chunking with Transformers - ACT | An end-to-end imitation learning model designed for fine manipulation tasks in robotics |
| Visual Servoing - CNS | A technique that uses feedback information extracted from a vision sensor to control robot motion |
| Diffusion Policy | Learns the gradient of the action-distribution score function and optimizes via stochastic Langevin dynamics steps during inference, providing a stable and efficient way to find optimal actions |
| Improved 3D Diffusion Policy (iDP3) | Improved 3D Diffusion Policy (iDP3) builds upon the original Diffusion Policy framework by enhancing its capabilities for 3D robotic manipulation tasks |
| Robotics Diffusion Transformer (RDT-1B) | Robotics Diffusion Transformer with 1.2B parameters (RDT-1B), is a diffusion-based foundation model for robotic manipulation |
| Feature Extraction Model: SuperPoint | A self-supervised framework for interest point detection and description in images, suitable for a large number of multiple-view geometry problems in computer vision |
| Feature Tracking Model: LightGlue | A model designed for efficient and accurate feature matching in computer vision tasks |
| Bird’s Eye View Perception: Fast-BEV | Provides Bird's Eye View (BEV) perception, giving a comprehensive understanding of the spatial layout and relationships between objects in a scene |
| Monocular Depth Estimation: Depth Anything V2 | A powerful tool that leverages deep learning to infer 3D information from 2D images |
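The Diffusion Policy entry above describes finding actions by following a learned score function with stochastic Langevin dynamics. A toy NumPy sketch of that sampling loop, with a hand-written quadratic score standing in for the learned network (all names and values are illustrative):

```python
import numpy as np

def langevin_sample(score, a0, steps=200, eta=0.05, rng=None):
    """Stochastic Langevin dynamics over an action vector:

        a <- a + eta * score(a) + sqrt(2 * eta) * noise

    `score` is the gradient of the log action density; in a diffusion policy
    it would be a learned network, here it is a hand-written toy function.
    """
    rng = rng or np.random.default_rng(0)
    a = np.array(a0, dtype=float)
    for _ in range(steps):
        a = a + eta * score(a) + np.sqrt(2 * eta) * rng.standard_normal(a.shape)
    return a

# Toy log-density: unit Gaussian centred on the "optimal" action [1.0, -0.5],
# whose score is -(a - mu). The chain drifts from a0 toward mu, plus noise.
mu = np.array([1.0, -0.5])
score = lambda a: -(a - mu)
sample = langevin_sample(score, a0=[5.0, 5.0])  # ends near mu, within noise
```

The injected noise is what makes this sampling rather than plain gradient ascent: the chain's stationary distribution matches the action density, so repeated runs explore the full set of plausible actions instead of collapsing to a single mode.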