Klepsydra ROS2 Multi Core Ring Buffer Executor User Guide

June 15, 2024
Klepsydra

Klepsydra ROS2 Multi Core Ring Buffer Executor

Product Information

Specifications

  • Lightweight, modular, and compatible with most used operating systems
  • ROS2 Executor plugin capable of processing up to 10 times more data with up to 50% reduction in CPU consumption
  • GPU (Graphics Processing Unit) support for high parallelization, increasing the data processing rate and GPU utilization
  • Klepsydra AI
  • Klepsydra SDK
  • Klepsydra GPU
  • Streaming capability
  • Klepsydra ROS2 executor plugin worldwide application

Product Usage Instructions

  1. Context: Parallel Processing
    The product is designed to address challenges related to on-board processing in space applications, where CPU usage, data volume, and power requirements are of concern. It targets the combination of modern hardware and old software, in which computers max out at low to medium data volumes and use resources inefficiently.

  2. Compare and Swap
    “Compare and Swap” is an algorithm used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This operation is performed as a single atomic operation. The product implements this algorithm as part of its functionality.
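The compare-and-swap operation described above can be sketched with the C++ standard library, where `std::atomic` exposes it as `compare_exchange_strong` and `compare_exchange_weak`. This is a minimal illustration, not the product's implementation; the helper names `cas_demo`, `cas_increment` and `cas_update_max` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Raw CAS semantics: the store happens only if the cell currently holds
// `expected`. Returns true when the swap took place.
inline bool cas_demo(std::atomic<int>& cell, int expected, int desired) {
    // On failure the observed value is written into `expected`; that is a
    // local copy here, so the caller's argument is unaffected.
    return cell.compare_exchange_strong(expected, desired);
}

// Lock-free increment built from a CAS retry loop.
inline void cas_increment(std::atomic<int>& counter) {
    int observed = counter.load();
    // compare_exchange_weak may fail spuriously, so it is used in a loop.
    // Each failure refreshes `observed` with the current value.
    while (!counter.compare_exchange_weak(observed, observed + 1)) {
    }
}

// Lock-free "store the maximum": retries only while the candidate is still
// larger than the value another writer may have stored in the meantime.
inline void cas_update_max(std::atomic<int>& maximum, int candidate) {
    int observed = maximum.load();
    while (observed < candidate &&
           !maximum.compare_exchange_weak(observed, candidate)) {
    }
}
```

Both loops may be called from several threads at once without a mutex: a thread that loses the race simply re-reads the current value and tries again.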

Pros and Cons of Lock-Free Programming
The product utilizes lock-free programming techniques, which have advantages and disadvantages:

  • Pros:
    • Reduced CPU usage
    • Efficient handling of data volume
  • Cons:
    • Complexity in implementation
    • Requires support from underlying hardware (most modern hardware supports lock-free programming)

ROS2 Executor
The product includes a ROS2 Executor that schedules the ROS2 application by managing the callbacks of subscriptions, messages, services, timers, and nodes. It consumes messages from the underlying middleware DDS queues and dispatches them for execution to one of the threads. The executor can be configured to execute callbacks sequentially or in parallel.

State-of-the-art ROS2 Executors
The standard ROS2 executors provide different execution strategies:

  • Single Threaded Executor: Executes callbacks sequentially and periodically scans the application structure to update nodes, subscriptions, services, etc.
  • Multi-Threaded Executor: Executes callbacks in parallel and periodically scans the application structure to update the description of the problem.

Streaming Executor
The product introduces a streaming executor that, like the static single-threaded executor and unlike the two executors above, has the following properties:

  • It does not reconstruct the executable list for every iteration.
  • All nodes, callback groups, timers, subscriptions, etc. are created at construction time.
  • All subscriptions inside a node are executed on the same thread, regardless of the number of cores used for the streaming setup.

  1. Klepsydra Streaming Executor
    The product’s streaming executor uses the Klepsydra event loop to deliver messages, received from the middleware via the rmw, to the subscribers in all nodes. The event loop manages these topics using publisher-subscriber pairs.

  2. Klepsydra Realm
    The Klepsydra Realm is a component of the streaming executor that acts as a scheduler. It coordinates the execution of producers and consumers within the streaming setup.

  3. Producer
    A producer is responsible for generating data that will be consumed by the subscribers. The product supports multiple producers in the streaming setup.

  4. Consumer
    A consumer is responsible for processing and utilizing the data generated by the producers. The product supports multiple consumers in the streaming setup.

Frequently Asked Questions (FAQ)

  1. What operating systems are compatible with the product?
    The product is compatible with most commonly used operating systems.

  2. How much data can the product process compared to traditional methods?
    The product is capable of processing up to 10 times more data with up to 50% reduction in CPU consumption compared to traditional methods.

  3. Does the product utilize GPU parallelization?
    Yes, the product utilizes GPU (Graphic Processing Unit) for high parallelization, which increases the processing data rate and GPU utilization.

An Offline Optimization Approach for Multi-Core ROS2 Applications: The Multi-Core Ring-Buffer ROS2 Executor

ROS Meetup Stuttgart 2023

Dr Pablo Ghiglino (pablo.ghiglino@klepsydra.com)

The ROS2 Streaming Executor

CONTEXT: PARALLEL PROCESSING

  • Challenges of on-board processing
  • Modern hardware and old software:
    • Computers max out with low to medium data volumes
    • Inefficient use of resources
    • Excessive power for low data processing

Consequences for Space applications

  • Recurrent mission failures due to software
  • Access to sensor data from Earth is time consuming.
  • Satellites struggle to meet power requirements

COMPARE AND SWAP

  • Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation.
  • Compare-and-Swap has been an integral part of the IBM 370 architectures since 1970.
  • Maurice Herlihy (1991) proved that CAS can implement more lock-free and wait-free algorithms than atomic read, write, and fetch-and-add.

PROS AND CONS OF LOCK-FREE PROGRAMMING

Pros

  • Less CPU consumption required
  • Lower latency and higher data throughput
  • Substantial increase in determinism

Cons

  • Extremely difficult programming technique
  • Requires a processor with CAS instructions (although 90% of the market has them)

Klepsydra Ring-buffer

[Figure: Klepsydra ring-buffer]
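As a generic illustration of the data structure named in this section, the sketch below shows a single-producer/single-consumer ring buffer built on `std::atomic`. It is not Klepsydra's implementation, and the class name `SpscRing` is hypothetical. With exactly one producer and one consumer, atomic loads and stores are sufficient; designs with several producers typically claim slots using CAS.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Generic lock-free ring buffer for one producer thread and one consumer
// thread. One slot is left unused so that head == tail always means "empty",
// which gives a usable capacity of N - 1.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least two slots");

public:
    // Producer side only. Returns false when the buffer is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = value;
        // Release makes the slot write visible before the new tail is seen.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns false when the buffer is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];
        // Release hands the slot back to the producer after it has been read.
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

Neither call blocks: a full or empty buffer is reported to the caller, who decides whether to retry, drop the item, or do other work.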

THE PRODUCT
Lightweight, modular and compatible with most used operating systems

  • SDK – Software Development Kit
    Boost data processing at the edge for general applications and processor intensive algorithms

  • AI – Artificial Intelligence
    High performance deep neural network (DNN) engine to deploy any AI or machine learning module at the edge

  • ROS2 Executor plugin
    Executor for ROS2 able to process up to 10 times more data with up to 50% reduction in CPU consumption.

  • GPU (Graphics Processing Unit)
    High parallelisation on the GPU to increase the data processing rate and GPU utilization

Context: ROS1 in Space

  • Benefits:
    • Enables autonomy, perception, and control in space
    • Flexible, modular, and supported by a large community
  • Examples:
    • NASA’s Robonaut 2 (R2) – Assisted astronauts on the ISS
    • NASA’s Astrobee – Autonomous operations on the ISS
    • SPHERES – Spherical satellites for research on the ISS
    • Dextre – Robotic arm for manipulation and repairs on the ISS
    • Google Lunar XPRIZE – Used for lunar rover missions

Context: Space-ROS

  • ROS framework in space exploration and robotics
  • Partially qualified for Space use
  • Examples:
    • Robotic assistants for astronauts
    • Planetary exploration with autonomous rovers
    • Satellite operations and control
    • Autonomous spacecraft docking
    • On-orbit servicing and repairs
  • Benefits:
    • Enables autonomy, communication, and perception
    • Accelerates space mission development and operations

ROS2 Executor Explained

An executor coordinates and schedules the ROS2 application by managing the callbacks of subscriptions, messages, services, timers and nodes. In ROS2, the executor does not keep its own queue of messages and callbacks. Instead, it consumes messages from the underlying middleware DDS queues and then dispatches them for execution to one of the threads.
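The dispatch pattern described above can be sketched as a toy model in plain C++. This is not the rclcpp API: `ToySubscription` and `ToyExecutor` are hypothetical names, and a `std::deque` stands in for a middleware (DDS) queue. The point it illustrates is that the executor holds no messages of its own and only polls the middleware queues and runs the matching callbacks.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for one subscription and the middleware queue that feeds it.
struct ToySubscription {
    std::string topic;
    std::deque<std::string> middleware_queue;          // owned by the "middleware"
    std::function<void(const std::string&)> callback;  // user callback
};

// Toy sequential executor: it stores no messages itself. Each pass polls the
// middleware queues and runs the callbacks on the calling thread.
class ToyExecutor {
public:
    void add_subscription(ToySubscription* sub) {
        assert(sub != nullptr);
        subscriptions_.push_back(sub);
    }

    // Takes at most one message per subscription and returns the number of
    // callbacks that were executed in this pass.
    int spin_some() {
        int executed = 0;
        for (ToySubscription* sub : subscriptions_) {
            if (sub->middleware_queue.empty()) continue;
            std::string msg = std::move(sub->middleware_queue.front());
            sub->middleware_queue.pop_front();
            sub->callback(msg);
            ++executed;
        }
        return executed;
    }

private:
    std::vector<ToySubscription*> subscriptions_;
};
```

A multi-threaded variant would hand each callback to a worker thread instead of calling it inline, which is where the synchronisation cost discussed earlier comes in.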

State-of-the-art ROS2 Executors

  • Single-threaded executor: A single thread queries the middleware and executes the callbacks sequentially. It then scans the structure and updates nodes, subscriptions, services, etc.
  • Static single-threaded executor: The scan and definition of the structure is executed only once, during construction. All nodes, callback groups, timers, subscriptions, etc. are created before spin() is called.
  • Multi-threaded executor: Creates a number of threads that execute callbacks in parallel. Like the single-threaded executor, it periodically scans the structure of the application and updates its description of it.

The streaming executor Overview

  • The Streaming Executor uses the Klepsydra Event Loop to deliver messages, which come from the middleware via the rmw, to the subscribers in all nodes. The event loop manages these topics using publisher-subscriber pairs.
  • The streaming executor behaves similarly to the static single threaded executor in several aspects. First, it doesn’t reconstruct the executable list for every iteration. All nodes, callback groups, timers, subscriptions etc. are created at construction time. Secondly, all subscriptions inside a node are executed on the same thread, regardless of the number of cores used for the streaming setup.

Klepsydra Streaming Executor

[Figure: Klepsydra streaming executor]

The streaming executor

How does it work?

  • A publisher-subscriber pair is created for each topic required by a given ROS2 node.
  • Internally, each publisher-subscriber pair is identified by two parameters: the node name and the topic name. In other words, two different nodes publishing to the same topic will be managed independently.
  • All of the publisher-subscriber pairs associated with topics that belong to the same node are managed by the same event loop.
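The bookkeeping in the three points above can be sketched as follows. This is an illustration under stated assumptions, not Klepsydra's code: the class name `PairRegistry` is hypothetical, and the round-robin placement of nodes on event loops is only a placeholder policy.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of publisher-subscriber pairs.
class PairRegistry {
public:
    using Key = std::pair<std::string, std::string>;  // (node name, topic name)

    explicit PairRegistry(std::size_t event_loops) : loops_(event_loops) {
        assert(event_loops > 0);
    }

    // Registers the pair for (node, topic) and returns the index of the event
    // loop that manages it. A node is bound to an event loop the first time it
    // is seen, so all of its topics share that loop.
    std::size_t register_pair(const std::string& node, const std::string& topic) {
        auto it = node_to_loop_.find(node);
        if (it == node_to_loop_.end()) {
            // Placeholder policy: spread new nodes over the loops in turn.
            it = node_to_loop_.emplace(node, node_to_loop_.size() % loops_).first;
        }
        pairs_[Key(node, topic)] = it->second;
        return it->second;
    }

    std::size_t pair_count() const { return pairs_.size(); }

private:
    std::size_t loops_;
    std::map<std::string, std::size_t> node_to_loop_;
    std::map<Key, std::size_t> pairs_;
};
```

Because the key includes the node name, two nodes that use the same topic get two independent pairs, while every topic of one node resolves to the same event loop index.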

The streaming executor: Single-core vs multi-core

  • The advantage of the streaming executor is that the subscribers need no multithreading management, because all of them are managed by the thread of the associated event loop. This holds for both the single-core and the multi-core variants.
  • The single-core variant works in a similar manner to the static single-threaded executor, since all subscribers in all nodes are invoked by the same thread.

Performance Benchmark: The Ref-System

  • Klepsydra Streaming Benchmarks setup:
    • The benchmark was based on the Autoware reference system. It emulates a realistic driving application.
    • All measurements were taken on a Raspberry Pi 4B with ROS 2 Galactic, Ubuntu 20.04, 4 GB of RAM, and a constant CPU frequency of 1.50 GHz.
    • Compatible setup of the reference system, and without CPU isolation
  • Processors tested:
    • Raspberry Pi 4 (reference processor for the RTWG)
    • Unibap’s iX10 (NASA and Blue Origin Testbed)
    • Teledyne e2v LS1046

Performance Benchmark

[Figures: performance benchmark; critical path of the reference system]

  • The target of the genetic algorithm is to minimise the average latency of the critical path, that is, the time from the publication of the Lidar data until the Object Collision Estimator completes its work.
  • The figure shows the critical path that our research attempted to optimise.

The streaming executor
Klepsydra Streaming Distribution Optimiser (SDO)

  • The multi-core variant of the streaming executor works best when the distribution of the node load among cores is optimised. This can make a substantial difference in terms of latency, power consumption and data throughput.
  • However, the mapping of cores is not trivial and requires a systematic approach. A possible approach is to define a target function that measures the performance of the system based on the core configuration.
  • A genetic algorithm can then be used to optimize the core configuration by iteratively testing different configurations and selecting those that perform well according to the target function. This process continues until an optimal configuration is found.
  • This approach allows for a more efficient use of the multi-core system and helps ensure that the load is well distributed.
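The optimisation loop described above can be sketched as a small genetic algorithm. This is an illustration, not the Streaming Distribution Optimiser itself. The target function here is an assumption: it uses the load of the busiest core as a stand-in for measured critical-path latency. The names `busiest_core_load` and `optimise_cores`, and all default parameters, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Assignment = std::vector<int>;  // assignment[i] = core that runs node i

// Assumed target function: load of the busiest core (lower is better).
// A real setup would measure latency on the target hardware instead.
inline double busiest_core_load(const Assignment& a,
                                const std::vector<double>& node_load, int cores) {
    std::vector<double> per_core(cores, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) per_core[a[i]] += node_load[i];
    return *std::max_element(per_core.begin(), per_core.end());
}

inline Assignment optimise_cores(const std::vector<double>& node_load, int cores,
                                 int population = 20, int generations = 100,
                                 unsigned seed = 42) {
    assert(cores > 0 && population >= 2 && !node_load.empty());
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick_core(0, cores - 1);
    std::uniform_int_distribution<std::size_t> pick_node(0, node_load.size() - 1);
    auto cost = [&](const Assignment& a) {
        return busiest_core_load(a, node_load, cores);
    };
    auto better = [&](const Assignment& x, const Assignment& y) {
        return cost(x) < cost(y);
    };

    // Start from random core configurations.
    std::vector<Assignment> pop(population, Assignment(node_load.size()));
    for (auto& a : pop)
        for (auto& c : a) c = pick_core(rng);

    const int survivors = population / 2;
    std::uniform_int_distribution<int> pick_parent(0, survivors - 1);
    for (int g = 0; g < generations; ++g) {
        // Selection: keep the better half unchanged (elitism).
        std::sort(pop.begin(), pop.end(), better);
        for (int i = survivors; i < population; ++i) {
            // Crossover: prefix from one parent, remainder from the other.
            Assignment child = pop[pick_parent(rng)];
            const Assignment& other = pop[pick_parent(rng)];
            for (std::size_t k = pick_node(rng); k < child.size(); ++k)
                child[k] = other[k];
            // Mutation: move one random node to a random core.
            child[pick_node(rng)] = pick_core(rng);
            pop[i] = child;
        }
    }
    return *std::min_element(pop.begin(), pop.end(), better);
}
```

Because the better half of each generation is carried over unchanged, the best cost never gets worse from one generation to the next. The search stops after a fixed number of generations, so the result is a good configuration rather than a guaranteed optimum.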

Results Summary

  • For small node workloads, the added complexity does not translate into better results. The static single-threaded executor, being extremely simple, outperforms the rest of the executors.
  • As the workload increases, the Streaming Executor performs best, followed by the Static Single-Threaded Executor.
  • The Streaming Executor was expected to perform consistently better than the single-threaded executor, since the application does not modify its topology while running. The results show that this is indeed the case.
  • The results shown are for the Raspberry Pi 4.
  • A similar increase in performance was obtained for Unibap’s iX10 and Teledyne’s LS1046.

Conclusions

Summary

  • This article presents a novel approach to optimising the ROS2 execution model. It combines a lock-free, ring-buffer-based ROS2 executor implementation with the use of genetic algorithms to optimise the distribution of the robotic application load across the available cores of the target computer.
  • This combination has been proven to work very efficiently for systems with a heavy computational load, such as the reference system described above. A key benefit of the presented research is its adaptability to different applications: different ROS2 node topologies can be accelerated using the streaming executor plus genetic optimisation, which addresses one of the most discussed challenges in ROS2.

Future Work

  • Several features are planned for the streaming executor:
    • support for ROS 2 Humble and Rolling,
    • an open-source release of the single-core streaming executor,
    • the use of the sensor multiplexer as well as the event loop for topics with several subscribers, and
    • testing on the RISC-V architecture.
