Software

Writing a Clean ROS 2 Hardware Abstraction Layer for Custom Actuators

Nadia Osei August 19, 2025 Research Blog

Writing a custom hardware interface for ros2_control is one of those tasks that looks straightforward in the documentation and turns into a week-long debugging exercise the first time you actually do it. The lifecycle state machine, the separation between command and state interfaces, the real-time safety requirements in the read() and write() methods, the URDF hardware tag format — none of these are difficult individually, but they interact in ways that are easy to get wrong, and the error messages when you get them wrong are often unhelpfully opaque.

This walkthrough covers the complete structure of a ros2_control hardware interface plugin for a custom joint actuator: from the SystemInterface class skeleton through lifecycle management, to the read/write loop, and finally to the URDF hardware description. We'll use a single revolute joint with position, velocity, and torque state interfaces as the example. The principles apply directly to multi-joint actuator systems with the same architecture.

What ros2_control Actually Does (And What It Doesn't)

Before writing any code, it helps to be clear on what ros2_control is responsible for and what your hardware interface plugin is responsible for.

ros2_control provides: the controller manager (which loads and manages controllers), the resource manager (which owns the state and command interface data vectors), a set of standard controllers (joint_trajectory_controller, effort_controller, etc.), and the real-time-safe update loop that calls your hardware interface's read() and write() methods at a fixed rate (typically 100–1000 Hz).

Your hardware interface plugin provides: the physical communication with the actuator hardware (EtherCAT, CAN FD, SPI, or any bus you're using), the mapping between raw hardware register values and the SI-unit state interfaces (radians, rad/s, Nm), and the reverse mapping from SI-unit command interfaces to raw hardware commands. The plugin also declares what state and command interfaces exist — this declaration is what allows controllers to bind to your hardware's joints.

What ros2_control does NOT do: real-time scheduling (that's the responsibility of your OS configuration and the executor you run the controller manager in), thread safety within your read() and write() methods (that's your responsibility), or hardware fault handling (you need to propagate hardware errors through the lifecycle state machine yourself).

The SystemInterface Skeleton

A hardware interface plugin for a multi-joint system inherits from hardware_interface::SystemInterface. The four pure virtual methods you must implement are:

CallbackReturn on_init(const HardwareInfo & info);
CallbackReturn on_activate(const rclcpp_lifecycle::State & previous_state);
CallbackReturn on_deactivate(const rclcpp_lifecycle::State & previous_state);
hardware_interface::return_type read(const rclcpp::Time & time, const rclcpp::Duration & period);
hardware_interface::return_type write(const rclcpp::Time & time, const rclcpp::Duration & period);

There is also export_state_interfaces() and export_command_interfaces(), which you override to declare what interfaces your hardware exposes. These are called once during resource manager initialization — before on_activate().

on_init() is where you parse the HardwareInfo struct (which comes from the URDF hardware tag) to get joint names, parameter values, and interface declarations. Open your hardware communication channel here only if it is safe to do so before the lifecycle reaches ACTIVE; otherwise, defer it to on_activate(). The distinction matters: on_init() is called at node startup, which may be before your EtherCAT master is ready.

Declaring State and Command Interfaces Correctly

The export_state_interfaces() and export_command_interfaces() functions return vectors of StateInterface and CommandInterface objects. Each interface binds to a double stored in your plugin's member variables — these doubles are the shared memory between your hardware loop and the controller manager.

A common mistake: allocating the backing doubles in a local variable or temporary vector and then moving the interface objects out. The interface object stores a pointer to the backing double; if the backing double moves after the interface is constructed, the pointer becomes dangling. Store your backing variables in stable member storage (e.g., a std::vector<double> that you resize once in on_init() and never resize again).

For a single revolute joint with position, velocity, effort state interfaces and an effort command interface, the pattern looks like:

// In class member storage:
std::vector<double> pos_state_, vel_state_, eff_state_, eff_cmd_;

// In export_state_interfaces():
interfaces.emplace_back(joint_name, HW_IF_POSITION, &pos_state_[i]);
interfaces.emplace_back(joint_name, HW_IF_VELOCITY, &vel_state_[i]);
interfaces.emplace_back(joint_name, HW_IF_EFFORT,   &eff_state_[i]);

// In export_command_interfaces():
interfaces.emplace_back(joint_name, HW_IF_EFFORT, &eff_cmd_[i]);

The strings HW_IF_POSITION, HW_IF_VELOCITY, and HW_IF_EFFORT are constants defined in hardware_interface/types/hardware_interface_type_values.hpp. Do not hardcode the strings manually — the controller manager uses these constants to match interfaces, and a typo ("effort" vs "Effort") causes silent binding failures that are very frustrating to track down.

The Read and Write Methods: Real-Time Requirements

The read() and write() methods are called from the controller manager's update loop, which runs in a real-time thread if you've configured the executor appropriately. This imposes real constraints:

No dynamic memory allocation (new, malloc, string construction with +, std::vector::push_back that triggers reallocation)
No blocking calls (no mutex locks that may block, no socket reads that may block waiting for data)
No logging with RCLCPP_INFO in the hot path (ROS logging is not real-time safe; use it only in fault paths or outside the update loop)
No system calls that may block (no file I/O, no sleep())

For EtherCAT communication, the standard pattern is to have a separate real-time EtherCAT master thread (e.g., using the SOEM library with a CLOCK_REALTIME periodic thread) that runs at your fieldbus cycle rate (1ms for EtherCAT, configurable for CAN FD). This thread reads PDO data from the hardware into a shared memory region. Your read() method then copies from that shared memory into the ros2_control state interface doubles — a fast, bounded-time copy operation that is real-time safe. The write() method copies command interface doubles into the shared memory region; the EtherCAT master thread reads from there and sends the PDO on the next fieldbus cycle.

For a 1kHz ros2_control update rate with a 1ms EtherCAT cycle, this architecture means there is at most 1ms of staleness between a hardware state reading and the ros2_control update that consumes it. That is acceptable for joint impedance control at typical gains. If your control bandwidth requires lower latency, run the EtherCAT master at 500µs and the ros2_control loop at 2kHz — but verify that the host CPU can sustain this without deadline misses under full robot load.

Lifecycle State Machine and Error Propagation

The ros2_control lifecycle follows the rclcpp_lifecycle state machine: UNCONFIGURED → INACTIVE (on configure) → ACTIVE (on activate) → INACTIVE (on deactivate) → FINALIZED (on cleanup). Your hardware interface interacts with this at on_activate() and on_deactivate().

On activate: open the communication bus if not already open, verify joint count matches the URDF hardware description, issue the enable command to all joints, wait for state readback, and check that initial state values are plausible (not NaN, not outside mechanical range). If any of these fail, return CallbackReturn::ERROR — this keeps the hardware interface in INACTIVE state rather than entering ACTIVE, which prevents controllers from being loaded onto a hardware interface that isn't ready.

A common error: returning CallbackReturn::SUCCESS from on_activate() before verifying hardware readback, then failing silently in the first read() call. The controller manager doesn't know the hardware is in a fault state; controllers proceed to command positions or torques to joints with stale state data. This is how hardware damage happens in development. Check hardware readback in on_activate() before returning SUCCESS.

URDF Hardware Tag Format

The hardware interface plugin is loaded by the controller manager through a URDF <ros2_control> tag, which declares the plugin name, joint names, and parameter values. A minimal example:

<ros2_control name="tendonkindle_system" type="system">
  <hardware>
    <plugin>tendonkindle_hardware/TKSystemInterface</plugin>
    <param name="bus_type">ethercat</param>
    <param name="ethercat_ifname">eth0</param>
    <param name="control_frequency_hz">1000</param>
  </hardware>
  <joint name="left_knee_joint">
    <command_interface name="effort"/>
    <state_interface name="position"/>
    <state_interface name="velocity"/>
    <state_interface name="effort"/>
  </joint>
</ros2_control>

The HardwareInfo struct passed to on_init() is parsed from this tag. info.joints is a vector of ComponentInfo objects, one per <joint> tag. info.hardware_parameters is a map of the <param> key-value pairs. Access these with checked lookups (info.hardware_parameters.at("bus_type")) rather than unchecked operators, so you get a descriptive exception if a required parameter is missing rather than a silent default.

One thing we learned writing our own HAL for EtherCAT: the URDF joint ordering in the <ros2_control> tag must match the joint ordering in your EtherCAT PDO mapping, or every joint's state will be assigned to the wrong control interface. This isn't validated anywhere in ros2_control — it's silently wrong. Document the ordering convention in your code, and add an on_activate() check that reads each joint's position and verifies it's within the expected range for that joint's kinematic limits.

Testing the Interface Before Running on Hardware

ros2_control includes a MockComponents plugin that you can use to test controller loading, interface binding, and trajectory command flow against a fake hardware backend — without physical hardware present. Set the URDF hardware plugin to mock_components/GenericSystem and your controller configuration is fully testable in simulation. We run these tests as part of our continuous integration pipeline to catch interface declaration bugs before they reach hardware.

Once the mock tests pass, the next step is testing the real plugin with hardware disabled — loading the plugin, verifying interface export, running on_init(), but calling on_activate() with the hardware physically disconnected. A well-written plugin handles this gracefully by returning CallbackReturn::ERROR at the right point. If the plugin crashes the controller manager process instead of returning an error, that's a bug to fix before the first powered hardware test.

A clean hardware abstraction layer doesn't just make ROS 2 integration easier — it makes your custom control loops easier too, because the same interface contract is available whether you're running via the ros2_control pipeline or your own custom executor. Building that abstraction once, correctly, is worth the upfront investment.