Engineering Blog
EtherCAT ROS 2 Real-Time Integration

Building a Real-Time EtherCAT to ROS 2 Bridge

James Whitaker 14 min read
Terminal screen showing ROS 2 joint state topic output with real-time position data from EtherCAT

EtherCAT and ROS 2 operate at fundamentally different time scales and with fundamentally different consistency models. EtherCAT's distributed clock synchronization targets cycle-to-cycle jitter in the low-microsecond range. ROS 2's DDS middleware — even with a real-time executor — operates in the tens-of-milliseconds range for non-RT kernel configurations. Bridging them without losing what makes EtherCAT valuable requires a carefully partitioned architecture, not just an adapter node.

This article documents the bridge we built for the TK series: IgH EtherCAT master on a PREEMPT-RT Linux kernel, a custom ros2_control hardware interface that crosses the RT boundary, and the configuration decisions that determine whether your cycle budget holds.

The Architecture: Two Loops, One Boundary

The design principle is straightforward: keep the hard real-time work entirely inside the IgH master's periodic task. ROS 2 sits outside the RT boundary and receives position, velocity, and torque data through a lock-free ring buffer. Command data flows the other direction through the same buffer. The ros2_control hardware interface reads and writes the buffer in its read() and write() methods, which execute inside the controller manager's real-time loop.

┌───────────────────────────────────────────────────────┐
│  PREEMPT-RT kernel space                              │
│                                                       │
│  IgH EtherCAT Master task (POSIX RT, period 1 ms)    │
│    ├── ecrt_master_receive()                          │
│    ├── ecrt_domain_process()     ← CoE PDO exchange   │
│    ├── write joint commands to ring buffer            │
│    ├── read joint states from ring buffer             │
│    ├── ecrt_domain_queue()                            │
│    └── ecrt_master_send()                             │
│                                                       │
│              ↕ lock-free SPSC ring buffer             │
│                                                       │
│  ros2_control controller_manager (RT executor)        │
│    ├── hardware_interface::read()  → publish states   │
│    └── hardware_interface::write() ← consume commands │
└───────────────────────────────────────────────────────┘

The IgH master task runs at a fixed 1 kHz period with a POSIX real-time thread priority of SCHED_FIFO 80. The ros2_control controller manager runs at the same period when configured for joint-level position and torque control; it can be reduced to 500 Hz for impedance control tasks where the control bandwidth is lower and DDS jitter budget can be widened.

EtherCAT Configuration: CoE Process Data and Distributed Clocks

Each TK actuator on the EtherCAT bus exposes a CoE object dictionary with Process Data Objects (PDOs) mapped by the ESI file. For the TK series, the default RxPDO (master → slave, i.e. commands) contains:

  • 0x6040 — Controlword (DS402 state machine)
  • 0x607A — Target position (interpolated position mode)
  • 0x6071 — Target torque (torque mode)
  • 0x60FF — Target velocity

The TxPDO (slave → master, i.e. actual states) returns:

  • 0x6041 — Statusword
  • 0x6064 — Position actual value (encoder count)
  • 0x606C — Velocity actual value
  • 0x6077 — Torque actual value
  • 0x2010 — Winding temperature (vendor-specific)

Distributed clocks (DC) synchronization is essential when running multiple joints at 1 kHz. Without DC, the SYNC0 event on each slave drifts independently, meaning each joint's process data is sampled at a slightly different absolute time. For a six-DOF arm with 1 ms cycle time, 50–200 µs of inter-slave skew is typical without DC; with DC enabled and a Sync0 shift time tuned to the cable propagation delay, inter-slave jitter drops to under 1 µs.

To enable DC in the IgH master:

/* During domain and slave configuration */
ec_slave_config_t *sc = ecrt_master_slave_config(
    master, 0, 0, TK_VENDOR_ID, TK_PRODUCT_CODE);

ecrt_slave_config_dc(
    sc,
    0x0300,         /* DC assign activate: SYNC0 + SYNC1 */
    1000000,        /* SYNC0 period: 1 ms in nanoseconds */
    500000,         /* SYNC0 shift: 500 µs (tune per topology) */
    0,              /* SYNC1 period: 0 = disabled */
    0               /* SYNC1 shift */
);

The Sync0 shift value requires tuning. A starting point of half the cycle period (500 µs for a 1 ms cycle) gives the slave enough time to latch PDO data and prepare it for the master's receive call. In practice, measure the shift sweep by logging the app_time offset field in the master's domain state and adjust until jitter is minimized. The IgH master provides a latency_nsec field in the domain status you can use to verify.

PREEMPT-RT Kernel Configuration

An RT kernel is necessary but not sufficient. Three configuration items matter more than the kernel patch itself for achieving consistent 1 kHz cycle times.

CPU Isolation

Add isolcpus=2,3 and rcu_nocbs=2,3 to the kernel boot parameters to isolate two cores for the IgH master task and the ros2_control controller manager thread. Tick-based interrupts, RCU callbacks, and kernel timers will be excluded from these cores. Without isolation, occasional kernel work stealing on an otherwise idle core causes cycle overruns of 200–800 µs at roughly 1–2% of cycles — not enough to trigger a fault but enough to corrupt encoder position estimates at high velocity.

IRQ Affinity

Pin the network interface IRQ (the NIC used for EtherCAT) to a non-isolated core. Counterintuitively, you want the NIC interrupt on a general-purpose core, not the isolated RT core. The IgH master task calls ecrt_master_receive() at the start of each cycle regardless of NIC interrupt state — it pulls the frame directly from the NIC Rx buffer. The NIC interrupt being on a separate core means it cannot preempt the RT task mid-cycle.

# Identify EtherCAT NIC IRQ number
cat /proc/interrupts | grep eth0

# Pin to core 1 (not isolated)
echo 2 > /proc/irq/<IRQ_NUM>/smp_affinity

Latency Budget

Measure actual cycle jitter before deploying. The tool is cyclictest from the rt-tests suite:

cyclictest --smp -p 90 -m -i 1000 -d 0 -a 2,3 -t 2 --duration=60s

Target: max latency under 50 µs over a 60-second run. On a modern x86 system with PREEMPT-RT, a well-configured setup reaches max latency below 30 µs. If you are seeing >100 µs, check for SMI (System Management Interrupt) activity using hwlatdetect — firmware SMIs are the most common culprit on bare-metal x86 and cannot be fixed in software. Upgrading BIOS and disabling unused chipset features (legacy USB emulation, PCIe Active State Power Management) typically brings SMI latency from 80–200 µs down to under 20 µs.

The ros2_control Hardware Interface

The hardware_interface::SystemInterface plugin exposes joint states and accepts commands through the StateInterface and CommandInterface registration mechanism. The read/write cycle frequency must match the IgH master cycle frequency, or you introduce a latency mismatch that manifests as position command lag.

// hardware_interface/tk_ethercat_system.hpp (excerpt)

hardware_interface::CallbackReturn TKEtherCATSystem::on_init(
    const hardware_interface::HardwareInfo & info) override
{
  // Parse joint names and actuator IDs from URDF srdf params
  for (const auto & joint : info.joints) {
    joints_.push_back({
      joint.name,
      std::stoi(joint.parameters.at("ethercat_slave_id")),
      0.0,  // pos
      0.0,  // vel
      0.0,  // torque cmd
    });
  }
  return hardware_interface::CallbackReturn::SUCCESS;
}

hardware_interface::return_type TKEtherCATSystem::read(
    const rclcpp::Time & /*time*/,
    const rclcpp::Duration & /*period*/) override
{
  for (auto & j : joints_) {
    // Pull from lock-free ring buffer written by IgH task
    RingEntry entry;
    if (state_ring_.try_pop(entry)) {
      j.position = entry.position_rad;
      j.velocity = entry.velocity_rads;
    }
  }
  return hardware_interface::return_type::OK;
}

The lock-free ring buffer uses a single-producer single-consumer (SPSC) design. The IgH master task is the sole producer; the ros2_control read thread is the sole consumer. This eliminates the need for mutexes in the critical path, which would introduce priority-inversion risk between the RT task and the DDS middleware threads. A commonly-cited alternative — using shared memory with std::atomic — works but provides only single-slot semantics; a ring buffer depth of 2–4 gives you visibility into whether the control loop is dropping cycles.

Where This Bridge Falls Short — and What to Do About It

We're not saying this architecture eliminates all timing concerns — it reduces them to a manageable set. There are two failure modes to understand explicitly.

DDS publish latency spike. Under heavy CPU load, the ROS 2 DDS layer can spike its publication latency by 3–8 ms for a single cycle. This does not affect the EtherCAT control loop — the IgH task continues writing to the ring buffer on schedule — but it does mean the joint_states topic timestamp can lag the actual hardware state by multiple cycles. If your MoveIt 2 planner relies on latency-sensitive state feedback, add a use_sim_time: false explicit timeout in your trajectory executor and configure the JointTrajectoryController's allow_integration_in_goal_constraints parameter appropriately.

Cycle overrun detection. The IgH master's ecrt_master_send() returns a timing flag if the previous cycle's frame was not returned before the next send. Log this count over your test runs. A rate above 0.1% of cycles indicates a system configuration problem — likely SMI interference or insufficient CPU isolation — not an acceptable operating condition. The TK firmware handles a missed cycle gracefully (it holds the last commanded torque for one cycle), but sustained cycle overruns at high torque will produce position errors that accumulate.

Running at 4 kHz: When 1 kHz Is Not Enough

For impedance control applications requiring higher control bandwidth — exoskeleton force reflection, precise contact detection — we have validated a 4 kHz EtherCAT cycle on the TK-120 and TK-240 with a 10-slave bus topology. The cycle period drops to 250 µs. Latency budget becomes tighter: the IgH task must complete all processing, including ring buffer write, inside 80–100 µs to leave margin for the Ethernet frame round-trip time plus slave processing.

At 4 kHz, the ros2_control controller manager does not need to run at the same frequency. Decouple them: run the EtherCAT cycle at 4 kHz, but have the hardware interface bridge publish state at 1 kHz by decimating. The controller manager's read() call picks the most recent valid entry from the ring buffer regardless of cycle frequency. This prevents the DDS layer from becoming a 4 kHz subscriber bottleneck while keeping the servo loop tight where it matters — inside the FOC firmware on the actuator itself, where the current controller already runs at 400 Hz PWM switching frequency independent of the EtherCAT cycle rate.

Bus topology matters at 4 kHz. Cable propagation adds roughly 5 ns/m; with a 10-slave daisy chain and total cable length of 8 m, the round-trip propagation adds ~80 ns — negligible relative to frame size. The bottleneck is frame size itself: each additional slave adds approximately 2.4 µs of processing time at the slave level. A 12-slave chain at 4 kHz sits near the practical limit for sub-250 µs cycle completion. Beyond 12 actuators at 4 kHz, consider splitting to two EtherCAT masters on separate PCIe NICs, each handling one limb of the robot. The IgH master supports multiple concurrent master instances.

Validating the Bridge End-to-End

Before running closed-loop control, verify the full chain with a latency measurement: publish a timestamped command topic in ROS 2, observe when the corresponding torque command appears in the EtherCAT PDO via IgH master's ethercat diagnostic tool or a bus monitor. End-to-end latency from ROS 2 publish to PDO write should be under 3 ms. Values above 5 ms indicate DDS middleware issues — check CYCLONEDDS_URI transport settings, specifically MaxMessageSize and interface binding. The TK series ESI files, IgH configuration examples, and the hardware interface plugin are in the integration docs at integration/ethercat.html.