Intel FPGA Programmable Acceleration Card N3000 User Guide
- June 3, 2024
- Intel
Introduction
Background
The Intel FPGA Programmable Acceleration Card N3000 in a virtualized radio access network (vRAN) requires IEEE 1588v2 support as a Precision Time Protocol (PTP) Telecom Time Slave Clock (T-TSC) to schedule software tasks appropriately. The Intel Ethernet Controller XL710 in the Intel® FPGA PAC N3000 provides the IEEE 1588v2 support. However, the FPGA data path introduces jitter that affects the PTP performance. Adding a transparent clock (T-TC) circuit enables the Intel FPGA PAC N3000 to compensate for its internal FPGA latency and mitigates the effects of the jitter, which allows the T-TSC to approximate the Grandmaster's Time of Day (ToD) closely.
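To make the role of the T-TC concrete, the following is a minimal sketch (not the FPGA implementation) of end-to-end transparent clock arithmetic: the residence time of a PTP event message inside the device is added to the message's 64-bit correctionField, which IEEE 1588 expresses in units of nanoseconds scaled by 2^16. The timestamp values below are illustrative.

```python
# Minimal sketch of end-to-end transparent clock arithmetic (illustrative
# only; the actual correction is performed in the N3000 FPGA soft logic).
# IEEE 1588 defines correctionField as a signed 64-bit value in units of
# nanoseconds multiplied by 2^16.

CORRECTION_SCALE = 1 << 16  # 1 ns == 65536 correctionField units

def add_residence_time(correction_field: int,
                       ingress_ts_ns: float,
                       egress_ts_ns: float) -> int:
    """Add the residence time of a PTP event message (egress timestamp
    minus ingress timestamp) to its correctionField."""
    residence_ns = egress_ts_ns - ingress_ts_ns
    return correction_field + round(residence_ns * CORRECTION_SCALE)

# Example: a Sync message spends 812.5 ns in the FPGA data path between
# the 25G and 40G MACs (value chosen for illustration).
updated = add_residence_time(correction_field=0,
                             ingress_ts_ns=1_000_000.0,
                             egress_ts_ns=1_000_812.5)
print(updated)  # 53248000, i.e. 812.5 ns in 2^-16 ns units
```

Because the PTP slave subtracts the accumulated correctionField when it computes path delay and offset, the variable FPGA residence time no longer appears as jitter at the servo input.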
Objective
These tests validate the use of the Intel FPGA PAC N3000 as an IEEE 1588v2 slave in an Open Radio Access Network (O-RAN). This document describes:
- Test setup
- Verification process
- Performance evaluation of the transparent clock mechanism in the FPGA path of the Intel FPGA PAC N3000
- PTP performance of the Intel FPGA PAC N3000

The performance of the Intel FPGA PAC N3000 with transparent clock support is compared with the Intel FPGA PAC N3000 without transparent clock, as well as with another Ethernet card, the XXV710, under various traffic conditions and PTP configurations.
Features and Limitations
The features and validation limitations for the Intel FPGA PAC N3000 IEEE 1588v2 support are as follows (an illustrative ptp4l configuration sketch appears after the list):
- Software stack used: Linux PTP Project (PTP4l)
- Supports the following telecom profiles:
- 1588v2 (default)
- G.8265.1
- G.8275.1
- Supports two-step PTP slave clock.
- Supports end-to-end multicast mode.
- Supports PTP message exchange frequency of up to 128 Hz.
- This is a limitation of the validation plan and the employed Grandmaster; PTP configurations higher than 128 packets per second for PTP messages might be possible.
- Due to limitations of the Cisco Nexus 93180YC-FX switch used in the validation setup, the performance results under iperf3 traffic conditions refer to a PTP message exchange rate of 8 Hz.
- Encapsulation support:
- Transport over L2 (raw Ethernet) and L3 (UDP/IPv4/IPv6)
Note: In this document, all results use a single 25Gbps Ethernet link.
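The exact ptp4l configuration used for the validation is not reproduced in this document. The sketch below is only illustrative: it writes a candidate two-step, L2-multicast slave configuration in the style of the G.8275.1 telecom profile, using option names from the Linux PTP Project; the interface name and the 16 Hz sync rate are assumptions chosen to match the IXIA tests described later.

```python
# Illustrative only: writes a candidate ptp4l configuration for a two-step,
# L2-multicast slave in the style of the G.8275.1 telecom profile and prints
# the command to launch it. Option names come from the Linux PTP Project;
# the interface name and message rates are assumptions for this sketch.
G8275_1_SLAVE_CONF = """\
[global]
slaveOnly               1
twoStepFlag             1
domainNumber            24
dataset_comparison      G.8275.x
G.8275.defaultDS.localPriority 128
network_transport       L2
ptp_dst_mac             01:80:C2:00:00:0E
logAnnounceInterval     -3
logSyncInterval         -4
logMinDelayReqInterval  -4
announceReceiptTimeout  3
delay_mechanism         E2E
"""

IFACE = "enp1s0f0"  # hypothetical name of the N3000 network interface

with open("g8275_1_slave.cfg", "w") as cfg:
    cfg.write(G8275_1_SLAVE_CONF)

# -f: config file, -i: interface, -m: print messages to stdout
print(f"ptp4l -f g8275_1_slave.cfg -i {IFACE} -m")
```

For UDP/IPv4 transport, the same sketch would presumably use `network_transport UDPv4` instead of `L2`; for the iperf3 tests described later, the sync rate is constrained to 8 packets per second, which corresponds to `logSyncInterval -3`.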
Tools and Driver Versions
Tools | Version
---|---
BIOS | Intel Server Board S2600WF 00.01.0013
OS | CentOS 7.6
Kernel | kernel-rt-3.10.0-693.2.2.rt56.623.el7.src
Data Plane Development Kit (DPDK) | 18.08
Intel C Compiler | 19.0.3
Intel XL710 Driver (i40e driver) | 2.8.432.9.21
PTP4l | 2.0
IxExplorer | 8.51.1800.7 EA-Patch1
iperf3 | 3.0.11
trafgen | netsniff-ng 0.6.6 toolkit
IXIA Traffic Test
The first set of PTP performance benchmarks for the Intel FPGA PAC N3000 utilizes an IXIA* solution for network and PTP conformance testing. The IXIA XGS2 chassis includes an IXIA 40 PORT NOVUS-R100GE8Q28 card and IxExplorer, which provides a graphical interface for setting up a virtual PTP Grandmaster to the DUT (Intel FPGA PAC N3000) over a single 25 Gbps direct Ethernet connection. The block diagram below illustrates the targeted testing topology for the IXIA-based benchmarks. All the results use IXIA-generated traffic for the ingress traffic tests and the trafgen tool on the Intel FPGA PAC N3000 host for the egress traffic tests, where the ingress or egress direction is always from the perspective of the DUT (Intel FPGA PAC N3000) host. In both cases, the average traffic rate is 24 Gbps. This test setup provides a baseline characterization of the PTP performance of the Intel FPGA PAC N3000 with the T-TC mechanism enabled and compares it to the non-TC Intel FPGA PAC N3000 factory image under the ITU-T G.8275.1 PTP profile.
Topology for Intel FPGA PAC N3000 Traffic Tests under IXIA Virtual Grandmaster
IXIA Traffic Test Result
The following analysis captures the PTP performance of the TC-enabled Intel FPGA PAC N3000 under ingress and egress traffic conditions. In this section, the PTP profile G.8275.1 has been adopted for all traffic tests and data collection.
Magnitude of Master Offset
The following figure shows the magnitude of the master offset observed by the PTP4l slave client on the Intel FPGA PAC N3000 host as a function of elapsed time under ingress, egress and bidirectional traffic (average throughput of 24.4 Gbps).
Mean Path Delay (MPD)
The following figure shows the mean path delay, as calculated by the PTP4l slave that uses the Intel FPGA PAC N3000 as a network interface card, for the same test as the above figure. The total duration of each of the three traffic tests is at least 16 hours.
The following table lists the statistical analysis of the three traffic tests. Under a traffic load close to the channel capacity, the PTP4l slave that uses the Intel FPGA PAC N3000 maintains its phase offset to the IXIA virtual grandmaster within 53 ns for all traffic tests. In addition, the standard deviation of the master offset magnitude is below 5 ns.
Statistical Details on the PTP Performance
G.8275.1 PTP Profile | Ingress Traffic (24 Gbps) | Egress Traffic (24 Gbps) | Bidirectional Traffic (24 Gbps)
---|---|---|---
RMS | 6.35 ns | 8.4 ns | 9.2 ns
StdDev (of abs(max) offset) | 3.68 ns | 3.78 ns | 4.5 ns
StdDev (of MPD) | 1.78 ns | 2.1 ns | 2.38 ns
Max offset | 36 ns | 33 ns | 53 ns
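The statistics in the table above can be reproduced from the periodic ptp4l output captured on the DUT host. The following is a minimal post-processing sketch, assuming the "master offset ... path delay ..." log lines have been saved to a file; the file name is illustrative.

```python
# Minimal sketch: derive RMS, standard deviation, and worst-case magnitude
# of the master offset (plus MPD standard deviation) from a captured ptp4l
# log. The pattern follows ptp4l's periodic "master offset ... path delay"
# output; the capture file name is only an example.
import re
import statistics

LINE = re.compile(r"master offset\s+(-?\d+)\s+s\d\s+freq\s+[-+]?\d+"
                  r"\s+path delay\s+(-?\d+)")

offsets, delays = [], []
with open("ptp4l_ingress_24g.log") as log:   # example capture file
    for line in log:
        m = LINE.search(line)
        if m:
            offsets.append(abs(int(m.group(1))))  # |master offset| in ns
            delays.append(int(m.group(2)))        # mean path delay in ns

rms = (sum(o * o for o in offsets) / len(offsets)) ** 0.5
print(f"RMS of |offset|   : {rms:.2f} ns")
print(f"StdDev of |offset|: {statistics.pstdev(offsets):.2f} ns")
print(f"StdDev of MPD     : {statistics.pstdev(delays):.2f} ns")
print(f"Max |offset|      : {max(offsets)} ns")
```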
The following figures represent the magnitude of the master offset and the mean path delay (MPD) under a 16-hour-long 24 Gbps bidirectional traffic test for different PTP encapsulations. The left graphs in these figures refer to PTP benchmarks under IPv4/UDP encapsulation, while the PTP messaging of the right graphs is encapsulated in L2 (raw Ethernet). The PTP4l slave performance is quite similar in both cases: the worst-case master offset magnitude is 53 ns and 45 ns for IPv4/UDP and L2 encapsulation, respectively, and the standard deviation of the offset magnitude is 4.49 ns and 4.55 ns, respectively.
Magnitude of Master Offset
The following figure shows the magnitude of the master offset under 24 Gbps bidirectional traffic, IPv4 (left) and L2 (right) encapsulation, G.8275.1 profile.
Mean Path Delay (MPD)
The following figure shows the mean path delay of the Intel FPGA PAC N3000 host PTP4l slave under 24 Gbps bidirectional traffic, IPv4 (left) and L2 (right) encapsulation, G.8275.1 profile.
The absolute value of the MPD is not a clear indication of PTP consistency, as it depends on cable lengths, data path latency, and so on; however, the low MPD variation (2.381 ns and 2.377 ns for the IPv4 and L2 cases, respectively) shows that the PTP MPD calculation is consistently accurate across both encapsulations and verifies the consistency of the PTP performance across both encapsulation modes. The level changes in the calculated MPD in the L2 graph (right graph in the above figure) are due to the incremental effect of the applied traffic: first the channel is idle (MPD RMS of 55.3 ns), then ingress traffic is applied (second step, MPD RMS of 85.44 ns), followed by simultaneous egress traffic, resulting in a calculated MPD of 108.98 ns. The following figures overlay the magnitude of the master offset and the calculated MPD of the bidirectional traffic test applied both to a PTP4l slave using the Intel FPGA PAC N3000 with the T-TC mechanism and to another that uses the Intel FPGA PAC N3000 without TC functionality. The T-TC Intel FPGA PAC N3000 tests (orange) start from time zero, while the PTP test that utilizes the non-TC Intel FPGA PAC N3000 (blue) starts around T = 2300 seconds.
Magnitude of Master Offset
The following figure shows the magnitude of the master offset under ingress traffic (24 Gbps), with and without T-TC support, G.8275.1 profile.
In the above figure, the PTP performance of the TC-enabled Intel FPGA PAC N3000 under traffic is similar to that of the non-TC Intel FPGA PAC N3000 for the first 2300 seconds. The effectiveness of the T-TC mechanism in the Intel FPGA PAC N3000 is highlighted in the segment of the test (after the 2300th second) where an equal traffic load is applied to the interfaces of both cards. Similarly, in the figure below, the MPD calculations are observed before and after applying the traffic on the channel. The effectiveness of the T-TC mechanism lies in compensating for the residence time of the packets, which is the packet latency through the FPGA path between the 25G and the 40G MACs.
Mean Path Delay (MPD)
The following figure shows the mean path delay of the Intel FPGA PAC N3000 host PTP4l slave under ingress traffic (24 Gbps), with and without T-TC support, G.8275.1 profile.
These figures show that, due to the residence time correction of the T-TC, the PTP4l slave's servo algorithm sees only small differences in the average path delay calculations; therefore, the impact of the delay fluctuations on the master offset approximation is reduced. The following table lists a statistical analysis of the PTP performance, which includes the RMS and standard deviation of the master offset, the standard deviation of the mean path delay, and the worst-case master offset for the Intel FPGA PAC N3000 with and without T-TC support.
Statistical Details on the PTP Performance Under Ingress Traffic
Ingress Traffic (24 Gbps), G.8275.1 PTP Profile | Intel FPGA PAC N3000 with T-TC | Intel FPGA PAC N3000 without T-TC
---|---|---
RMS | 6.34 ns | 40.5 ns
StdDev (of abs(max) offset) | 3.65 ns | 15.5 ns
StdDev (of MPD) | 1.79 ns | 18.1 ns
Max offset | 34 ns | 143 ns
A direct comparison of the T-TC-supported Intel FPGA PAC N3000 to the non-TC version shows that T-TC support improves the PTP performance by 4x to 6x with respect to every statistical metric (worst-case, RMS, or standard deviation of the master offset). The worst-case master offset for the G.8275.1 PTP configuration of the T-TC Intel FPGA PAC N3000 is 34 ns under ingress traffic conditions at the limit of the channel bandwidth (24.4 Gbps).
iperf3 Traffic Test
This section describes the iperf3 traffic benchmarking test that further evaluates the PTP performance of the Intel FPGA PAC N3000. The iperf3 tool has been utilized to emulate active traffic conditions. The network topology of the iperf3 traffic benchmarks, shown in the figure below, involves connecting two servers, each using a DUT card (Intel FPGA PAC N3000 and XXV710), to a Cisco Nexus 93180YC-FX switch. The Cisco switch acts as a Boundary Clock (T-BC) between the two DUT PTP slaves and the Calnex Paragon-NEO Grandmaster.
Network Topology for Intel FPGA PAC N3000 iperf3 Traffic Test
The PTP4l output on each of the DUT hosts provides measurements of the PTP performance for each slave device in the setup (Intel FPGA PAC N3000 and XXV710). For the iperf3 traffic test, the following conditions and configurations apply to all graphs and performance analysis (see the traffic-launch sketch after this list):
- 17 Gbps aggregated bandwidth of traffic (both TCP and UDP), either egress or ingress or bidirectional to Intel FPGA PAC N3000.
- IPv4 encapsulation of PTP packets, due to configuration limitation on Cisco Nexus 93180YC-FX switch.
- PTP message exchange rate limited to 8 packets/second, due to configuration limitation on Cisco Nexus 93180YC-FX switch.
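The exact iperf3 invocations used in the validation are not listed in this document. The sketch below shows one possible way to approximate an aggregated TCP and UDP load between the two DUT hosts; the peer address, ports, stream counts and rates are assumptions, and a matching iperf3 server (`iperf3 -s -p <port>`) is assumed to be listening on each port of the opposite host.

```python
# Illustrative traffic-launch sketch (not the validated command set).
# Approximates a mixed TCP + UDP load toward the opposite DUT host; one
# iperf3 server per port ("iperf3 -s -p <port>") must run on the peer.
import subprocess

PEER = "192.168.1.20"   # hypothetical address of the opposite DUT host
DURATION = "36000"      # seconds; matches the 10-hour test duration

streams = [
    # 4 parallel TCP streams (rate limited only by the link and the stack)
    ["iperf3", "-c", PEER, "-p", "5201", "-P", "4", "-t", DURATION],
    # 4 parallel UDP streams at 1 Gbps each
    ["iperf3", "-c", PEER, "-p", "5202", "-u", "-b", "1G", "-P", "4",
     "-t", DURATION],
    # reverse-direction TCP streams for the bidirectional case
    ["iperf3", "-c", PEER, "-p", "5203", "-P", "4", "-R", "-t", DURATION],
]

procs = [subprocess.Popen(cmd) for cmd in streams]
for proc in procs:
    proc.wait()
```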
iperf3 Traffic Test Result
The following analysis captures the performance of the Intel FPGA PAC N3000 and the XXV710 card, both simultaneously acting as the network interface card of a PTP slave (T-TSC) synchronized to the Calnex Paragon-NEO Grandmaster through the T-BC Cisco switch.
The following figures show the magnitude of the master offset and the MPD over time for three different traffic tests using the Intel FPGA PAC N3000 with T-TC and the XXV710 card. For both cards, bidirectional traffic has the largest effect on the PTP4l performance. Each traffic test is 10 hours long. In the following figures, the tail of each graph marks the point in time where the traffic stops and the magnitude of the PTP master offset drops back to its low idle-channel level.
Magnitude of Master Offset for Intel FPGA PAC N3000
The following figure shows the magnitude of the master offset for the Intel FPGA PAC N3000 with T-TC, under ingress, egress and bidirectional iperf3 traffic.
Mean Path Delay (MPD) for Intel FPGA PAC N3000
The following figure shows the mean path delay for the Intel FPGA PAC N3000 with T-TC, under ingress, egress and bidirectional iperf3 traffic.
Magnitude of Master Offset for XXV710
The following figure shows the magnitude of the master offset for the XXV710, under ingress, egress and bidirectional iperf3 traffic.
Mean Path Delay (MPD) for XXV710
The following figure shows the mean path delay for the XXV710, under ingress, egress and bidirectional iperf3 traffic.
For the Intel FPGA PAC N3000, the worst-case master offset under any traffic condition is within 90 ns, while under the same bidirectional traffic conditions the RMS of the Intel FPGA PAC N3000 master offset is 5.6x better than that of the XXV710 card.
| | Intel FPGA PAC N3000: Ingress Traffic 10G | Intel FPGA PAC N3000: Egress Traffic 18G | Intel FPGA PAC N3000: Bidirectional Traffic 18G | XXV710 Card: Ingress Traffic 18G | XXV710 Card: Egress Traffic 10G | XXV710 Card: Bidirectional Traffic 18G |
|---|---|---|---|---|---|---|
| RMS | 27.6 ns | 14.2 ns | 27.2 ns | 93.96 ns | 164.2 ns | 154.7 ns |
| StdDev (of abs(max) offset) | 9.8 ns | 8.7 ns | 14.6 ns | 61.2 ns | 123.8 ns | 100 ns |
| StdDev (of MPD) | 21.6 ns | 9.2 ns | 20.6 ns | 55.58 ns | 55.3 ns | 75.9 ns |
| Max offset | 84 ns | 62 ns | 90 ns | 474 ns | 1,106 ns | 958 ns |
Notably, the standard deviation of the Intel FPGA PAC N3000 master offset is at least 5x lower than that of the XXV710 card, which signifies that the PTP approximation of the Grandmaster clock under traffic is less sensitive to latency or noise variations in the Intel FPGA PAC N3000.
When compared to the IXIA Traffic Test Result on page 5, the worst-case magnitude of the master offset with a T-TC-enabled Intel FPGA PAC N3000 appears higher. Besides the differences in network topology and channel bandwidth, this is because the IXIA results were captured under a G.8275.1 PTP profile (16 Hz sync rate), while the sync message rate in this case is constrained to 8 packets per second.
Magnitude of Master Offset Comparison
The following figure shows the magnitude of master offset comparison under bidirectional iperf3 traffic.
Mean Path Delay (MPD) Comparison
The following figure shows the mean path delay comparison under bidirectional iperf3 traffic.
The superior PTP performance of the Intel FPGA PAC N3000 compared to the XXV710 card is also supported by the evidently higher deviation of the calculated mean path delay (MPD) for the XXV710 in each of the targeted traffic tests, for example, bidirectional iperf3 traffic. Ignore the mean value in each MPD case, which can differ for a number of reasons, such as different Ethernet cables and different core latency. The disparity and spikes observed for the XXV710 card are not present in the Intel FPGA PAC N3000.
RMS of 8 Consecutive Master Offset Comparison
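The comparison referenced by this heading is presumably based on the RMS computed over every 8 consecutive master-offset samples, which suppresses single-sample noise. A minimal sketch of such a windowed RMS, reusing offsets parsed as in the earlier log post-processing example, could look like this:

```python
# Sketch of a windowed RMS over every 8 consecutive master-offset samples,
# as suggested by the figure title above. `offsets` is assumed to be the
# list of per-sample master offsets (in ns) parsed from the ptp4l log, as
# in the earlier post-processing sketch.
def rms_of_consecutive(offsets, window=8):
    """Return one RMS value per non-overlapping window of samples."""
    out = []
    for i in range(0, len(offsets) - window + 1, window):
        chunk = offsets[i:i + window]
        out.append((sum(o * o for o in chunk) / window) ** 0.5)
    return out

# Example with synthetic values (ns):
print(rms_of_consecutive([12, -8, 5, -15, 9, -3, 7, -11,
                          40, -35, 28, -44, 31, -22, 37, -29]))
```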
Conclusion
The FPGA data path between the QSFP28 (25G MAC) and the Intel XL710 (40G MAC) adds a variable packet latency, which affects the approximation accuracy of the PTP slave. Adding Transparent Clock (T-TC) support in the FPGA soft logic of the Intel FPGA PAC N3000 compensates for this packet latency by adding the residence time to the correction field of encapsulated PTP messages. The results confirm that the T-TC mechanism improves the accuracy of the PTP4l slave.
Also, the IXIA Traffic Test Result on page 5 shows that T-TC support in the FPGA data path enhances the PTP performance by at least 4x when compared to the Intel FPGA PAC N3000 without T-TC support. The Intel FPGA PAC N3000 with T-TC presents a worst-case master offset of 53 ns under ingress, egress or bidirectional traffic loads at the limit of the channel capacity (25 Gbps). Hence, with T-TC support, the Intel FPGA PAC N3000 PTP performance is both more accurate and less prone to noise variations.
In the iperf3 Traffic Test on page 10, the PTP performance of the Intel FPGA PAC N3000 with T-TC enabled is compared against an XXV710 card. This test captured the PTP4l data for both slave clocks under ingress or egress traffic exchanged between the two hosts of the Intel FPGA PAC N3000 and the XXV710 card. The worst-case master offset observed on the Intel FPGA PAC N3000 is at least 5x lower than on the XXV710 card. The standard deviation of the captured offsets also shows that the T-TC support of the Intel FPGA PAC N3000 allows a smoother approximation of the Grandmaster's clock.
To further validate the PTP performance of Intel FPGA PAC N3000, the potential test options include:
- Validation under different PTP profiles and message rates for more than one Ethernet link.
- Evaluation of the iperf3 Traffic Test on page 10 with a more advanced switch that allows higher PTP message rates.
- Evaluation of the T-SC functionality and its PTP timing accuracy under G.8273.2 Conformance Testing.
Document Revision History for IEEE 1588 V2 Test
Document Version | Changes
---|---
2020.05.30 | Initial release.