AN 872 Programmable Acceleration Card with Intel Arria 10 GX FPGA User Guide

June 9, 2024
Intel


Introduction

About this Document

This document provides methods to estimate and validate the power and thermal performance of your AFU design using the Intel® Programmable Acceleration Card with Intel Arria® 10 GX FPGA in the target server platform.

Power Specification

The board management controller monitors and manages thermal and power events on the Intel FPGA PAC. When the board or FPGA overheats or draws excessive current, the board management controller shuts down FPGA power for protection. This also brings down the PCIe link, which may cause an unexpected system crash. Refer to Auto-Shutdown for details about the criteria that trigger board shutdown. In normal cases, FPGA temperature and power are by far the leading causes of shutdown. To minimize downtime and ensure system stability, Intel recommends that the total board power not exceed 66 W and the FPGA power not exceed 45 W. Individual components and board assemblies have power variability. Therefore, the nominal values are lower than the limits to ensure that the board does not experience a random shutdown in a system with varying workloads and inlet temperatures.

Power Specification

System| Total Board Power (watts)| FPGA Power (watts)
---|---|---
A system with an FPGA Interface Manager (FIM) and AFU that runs with worst-case throttling workload for a minimum of 15 minutes at the core temperature of 95°C.| 66| 45

The total board power varies depending on your Accelerator Functional Unit (AFU) design (amount and frequency of logic toggling), inlet temperature, system temperature and airflow of the target slot for the Intel FPGA PAC. To manage this variability, Intel recommends you meet this power specification to prevent power shutdown by the Board Management Controller.

Related Information

Auto-Shutdown.

Prerequisites

The server original equipment manufacturer (OEM) must validate that each Intel FPGA PAC interfacing to a PCIe slot in a target server platform can stay within the thermal limits even when the board consumes the maximum allowed power (66 W). For more information, refer to the Intel PAC with Intel Arria 10 GX FPGA Platform Qualification Guidelines(1).

Tools Requirements

You must have the following tools to estimate and evaluate the power and thermal performance.

  • Software:
    • Intel Acceleration Stack for Development
    • BWtoolkit
    • AFU Design(2)
    • Tcl script (download) – Required to format the programming file for analysis
    • Early Power Estimator for Intel Arria 10 devices
    • Intel FPGA PAC Power Estimator Sheet (download)
  • Hardware:
    • Intel FPGA PAC
    • Micro-USB cable(3)
    • Target Server for Intel FPGA PAC(4)

Intel recommends that you follow the Intel Acceleration Stack Quick Start Guide for Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA for software installation.

Related Information

Intel Acceleration Stack Quick Start Guide for Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA.

  1. Contact your Intel support representative to access this document.
  2. The build_synth directory is created after you compile your AFU.
  3. In Acceleration Stack 1.2, the board monitoring is performed over PCIe.
  4. Ensure that your OEM has validated the targeted PCIe slot(s) in accordance with the Platform Qualification Guidelines for your Intel FPGA PAC.

Using the Board Management Controller

Auto-Shutdown

The Board Management Controller monitors and controls resets, different power rails, FPGA and board temperatures. When the Board Management Controller senses conditions that can potentially damage the board, it automatically shuts down board power for protection.

Note: When the FPGA loses power, the PCIe link between the Intel FPGA PAC and host is down. In many systems, the PCIe link-down may cause a system crash.

Auto-Shutdown Criteria

The following table lists the criteria beyond which the Board Management Controller shuts down board power.

Parameter| Threshold Limit
---|---
Board Power| 66 W
12 V Backplane Current| 6 A
12 V Backplane Voltage| 14 V
1.2 V Current| 16 A
1.2 V Voltage| 1.4 V
1.8 V Current| 8 A
1.8 V Voltage| 2.04 V
3.3 V Current| 8 A
3.3 V Voltage| 3.96 V
FPGA Core Voltage| 1.08 V
FPGA Core Current| 60 A
FPGA Core Temperature| 100°C
Core Supply Temperature| 120°C
Board Temperature| 80°C
QSFP Temperature| 90°C
QSFP Voltage| 3.7 V
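
The thresholds above can be turned into a simple headroom check on sensor readings. The following is a minimal Python sketch, not part of any Intel tool: the dictionary keys, the function names, and the 90% warning fraction are assumptions made for illustration, and the readings must be supplied by you (for example, copied by hand from your monitoring tool).

```python
# Auto-shutdown thresholds copied from the table above.
# The key names are hypothetical labels chosen for this sketch.
SHUTDOWN_LIMITS = {
    "board_power_w": 66.0,
    "backplane_12v_current_a": 6.0,
    "backplane_12v_voltage_v": 14.0,
    "rail_1v2_current_a": 16.0,
    "rail_1v2_voltage_v": 1.4,
    "rail_1v8_current_a": 8.0,
    "rail_1v8_voltage_v": 2.04,
    "rail_3v3_current_a": 8.0,
    "rail_3v3_voltage_v": 3.96,
    "fpga_core_voltage_v": 1.08,
    "fpga_core_current_a": 60.0,
    "fpga_core_temp_c": 100.0,
    "core_supply_temp_c": 120.0,
    "board_temp_c": 80.0,
    "qsfp_temp_c": 90.0,
    "qsfp_voltage_v": 3.7,
}


def headroom(readings, limits=SHUTDOWN_LIMITS):
    """Return {name: fraction of the limit used} for readings with a known limit."""
    return {
        name: value / limits[name]
        for name, value in readings.items()
        if name in limits
    }


def near_limit(readings, warn_fraction=0.9, limits=SHUTDOWN_LIMITS):
    """List parameters at or above warn_fraction of their shutdown threshold."""
    used = headroom(readings, limits)
    return sorted(name for name, frac in used.items() if frac >= warn_fraction)


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    sample = {"board_power_w": 60.0, "fpga_core_temp_c": 85.0}
    print(near_limit(sample))
```

The warning fraction is a design choice: a lower value gives earlier warning at the cost of more false alarms on boards that normally run warm.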

Recovering After Auto-Shutdown

The Board Management Controller holds power off until the next power cycle. Therefore, when power to an Intel FPGA PAC is shut down, you must power cycle the server to restore power to the card.

The common cause of power shutdown is the FPGA overheating (when the core temperature is over 100°C), or the FPGA drawing excessive current. This typically happens when the AFU design exceeds the Intel FPGA PAC defined power envelopes or there is insufficient airflow. In this case, you must reduce power consumption in your AFU.

Monitor On-Board Sensors Using OPAE

Use the fpgainfo command line program to gather the temperature and power sensor data from the Board Management Controller. You can use this program with the Acceleration Stack 1.2 and beyond. For Acceleration Stack 1.1 or older, use the BWMonitor tool as described in the next section.

To gather the temperature data:

  • bash-4.2$ fpgainfo temp

Sample output


To gather the power data:

  • bash-4.2$ fpgainfo power

Sample output


Monitor On-Board Sensors Using BWMonitor

  • BWMonitor is a BittWare tool that allows you to measure FPGA/board temperature, voltage, and current.

Prerequisite: You must install a micro-USB cable between the Intel FPGA PAC and the server.

  1. Install the appropriate BittWorks II Toolkit-Lite software, firmware, and bootloader.

OS-Compatible BittWorks II Toolkit-Lite Version

Operating System| Release| BittWorks II Toolkit-Lite Version| Install Command
---|---|---|---
CentOS 7.4/RHEL 7.4| 2018.6 Enterprise Linux 7 (64-bit)| bw2tk-lite-2018.6.el7.x86_64.rpm| sudo yum install bw2tk-lite-2018.6.el7.x86_64.rpm
Ubuntu 16.04| 2018.6 Ubuntu 16.04 (64-bit)| bw2tk-lite-2018.6.u1604.amd64.deb| sudo dpkg -i bw2tk-lite-2018.6.u1604.amd64.deb

Refer to the Getting Started webpage to download the BMC firmware and tools:

  • BMC Firmware version: 26889
  • BMC Bootloader version: 26879

Save the files to a known location on the host machine. The following script prompts for this location.

Add the BittWare tools to PATH:

  • export PATH=/opt/bwtk/2018.6.0L/bin/:$PATH

You can launch BWMonitor using:

  • /opt/bwtk/2018.6L/bin/bwmonitor-gui&

Sample Measurements


AFU Design Power Verification

Power Measurement Flow

To evaluate the power for your AFU design, capture the following metrics:

  • Total board power and FPGA temperature
    • (after running the worst-case data patterns on your design for 15 minutes)
  • Static Power and Temperature
    • (using a static power measurement design)
  • Worst Case Static Power
    • (predicted values using the Early Power Estimator for Intel Arria 10 devices)

Then, use the Intel FPGA PAC Power Estimator Sheet (download) with these recorded metrics to verify if your AFU design meets the specification.

Measuring the Total Board Power

Follow these steps:

  1. Install the Intel PAC with Intel Arria 10 GX FPGA into a qualified PCIe slot in the server. If you are using BWMonitor for measurement, connect the Micro-USB cable from the back of the card to any USB port on the server.
  2. Load your AFU and run it at its maximum power.
    • If the AFU uses Ethernet, ensure that the network cable or module is inserted and connected to the link partner, and that network traffic is turned on in the AFU.
    • If appropriate, run DMA continuously to exercise the on-board DDR4.
    • Run your applications on the host to feed the AFU the worst-case traffic and to fully exercise the FPGA. Run this step for a minimum of 15 minutes to allow the FPGA core temperature to settle.
    • Note: During testing, monitor the total board power, FPGA power, and FPGA core temperature to ensure they stay within specification. If the 66 W, 45 W, or 100°C limit is reached, stop the test immediately.
  3. After the FPGA core temperature becomes stable, use the fpgainfo program or the BWMonitor tool to record the total board power and FPGA core temperature. Enter these values in row Step 1: Total board power measurement of the Intel FPGA PAC Power Estimator Sheet.
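
The stop condition in the note and the "temperature becomes stable" check in step 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the stability criterion (the last five samples spanning no more than 1°C) is an assumption, since this guide does not define what "stable" means numerically. You supply the samples from your own monitoring.

```python
# Limits taken from the note in step 2.
BOARD_POWER_LIMIT_W = 66.0
FPGA_POWER_LIMIT_W = 45.0
CORE_TEMP_LIMIT_C = 100.0


def must_stop(board_power_w, fpga_power_w, core_temp_c):
    """True if any of the three limits has been reached."""
    return (
        board_power_w >= BOARD_POWER_LIMIT_W
        or fpga_power_w >= FPGA_POWER_LIMIT_W
        or core_temp_c >= CORE_TEMP_LIMIT_C
    )


def is_settled(core_temps_c, window=5, tolerance_c=1.0):
    """True if the last `window` samples span no more than tolerance_c.

    Both `window` and `tolerance_c` are assumed values for this sketch.
    """
    if len(core_temps_c) < window:
        return False
    recent = core_temps_c[-window:]
    return max(recent) - min(recent) <= tolerance_c


if __name__ == "__main__":
    # Hypothetical temperature samples (°C), for illustration only.
    temps = [70.0, 80.0, 88.0, 88.5, 88.2, 88.4, 88.9]
    print(is_settled(temps), must_stop(60.0, 40.0, temps[-1]))
```

Record the board power and core temperature for the estimator sheet only once `is_settled` holds and `must_stop` has stayed false for the whole run.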

Intel FPGA PAC Power Estimator Sheet Sample


Measuring the Real Static Power

Leakage current is a leading cause of board-to-board power consumption variation. The power measurements from the above section include power due to leakage current (static power) and power due to the AFU logic (dynamic power). In this section, you will measure the static power of the board-under-test in order to understand the dynamic power.

Before measuring the FPGA static power, use the disable-gpio-input-buffer-intel-pac-arria10-gx.tcl script (download) to process the FPGA programming file (.sof file), which contains a FIM and AFU design. The Tcl script disables all FPGA input pins to ensure that there is no toggling inside the FPGA (which means no dynamic power). Refer to the Minimal Flow Example to compile a sample AFU. The generated .sof file is located at:

  • cd $OPAE_PLATFORM_ROOT/hw/samples/
  • $OPAE_PLATFORM_ROOT/hw/samples/build_synth/build/output_files/afu_*.sof

You must save disable-gpio-input-buffer-intel-pac-arria10-gx.tcl in the above directory and then run the following command:

  • quartus_asm -t disable-gpio-input-buffer-intel-pac-arria10-gx.tcl afu_*.sof

Sample output

Info: ***
Info: Running Quartus Prime Assembler
Info: Version 17.1.1 Build 273 12/19/2017 SJ Pro Edition
Info: Copyright (C) 2017 Intel Corporation. All rights reserved.
Info: Your use of Intel Corporation's design tools, logic functions
Info: and other software and tools, and its AMPP partner logic
Info: functions, and any output files from any of the foregoing
Info: (including device programming or simulation files), and any
Info: associated documentation or information are expressly subject
Info: to the terms and conditions of the Intel Program License
Info: Subscription Agreement, the Intel Quartus Prime License Agreement,
Info:


Upon successful execution of the tcl script, the afu_*.sof file is updated and ready for FPGA programming.

Follow these steps to measure the real static power:

  1. Use the Intel Quartus® Prime Programmer to program the *.sof file. Refer to Using the Intel Quartus Prime Programmer for detailed steps.
  2. Monitor the FPGA core temperature, voltage, and current using the BWMonitor tool. Enter these values in row Step 2: FPGA core static power measurement of the Intel FPGA PAC Power Estimator Sheet.
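
As a rough cross-check of the values recorded in step 2, the core static power follows from the measured core voltage and current, and the dynamic share of a power figure is what remains after subtracting the static part. The Python sketch below assumes the basic relation P = V × I. It does not reproduce the formulas used inside the Intel FPGA PAC Power Estimator Sheet, and the function names and sample numbers are hypothetical.

```python
def core_static_power_w(core_voltage_v, core_current_a):
    """Static power of the FPGA core rail, computed as voltage times current."""
    return core_voltage_v * core_current_a


def dynamic_power_w(measured_power_w, static_power_w):
    """Dynamic share of a power measurement that includes static power."""
    return measured_power_w - static_power_w


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    static_w = core_static_power_w(core_voltage_v=0.9, core_current_a=10.0)
    print(static_w, dynamic_power_w(measured_power_w=40.0, static_power_w=static_w))
```

Compare like with like: subtract the core static power from an FPGA power measurement taken at a similar core temperature, because leakage varies with temperature.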

Related Information

  • Intel Acceleration Stack Quick Start Guide for Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA
  • Monitor On-Board Sensors Using BWMonitor.

Using the Intel Quartus Prime Programmer

You must have the micro USB cable connected between the Intel FPGA PAC and the server to execute these steps:

  1. Find the Root Port and Endpoint of the Intel FPGA PAC card:
    • $ lspci -tv | grep 09c4

Example output 1 shows that the Root Port is d7:0.0 and the Endpoint is d8:0.0:

  • -+-[0000:d7]-+-00.0-[d8]----00.0 Intel Corporation Device 09c4

Example output 2 shows that the Root Port is 0:1.0 and the Endpoint is 3:0.0:

  • +-01.0-[03]----00.0 Intel Corporation Device 09c4

Example output 3 shows that the Root Port is 85:2.0 and the Endpoint is 86:0.0:

  • +-[0000:85]-+-02.0-[86]----00.0 Intel Corporation Device 09c4

Note: No output indicates a PCIe* device enumeration failure and that flash is not programmed.

  2. Mask uncorrectable errors and correctable errors of the FPGA and the Root Port (RP). The commands in these steps use the Endpoint (d8:0.0) and Root Port (d7:0.0) from example output 1; substitute the values for your system.
    • Mask uncorrectable errors and correctable errors of the FPGA:
    • $ sudo setpci -s d8:0.0 ECAP_AER+0x08.L=0xFFFFFFFF
    • $ sudo setpci -s d8:0.0 ECAP_AER+0x14.L=0xFFFFFFFF
    • Mask uncorrectable errors and correctable errors of the RP:
    • $ sudo setpci -s d7:0.0 ECAP_AER+0x08.L=0xFFFFFFFF
    • $ sudo setpci -s d7:0.0 ECAP_AER+0x14.L=0xFFFFFFFF
  3. Run the following Intel Quartus Prime Programmer command:
    • sudo $QUARTUS_HOME/bin/quartus_pgm -m JTAG -o 'pvbi;afu_*.sof'
  4. To unmask uncorrectable errors and mask correctable errors, run the following commands:
    • Unmask uncorrectable errors and mask correctable errors of the FPGA:
    • $ sudo setpci -s d8:0.0 ECAP_AER+0x08.L=0x00000000
    • $ sudo setpci -s d8:0.0 ECAP_AER+0x14.L=0x00000000
    • Unmask uncorrectable errors and mask correctable errors of the RP:
    • $ sudo setpci -s d7:0.0 ECAP_AER+0x08.L=0x00000000
    • $ sudo setpci -s d7:0.0 ECAP_AER+0x14.L=0x00000000
  5. Reboot.
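
Because the Root Port and Endpoint addresses differ from system to system, it can help to derive them from the lspci tree line and generate the matching setpci commands rather than editing them by hand. The Python sketch below is illustrative only: the function names are hypothetical, the regular expression is written against the three example outputs shown above (with the bridge link printed as a run of hyphens), and the default root bus of 00 for lines that carry no bus prefix is an assumption based on example output 2. The script only builds command strings; it does not run them.

```python
import re

# Matches the tail of an `lspci -tv` tree line such as
#   -+-[0000:d7]-+-00.0-[d8]----00.0  Intel Corporation Device 09c4
# The leading "[domain:bus]" part is optional because `grep` may drop the
# tree line that carries the root bus (as in example output 2).
_TREE_LINE = re.compile(
    r"(?:\[(?:[0-9a-f]{4}:)?(?P<root_bus>[0-9a-f]{2})\]-\+-)?"
    r"(?P<rp_dev>[0-9a-f]{2})\.(?P<rp_fn>[0-7])"
    r"-\[(?P<ep_bus>[0-9a-f]{2})(?:-[0-9a-f]{2})?\]-+"
    r"(?P<ep_dev>[0-9a-f]{2})\.(?P<ep_fn>[0-7])"
)


def parse_lspci_line(line, default_root_bus="00"):
    """Return (root_port, endpoint) as 'bus:dev.fn' strings."""
    m = _TREE_LINE.search(line.lower())
    if m is None:
        raise ValueError("no Root Port / Endpoint pair found in line")
    root_bus = m.group("root_bus") or default_root_bus  # assumed default
    root_port = f"{root_bus}:{m.group('rp_dev')}.{m.group('rp_fn')}"
    endpoint = f"{m.group('ep_bus')}:{m.group('ep_dev')}.{m.group('ep_fn')}"
    return root_port, endpoint


def aer_mask_commands(bdf, mask):
    """Build the two setpci commands used in this section for one device."""
    value = "0xFFFFFFFF" if mask else "0x00000000"
    return [
        f"sudo setpci -s {bdf} ECAP_AER+0x08.L={value}",
        f"sudo setpci -s {bdf} ECAP_AER+0x14.L={value}",
    ]


if __name__ == "__main__":
    line = "-+-[0000:d7]-+-00.0-[d8]----00.0 Intel Corporation Device 09c4"
    root_port, endpoint = parse_lspci_line(line)
    for cmd in aer_mask_commands(endpoint, True) + aer_mask_commands(root_port, True):
        print(cmd)
```

The addresses are emitted with a two-digit device number (for example d7:00.0), which refers to the same device as the shorter d7:0.0 form used in this section.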

Related Information

Intel Acceleration Stack Quick Start Guide for Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA

Estimating the Worst-Case Core Static Power

Follow these steps to estimate the worst-case static power:

  1. Refer to the Minimal Flow Example to compile a sample AFU located at:
    • /hw/samples//
  2. In the Intel Quartus Prime Pro Edition software, click File > Open Project and select your .qpf file to open the AFU synthesis project from the following path:
    • /hw/samples//build_synth/build
  3. Click Project > Generate EPE File to create the required .csv file.
    • Step 2 Illustration
  4. Open the Early Power Estimator tool(5) and click the Import CSV icon. Select the .csv file generated above.
    • Note: You can ignore the warning while importing the .csv file.
  5. Input parameters are filled in automatically.
    • In the Junction Temp. TJ field, change the value to User Entered, and set the Junction Temp. TJ (°C) field to 95.
    • Change the Power Characteristics field from Typical to Maximum.
    • In the EPE tool, PSTATIC is the total static power in watts. You can calculate the worst-case core static power from the Report tab.

EPE Tool Sample Output


Report Tab


In the example shown above, the total FPGA core static current is the sum of all static currents and standby currents on the 0.9 V rails (VCC, VCCP, VCCERAM). Enter these values in row Step 3: Worst static power from EPE of the Intel FPGA PAC Power Estimator Sheet. Observe the Calculated output row for the maximum power consumption of your AFU.
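
The summation described above can be sketched in Python as follows. The rail names and the 0.9 V value come from the text; the function name, the per-rail currents, and the conversion to watts using P = V × I are assumptions for illustration. Read the real static and standby currents from the Report tab of your own EPE run.

```python
CORE_RAIL_VOLTAGE_V = 0.9  # voltage of VCC, VCCP, and VCCERAM named in the text


def worst_case_core_static(currents_a):
    """Sum static and standby currents per rail and convert to power.

    currents_a maps rail name -> (static_current_a, standby_current_a).
    Returns (total_current_a, total_power_w).
    """
    total_current_a = sum(
        static + standby for static, standby in currents_a.values()
    )
    return total_current_a, total_current_a * CORE_RAIL_VOLTAGE_V


if __name__ == "__main__":
    # Hypothetical EPE report values in amperes, for illustration only.
    report = {
        "VCC": (10.0, 0.5),
        "VCCP": (1.0, 0.25),
        "VCCERAM": (0.5, 0.25),
    }
    current_a, power_w = worst_case_core_static(report)
    print(current_a, power_w)
```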

Document Revision History for Thermal and Power Guidelines for Intel PAC with Intel Arria 10 GX FPGA

Document Version| Changes
---|---
2019.08.30| Initial release.

Intel Corporation. All rights reserved. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Intel warrants performance of its FPGA and semiconductor products to current specifications in accordance with Intel’s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Intel. Intel customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.

Other names and brands may be claimed as the property of others.


ID: 683795
Version: 2019.08.30
