AMD MI200 Instinct Accelerator Instruction Manual

July 31, 2024
AMD


AMD Instinct™ MI200 Accelerator Firmware Update Tool – AMD FW Flash
Publication Number: 58083 v2.1
Date: June 2024

Contents
1 Introduction
2 Getting Started
3 Commands
  3.1 Help
  3.2 List Devices
4 Instructions
  4.1 Configuring the System for FW Maintenance or AMD Instinct™ MI200 Replacement
    4.1.1 Installing the AMD FW Flash Tool
  4.2 Updating and Rolling Back the AMD Instinct™ MI200 FW Version
    4.2.1 Updating the MI200 FW Maintenance Version
    4.2.2 Rolling Back to the MI200 GA FW Version
  4.3 Verifying the AMD Instinct™ MI200 FW Version
  4.4 Uninstalling the AMD FW Flash Tool
  4.5 Replacing the AMD Instinct™ MI200 GPU (RMA)
5 References
6 Customer Care
7 Frequently Asked Questions (FAQ)
A Notices


List of Figures
Figure 3.1: SUDO/AMD FW Flash --help Generic Options
Figure 3.2: SUDO/AMD FW Flash --help Common Tool Options
Figure 3.3: SUDO/AMD FW Flash --list-devices


Chapter 1 Introduction

This document provides step-by-step instructions for updating the Integrated Firmware Image (IFWI) and Remote Management Firmware (RMFW) using the AMD FW Flash tool (amdfwflash) on AMD Instinct™ MI200 server platforms.
This user guide is for users who have the following AMD Instinct™ MI200 GPUs and want to upgrade the IFWI and/or RMFW:
· AMD Instinct™ MI210
· AMD Instinct™ MI250/MI250X
The AMD FW Flash tool v2.0 is delivered with four versions of IFWI and RMFW:
· Maintenance Update#1 (mu1)
· Maintenance Update#2 (mu2)
· Maintenance Update#3 (mu3)
· General Availability (GA)
By default, the tool updates to the most recent version, Maintenance Update#3.
The tool can also update or roll back your IFWI and/or RMFW to a desired level. For instance, it can update your MI200 platform from the GA version to Maintenance Update#1 or Maintenance Update#2. The steps are outlined in this document.
Note: The AMD FW Flash tool is not intended to be used in a Virtual Machine/Guest Operating System (OS) environment.
CAUTION: Using the AMD FW Flash tool in a Virtual Machine/Guest OS may result in undefined behavior and is an unsupported configuration.


Chapter 2 Getting Started

Before updating the FW, complete the following steps:
· Install the dmidecode package on the system. This applies to all supported distributions (Ubuntu/CentOS/RHEL/SLES).
· Identify the server with the AMD Instinct™ MI200 accelerator(s) requiring a FW update or GPU replacement.
· Ensure that you have the appropriate login credentials for the server.
Note: To execute the firmware update tool, you must have sudo or root permissions on the server.
· Make sure you have access to the BMC/IPMI interface so that you can reach the system console.
· Ensure network access to the AMD FW Flash tool repository, repo.radeon.com.
· Ensure that all applications are closed before launching the tool and that no Operating System (OS) updates are pending in the background. Notify server users about the server maintenance for the firmware update.
· RMFW updates require the driver to be loaded.
Note: It is strongly recommended to run the firmware update tool from the system console rather than over a network connection. This protects the update from network interruptions and lost connections.
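Parts of the checklist above can be checked from a shell before maintenance starts. The following is a minimal sketch and not part of the AMD FW Flash tool: the function names have_cmd and preflight are illustrative, and the check for a loaded amdgpu module assumes that amdgpu is the driver the RMFW requirement refers to.

```shell
#!/bin/sh
# Pre-flight sketch. Function names are illustrative, not part of amdfwflash.

# Return 0 if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one line per prerequisite; return 1 if a required item is missing.
preflight() {
    rc=0
    if have_cmd dmidecode; then
        echo "ok: dmidecode is installed"
    else
        echo "missing: dmidecode package"
        rc=1
    fi
    if [ "$(id -u)" -eq 0 ] || have_cmd sudo; then
        echo "ok: running as root or sudo is available"
    else
        echo "missing: root or sudo access"
        rc=1
    fi
    # Assumption: the driver required for RMFW updates is the amdgpu module.
    if have_cmd lsmod && lsmod | grep -q '^amdgpu '; then
        echo "ok: amdgpu driver is loaded"
    else
        echo "warning: amdgpu driver not loaded (needed for RMFW updates)"
    fi
    return "$rc"
}
```

Calling preflight prints the result of each check. BMC/IPMI access and repository reachability are not covered and still need to be confirmed by hand.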


Chapter 3 Commands
The AMD FW Flash utility supports multiple flags and options to update the FWs.
3.1 Help
Flag/Option: --help/-h [switch]
Description: Displays the help text for all switches along with the description of the tool. [switch] is optional.
· When [switch] is specified, the help for the specified switch is displayed.
· When [switch] is not specified, the complete help is displayed.


Figure 3.1: SUDO/AMD FW Flash --help Generic Options


Figure 3.2: SUDO/AMD FW Flash --help Common Tool Options


3.2 List Devices
Flag/Option: --list-devices/-l
Description: This command performs the following functions:
· Shows the available ASICs along with the SPIROM model and respective part numbers.
· Indicates whether a firmware update is available.
· When the tool is executed without any command-line switches, it lists the devices by default.
The following figure shows the dGPU device information, including whether an appropriate firmware update is available.
Figure 3.3: SUDO/AMD FW Flash --list-devices


Chapter 4 Instructions

To update the FW on AMD Instinct™ MI200 Accelerator(s), or when replacing AMD Instinct™ MI200 Accelerator(s) in a server, first configure the system for FW maintenance. Once the system is configured, execute the amdfwflash command to update or roll back the FW to the desired version.
4.1 Configuring the System for FW Maintenance or AMD Instinct™ MI200 Replacement

4.1.1 Installing the AMD FW Flash Tool
1. The AMD FW Flash tool repository for Linux is located at repo.radeon.com/fwupdater/amdfwflash/latest.
2. Log in to the server with the MI200 GPUs requiring a FW update.
$ ssh user@mi200_server
3. Set up the AMD FW Flash tool package repository.
Ubuntu OS apt repo
wget -q -O - https://repo.radeon.com/fwupdater/amdfw.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/fwupdater/amdfwflash/latest/deb/ ubuntu main' | sudo tee /etc/apt/sources.list.d/amdfwflash.list
RHEL 8 or RHEL 9 yum repo
echo -e '[amdfwflash]\nname=amdfwflash\nenabled=1\nautorefresh=0\ngpgkey=https://repo.radeon.com/fwupdater/amdfw.gpg.key\nbaseurl=https://repo.radeon.com/fwupdater/amdfwflash/latest/rpm\ngpgcheck=1' | sudo tee /etc/yum.repos.d/amdfwflash.repo
SLES 15 SP3 or SP4 zypper repo
echo -e '[amdfwflash]\nenabled=1\nautorefresh=0\ngpgkey=https://repo.radeon.com/fwupdater/amdfw.gpg.key\nbaseurl=https://repo.radeon.com/fwupdater/amdfwflash/latest/rpm\ntype=rpm-md\ngpgcheck=1' | sudo tee /etc/zypp/repos.d/amdfwflash.repo
4. Update the AMD FW Flash tool package repository.
Ubuntu OS
sudo apt update
To verify, search for the amdfwflash package:

sudo apt search amdfwflash
RHEL 8 or RHEL 9
sudo yum update
To verify, search for the amdfwflash package:
sudo yum search amdfwflash
SLES 15 SP3 or SP4
sudo zypper update
To verify, search for the amdfwflash package:
sudo zypper search amdfwflash
5. Install the AMD FW Flash tool package.
Ubuntu OS
sudo apt install amdfwflash
RHEL 8 or RHEL 9
sudo yum install amdfwflash
SLES 15 SP3 or SP4
Before installing, set iomem=relaxed in the GRUB configuration, regenerate the boot configuration and initramfs, and reboot.
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/GRUB_CMDLINE_LINUX_DEFAULT="iomem=relaxed /' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo /usr/bin/dracut --force --regenerate-all
sudo reboot
sudo zypper install amdfwflash
6. Verify the AMD FW Flash tool package installation.
Ubuntu OS
dpkg -l | grep amdfwflash
RHEL 8 or RHEL 9
rpm -qa | grep amdfwflash
SLES 15 SP3 or SP4
rpm -qa | grep amdfwflash
7. Reboot the server for a FW maintenance update, or power off to replace the MI200 GPUs.
sudo reboot
or
sudo poweroff

Note: If an AMD Instinct™ MI200 Accelerator in the system is being replaced, power off the system.
Refer to the section Updating and Rolling Back the AMD Instinct™ MI200 FW Version to update or roll back the AMD Instinct™ MI200 FW to the desired version.
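After installation (and after the reboot on SLES), a quick sanity check can confirm that the tool is present and that the iomem=relaxed kernel parameter is active. This is a minimal sketch: the function names are illustrative, the default path is the install location used by the commands in this guide, and reading /proc/cmdline assumes a standard Linux system.

```shell
# Return 0 if the kernel command line given as $1 contains iomem=relaxed
# as a whole, space-separated parameter.
has_iomem_relaxed() {
    case " $1 " in
        *" iomem=relaxed "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Return 0 if an executable exists at the given path
# (default: the amdfwflash path used in this guide).
tool_installed() {
    [ -x "${1:-/opt/amdfwflash/sbin/amdfwflash}" ]
}

# Example use on a running system:
#   has_iomem_relaxed "$(cat /proc/cmdline)" || echo "iomem=relaxed is not active"
#   tool_installed || echo "amdfwflash not found"
```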
4.2 Updating and Rolling Back the AMD Instinct™ MI200 FW Version
Follow the steps below to update or roll back the AMD Instinct™ MI200 FW to the desired version.
4.2.1 Updating the MI200 FW Maintenance Version
1. Log in to the BMC/IPMI interface of the server identified for the FW update.
2. Launch the remote/virtual console on the server.
3. Log in to the server.
4. Run the amdfwflash utility to list the GPU devices.
sudo /opt/amdfwflash/sbin/amdfwflash --list-devices
Note: The output should list all the GPU devices in the system. If it does not, contact customer care (Customer Care).
5. Execute the amdfwflash command to update the IFWI and/or RMFW of all GPUs in the system to the latest MI200 Maintenance Update#3 version.
To update the IFWI:
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi
or
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu3
To update the RMFW:
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw
or
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu3
6. To update the IFWI and/or RMFW of all GPUs in the system to the MI200 Maintenance Update#2 version instead, run:
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu2
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu2


7. To update the IFWI and/or RMFW of all GPUs in the system to the MI200 Maintenance Update#1 version instead, run:
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu1
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu1
8. Save the system log and console output to a file.
9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under /tmp before updating. Archive the generated FW images from the /tmp folder for later reference.
tar cvf ifwi-backup.tar /tmp/amdfwflash/ifwi/backup
tar cvf rmfw-backup.tar /tmp/amdfwflash/rmfw/backup
10. Reboot the server (an AC power cycle is recommended) to make the FW update effective.
sudo reboot
or
sudo ipmitool power cycle
11. Refer to the section Verifying the AMD Instinct™ MI200 FW Version to complete the FW update. After a successful verification of the FW update, the server may resume normal operation.
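The update commands in steps 5 to 7 differ only in the firmware type and the optional version argument. As a sketch, a small helper can validate both and print the resulting command for review instead of running it. The name build_update_cmd is hypothetical, and the helper assumes the double-hyphen option spelling and the install path used in this guide.

```shell
# Path used by the commands in this guide.
AMDFWFLASH=/opt/amdfwflash/sbin/amdfwflash

# Print the update command for review; nothing is executed.
#   $1: firmware type, ifwi or rmfw
#   $2: optional target version, mu1, mu2, or mu3
build_update_cmd() {
    fw=$1
    version=${2:-}
    case "$fw" in
        ifwi|rmfw) ;;
        *) echo "unknown firmware type: $fw" >&2; return 2 ;;
    esac
    case "$version" in
        ""|mu1|mu2|mu3) ;;
        *) echo "unknown version: $version" >&2; return 2 ;;
    esac
    # With no version argument, the tool updates to Maintenance Update#3.
    echo "sudo $AMDFWFLASH --update-$fw${version:+ $version}"
}
```

For example, build_update_cmd ifwi mu2 prints the IFWI command from step 6, which can then be pasted into the system console.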

4.2.2 Rolling Back to the MI200 GA FW Version
1. Log in to the BMC/IPMI interface of the server identified for the FW update.
2. Launch the remote/virtual console on the server.
3. Log in to the server.
4. Run the amdfwflash utility to list the GPU devices.
sudo /opt/amdfwflash/sbin/amdfwflash --list-devices
Note: The output should list all the GPU devices in the system. If it does not, contact customer care (Customer Care).
5. Execute the amdfwflash command to roll back the IFWI and/or RMFW of all GPUs to the GA version.
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw
6. To roll back the IFWI and/or RMFW of all GPUs from the Maintenance Update#3 version to the Maintenance Update#2 version, run:
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi mu2
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw mu2
7. To roll back the IFWI and/or RMFW of all GPUs from the Maintenance Update#2 version to the Maintenance Update#1 version, run:
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi mu1
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw mu1
8. Save the system log and console output to a file.
9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under /tmp before updating. Archive the generated FW images from the /tmp folder for later reference.
tar cvf ifwi-backup.tar /tmp/amdfwflash/ifwi/backup
tar cvf rmfw-backup.tar /tmp/amdfwflash/rmfw/backup
10. Reboot the server (an AC power cycle is recommended) to make the FW update effective.
sudo reboot
or
sudo ipmitool power cycle
11. Refer to the section Verifying the AMD Instinct™ MI200 FW Version to complete the FW update. After a successful verification of the FW update, the server may resume normal operation.
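Step 9 archives the backup images from /tmp, which some systems clear at reboot, so it may be worth writing the archives to a persistent location before step 10. The following is a minimal sketch; archive_backup, the timestamped file name, and the destination directory are illustrative choices, not something the tool provides.

```shell
# Archive one backup directory into a timestamped tar file and print its path.
#   $1: source directory (for example /tmp/amdfwflash/ifwi/backup)
#   $2: destination directory for the archive
#   $3: label used in the archive name (for example ifwi)
archive_backup() {
    src=$1
    dest=$2
    label=$3
    if [ ! -d "$src" ]; then
        echo "no backup directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    out="$dest/$label-backup-$(date +%Y%m%d-%H%M%S).tar"
    # -C stores paths relative to the parent of the source directory.
    tar -cf "$out" -C "$(dirname "$src")" "$(basename "$src")" && echo "$out"
}

# Example use (the destination is a hypothetical path):
#   archive_backup /tmp/amdfwflash/ifwi/backup "$HOME/fw-backups" ifwi
#   archive_backup /tmp/amdfwflash/rmfw/backup "$HOME/fw-backups" rmfw
```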
4.3 Verifying the AMD Instinct™ MI200 FW Version
1. Log in to the system.
2. If the AMD ROCm software is installed, run rocm-smi with the --showhw option to display the firmware version under the VBIOS column. The output should list all the GPU devices in the system. If it does not, contact customer care (Customer Care).
/opt/rocm/bin/rocm-smi --showhw
Note: If your environment has blacklisted the amdgpu driver for normal operation, run the following command to load the driver before executing rocm-smi.
sudo modprobe amdgpu
3. Run the amdfwflash utility to list all the GPU devices.
sudo /opt/amdfwflash/sbin/amdfwflash --list-devices
Note: Refer to the List Devices section for details on this command.
4. Ensure that all MI200 GPUs have the same updated IFWI and RMFW versions.
Note: In the event of a console output error, contact customer care (Customer Care).
After a successful verification of the FW update, the server may resume normal operation.
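The comparison in step 4 can be scripted once the version strings have been copied out of the tool output. This sketch covers only the comparison: it reads one version string per line on standard input. The function name versions_match is hypothetical, and extracting the version column is left out because the output layout is not reproduced in this guide.

```shell
# Read one version string per line on stdin. Return 0 only if there is at
# least one non-empty line and all non-empty lines are identical.
versions_match() {
    count=$(sort -u | grep -c . || true)
    [ "$count" -eq 1 ]
}

# Example use, with versions.txt holding one version string per GPU:
#   versions_match < versions.txt || echo "GPUs report different FW versions"
```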

4.4 Uninstalling the AMD FW Flash Tool
1. Uninstall the AMD FW Flash (amdfwflash) tool package.
Ubuntu OS
sudo apt remove amdfwflash
RHEL 8 or RHEL 9
sudo yum remove amdfwflash
SLES 15 SP3 or SP4
sudo zypper rm amdfwflash
4.5 Replacing the AMD Instinct™ MI200 GPU (RMA)
The IFWI and RMFW versions of all AMD Instinct™ MI200 Accelerators within a system must be identical for the system to work properly.
1. Before replacing AMD Instinct™ MI200 Accelerator(s) in a system, configure the system for the replacement. Refer to the section Configuring the System for FW Maintenance or AMD Instinct™ MI200 Replacement for the steps.
2. Once the system is configured, power off the system and replace the AMD Instinct™ MI200 Accelerator(s) according to the assembly instruction manual.
3. After the replacement, power on the system and follow the steps in Updating and Rolling Back the AMD Instinct™ MI200 FW Version to update or roll back the IFWI and/or RMFW on all AMD Instinct™ MI200 Accelerator(s) to the desired version.

Chapter 5 References
For additional information, refer to the following websites:
· System Administration Guide: https://documentation.suse.com/sles/15-SP4/html/SLES-all/cha-mod.html
· Knowledge-base site: https://access.redhat.com/solutions/41278

Chapter 6 Customer Care
If you have any questions or need additional information, please contact your AMD Representative. You may also submit a question through the Online Service Request form (https://www.amd.com/en/support/contact-email-form) using the keyword amdfwflash in the subject line.

Chapter 7 Frequently Asked Questions (FAQ)
1. Q: Can I use the AMD FW Flash tool with the amdgpu driver loaded?
A: Yes. From version 2.00 of the tool onwards, the amdgpu driver can remain loaded.
2. Q: Can the GPU cards of the same hive (with XGMI/Infinity Fabric) have different firmware versions?
A: No. This configuration is not supported and may cause undefined behavior. For more information, refer to the Instructions chapter.
3. Q: Does the message ERROR: VBIOS image already flashed indicate an error when the --rollback-ifwi option is used to roll back the IFWIs in all GPUs to the GA version?
ERROR: VBIOS image already flashed
A: No. The message does not indicate an error.
4. Q: What is the GA version?
A: The GA version refers to the IFWI and RMFW shipped from the factory.
5. Q: What is Return Merchandise Authorization (RMA)?
A: RMA means adding a new card into a system that already contains existing cards. This may include field replacements or adding additional GPUs to a server.
6. Q: Does the following message from the rocm-smi command after the IFWI update indicate an error?
WARNING: No AMD GPUs specified
A: No. Ensure that the amdgpu driver is installed for the booted kernel. Verify that the kernel version reported by dkms status matches the one reported by uname -a. If they differ, boot the correct kernel with the amdgpu driver installed.
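The kernel comparison in the last answer can be sketched as a small check. It assumes that dkms status prints one line per module beginning with the module name and containing the kernel release it was built for. The function name is hypothetical, and uname -r is used here because it prints only the kernel release.

```shell
# Return 0 if the dkms status text in $1 has an amdgpu line mentioning
# the kernel release in $2 (plain substring match).
amdgpu_built_for_kernel() {
    printf '%s\n' "$1" | grep '^amdgpu' | grep -qF -- "$2"
}

# Example use on a running system:
#   amdgpu_built_for_kernel "$(dkms status)" "$(uname -r)" \
#       || echo "amdgpu is not installed for the booted kernel"
```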

Appendix A Notices
© Copyright 2024 Advanced Micro Devices, Inc.
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
A.1 Trademarks
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
