AMD MI200 Instinct Accelerator Instruction Manual
- July 31, 2024
- AMD
Product Information
Specifications
- Product Name: AMD Instinct MI200 Accelerator Firmware Update Tool – AMD FW Flash
- Publication Number: 58083 v2.1
- Date: June 2024
Product Usage Instructions
1. Getting Started
Ensure you have the AMD FW Flash tool v2.0 and the necessary IFWI and RMFW
versions.
2. Commands
Refer to the commands section for detailed instructions on using the tool,
including help options and listing devices.
2.1 Help
Use the help command to get assistance on how to use the AMD FW Flash tool.
2.2 List Devices
Use this command to list the devices available for firmware update.
3. Instructions
3.1 Configuring the System for FW Maintenance or AMD Instinct MI200 Replacement
Follow the steps outlined in the user guide to configure the system for
firmware maintenance or GPU replacement.
3.2 Updating and Rolling Back the AMD Instinct MI200 FW Version
Use the AMD FW Flash tool to update or roll back the IFWI and RMFW versions as
needed.
Frequently Asked Questions (FAQ)
Q: What versions of IFWI and RMFW are available with the AMD FW Flash tool?
A: The AMD FW Flash tool v2.0 is delivered with four versions of IFWI and RMFW
for the AMD Instinct MI200 GPUs.
Q: Can I update my MI200 platform to a specific Maintenance Update version?
A: Yes, the tool allows you to update your MI200 platform to Maintenance
Update#1 or Maintenance Update#2 versions from the GA version.
AMD Instinct™ MI200 Accelerator Firmware Update Tool – AMD FW Flash
Publication Number: 58083 v2.1 Date: June 2024
Contents
1 Introduction
2 Getting Started
3 Commands
3.1 Help
3.2 List Devices
4 Instructions
4.1 Configuring the System for FW Maintenance or AMD Instinct™ MI200 Replacement
4.1.1 Installing the AMD FW Flash Tool
4.2 Updating and Rolling Back the AMD Instinct™ MI200 FW Version
4.2.1 Updating the MI200 FW Maintenance Version
4.2.2 Rolling Back to the MI200 GA FW Version
4.3 Verifying the AMD Instinct™ MI200 FW Version
4.4 Uninstalling the AMD FW Flash Tool
4.5 Replacing the AMD Instinct™ MI200 GPU (RMA)
5 References
6 Customer Care
7 Frequently Asked Questions (FAQ)
A Notices
List of Figures
Figure 3.1: SUDO/AMD FW Flash --help Generic Options
Figure 3.2: SUDO/AMD FW Flash --help Common Tool Options
Figure 3.3: SUDO/AMD FW Flash --list-devices
Chapter 1 Introduction
This document provides step-by-step instructions for updating the Integrated
Firmware Image (IFWI) and Remote Management Firmware (RMFW) using the AMD FW
Flash tool (amdfwflash) on AMD Instinct™ MI200 server platforms.
This user guide is for users who have the following AMD Instinct™ MI200 GPUs
and want to upgrade the IFWI and/or RMFW:
· AMD Instinct™ MI210
· AMD Instinct™ MI250/MI250X
The AMD FW Flash tool v2.0 is delivered with four versions of IFWI and RMFW:
· Maintenance Update#1 (mu1)
· Maintenance Update#2 (mu2)
· Maintenance Update#3 (mu3)
· General Availability (GA)
By default, the tool updates to the most recent version of Maintenance
Update#3.
The tool can also update or roll back your IFWI and/or RMFW to a desired
level. For instance, it can update your MI200 platform from the GA version to
the Maintenance Update#1 or Maintenance Update#2 version. The steps to be
followed are outlined in this document.
Note: The AMD FW Flash tool is not intended to be used in a Virtual
Machine/Guest Operating System (OS) environment.
CAUTION: Using the AMD FW Flash tool in a Virtual Machine/Guest OS is an
unsupported configuration and may result in undefined behavior.
Chapter 2 Getting Started
Prior to updating the FW, complete the following steps:
· Install the dmidecode package on the system. This applies to all supported
systems (Ubuntu/CentOS/RHEL/SLES).
· Identify the server with the AMD Instinct™ MI200 accelerator(s) requiring a
FW update or GPU replacement.
· Ensure that you have the appropriate login credentials for the server.
Note: To execute the firmware update tool, you must have sudo or root
permissions on the server.
· To access the system console, make sure you have access to the BMC/IPMI
interface.
· Ensure network access to the AMD FW Flash tool repository,
“repo.radeon.com”.
· Ensure that all applications are closed prior to launching the tool and that
no Operating System (OS) updates are pending in the background. Notify server
users about the server maintenance for the firmware update.
· RMFW updates require the driver to be loaded.
Note: It is strongly recommended to run the firmware update tool from the
system console rather than over a network connection. This avoids a network
interruption or lost connection during the update.
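Some of the items above can be checked from a shell before starting. The following is a minimal sketch, not part of the AMD FW Flash tool: the function names are hypothetical, the sudo check only confirms that the sudo binary exists (not that your account is permitted to use it), and it assumes that "the driver" required for RMFW updates is the amdgpu kernel module.

```shell
# Pre-flight sketch for the Getting Started checklist (illustrative only).

# have_cmd NAME: succeeds if NAME is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# module_loaded NAME [FILE]: succeeds if kernel module NAME is listed in FILE.
# FILE defaults to /proc/modules, where each line starts with the module name.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# preflight: prints one line per check; returns non-zero on a hard failure.
preflight() {
    rc=0
    if have_cmd dmidecode; then
        echo "ok: dmidecode installed"
    else
        echo "FAIL: dmidecode package not installed"; rc=1
    fi
    if [ "$(id -u)" -eq 0 ] || have_cmd sudo; then
        echo "ok: running as root, or sudo is present"
    else
        echo "FAIL: neither root nor sudo available"; rc=1
    fi
    # Assumption: the driver needed for RMFW updates is amdgpu.
    if module_loaded amdgpu; then
        echo "ok: amdgpu driver loaded"
    else
        echo "note: amdgpu not loaded (RMFW updates require the driver)"
    fi
    return "$rc"
}
```

Calling `preflight` prints the result of each check. The remaining items (BMC/IPMI access, closed applications, user notification) still need to be confirmed manually.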
58083 v2.1
4
June 2024
Chapter 3 Commands
The AMD FW Flash utility supports multiple flags and options to update the
FWs.
3.1 Help
Flag/Option: --help/-h [switch]
Description: Displays the help text for all switches along with the
description of the tool. [switch] is optional.
· When [switch] is specified, the help for the specified switch is displayed.
· When [switch] is not specified, the complete help is displayed.
Figure 3.1: SUDO/AMD FW Flash --help Generic Options
Figure 3.2: SUDO/AMD FW Flash --help Common Tool Options
3.2 List Devices
Flag/Option: --list-devices/-l
Description: This command performs the following functions:
· Shows the available ASICs along with the SPIROM model and respective part
numbers.
· Indicates whether a firmware update is available.
· When the tool is executed without any command-line switches, it displays
the devices by default.
The following figure shows the dGPU device information and whether an
appropriate firmware update is available.
Figure 3.3: SUDO/AMD FW Flash --list-devices
Chapter 4 Instructions
To update the FW on AMD Instinct™ MI200 Accelerator(s), or when replacing the
AMD Instinct™ MI200 Accelerator(s) on a server, first configure the system for
FW maintenance. Once the system is configured for firmware maintenance,
execute the amdfwflash command to update or roll back the FW to a desired
version.
4.1 Configuring the System for FW Maintenance or AMD Instinct™ MI200 Replacement
4.1.1 Installing the AMD FW Flash Tool
1. The AMD FW Flash tool repository for Linux is located at:
(repo.radeon.com/fwupdater/amdfwflash/latest).
2. Log in to the server with the MI200 GPUs requiring a FW update.
$ ssh user@mi200_server
3. Set up the AMD FW Flash tool package repository.
Set up the Ubuntu OS apt repo
wget -q -O - https://repo.radeon.com/fwupdater/amdfw.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/fwupdater/amdfwflash/latest/deb/ ubuntu main' | sudo tee /etc/apt/sources.list.d/amdfwflash.list
Set up the RHEL 8 or RHEL 9 yum repo
echo -e '[amdfwflash]\nname=amdfwflash\nenabled=1\nautorefresh=0\ngpgkey=https://repo.radeon.com/fwupdater/amdfw.gpg.key\nbaseurl=https://repo.radeon.com/fwupdater/amdfwflash/latest/rpm\ngpgcheck=1' | sudo tee /etc/yum.repos.d/amdfwflash.repo
Set up the SLES 15 SP3 or SP4 zypper repo
echo -e '[amdfwflash]\nenabled=1\nautorefresh=0\ngpgkey=https://repo.radeon.com/fwupdater/amdfw.gpg.key\nbaseurl=https://repo.radeon.com/fwupdater/amdfwflash/latest/rpm\ntype=rpm-md\ngpgcheck=1' | sudo tee /etc/zypp/repos.d/amdfwflash.repo
4. Update the AMD FW Flash tool package repository.
Ubuntu OS
sudo apt update
To verify, search for the amdfwflash package:
sudo apt search amdfwflash
RHEL 8 or RHEL 9
sudo yum update
To verify, search for the amdfwflash package:
sudo yum search amdfwflash
SLES 15 SP3 or SP4
sudo zypper update
To verify, search for the amdfwflash package:
sudo zypper search amdfwflash
5. Install the AMD FW Flash tool package.
Ubuntu OS
sudo apt install amdfwflash
RHEL 8 or RHEL 9
sudo yum install amdfwflash
SLES 15 SP3 or SP4
Prior to installing, set iomem=relaxed in the GRUB configuration, regenerate
the GRUB configuration and initramfs, and reboot.
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/GRUB_CMDLINE_LINUX_DEFAULT="iomem=relaxed /' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo /usr/bin/dracut --force --regenerate-all
reboot
sudo zypper install amdfwflash
6. Verify the AMD FW Flash tool package installation.
Ubuntu OS
dpkg -l | grep amdfwflash
RHEL 8 or RHEL 9
rpm -qa | grep amdfwflash
SLES 15 SP3 or SP4
rpm -qa | grep amdfwflash
7. Reboot the server for FW maintenance update or power off to replace the
MI200 GPUs.
sudo reboot
or
sudo poweroff
Note: If an AMD Instinct™ MI200 Accelerator in the system is being replaced,
power off the system.
Refer to the section Updating and Rolling Back the AMD Instinct™ MI200 FW
Version to update or roll back the AMD Instinct™ MI200 FW to a desired
version.
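The installation steps differ only in the package manager used on each supported OS. The following is a minimal sketch of selecting it automatically, assuming the standard ID values (ubuntu, rhel, sles) found in /etc/os-release. The function names are hypothetical, and the sketch only prints the install command rather than running it.

```shell
# pkg_manager_for ID: maps an os-release ID to the package manager
# used for that OS in the installation steps of this guide.
pkg_manager_for() {
    case "$1" in
        ubuntu) echo apt ;;
        rhel)   echo yum ;;
        sles)   echo zypper ;;
        *)      echo unsupported; return 1 ;;
    esac
}

# os_id [FILE]: prints the ID field of an os-release file.
# FILE defaults to /etc/os-release; it is sourced in a subshell.
os_id() {
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$ID" )
}

# install_command ID: prints (does not run) the install command for that OS.
install_command() {
    pm=$(pkg_manager_for "$1") || return 1
    echo "sudo $pm install amdfwflash"
}
```

For example, `install_command "$(os_id)"` prints the command for the running system. On SLES, the iomem=relaxed step described above must still be completed before installing.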
4.2 Updating and Rolling Back the AMD Instinct™ MI200 FW Version
Follow the steps below to update or roll back the AMD Instinct™ MI200 FW to a
desired version.
4.2.1 Updating the MI200 FW Maintenance Version
1. Log in to the BMC/IPMI interface of the server identified for the FW
update.
2. Launch the remote/virtual console on the server.
3. Log in to the server.
4. Run the amdfwflash utility to list the GPU devices.
sudo /opt/amdfwflash/sbin/amdfwflash --list-devices
Note: The output should list all the GPU devices in the system. If the output
does not list all the GPU devices, contact customer care (Customer Care).
5. Execute the amdfwflash command to update the IFWI and/or RMFW of all GPUs in
the system to the latest MI200 Maintenance Update#3 version.
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi
or
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu3
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw
or
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu3
6. Follow this step to update the IFWI and/or RMFW of all GPUs in the system
to the MI200 Maintenance Update#2 version.
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu2
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu2
7. Follow this step to update the IFWI and/or RMFW of all GPUs in the system
to the MI200 Maintenance Update#1 version.
sudo /opt/amdfwflash/sbin/amdfwflash --update-ifwi mu1
sudo /opt/amdfwflash/sbin/amdfwflash --update-rmfw mu1
8. Save the system log and console output to a file.
9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under
/tmp before updating. Archive the generated FW images from the /tmp folder for
later reference.
tar cvf ifwi-backup.tar /tmp/amdfwflash/ifwi/backup
tar cvf rmfw-backup.tar /tmp/amdfwflash/rmfw/backup
10. Reboot the server (an AC power cycle is recommended) to make the FW
update effective.
sudo reboot
or
sudo ipmitool power cycle
11. Refer to the section Verifying the AMD Instinct™ MI200 FW Version to
complete the FW update. After a successful verification of the FW update, the
server may resume normal operation.
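The update commands in steps 5 through 7 differ only in the target version, so they can be generated rather than typed. The following is a minimal sketch, not part of the AMD tool: the function name update_commands is hypothetical, and it only prints the commands given in this section so that they can be reviewed before being run from the system console.

```shell
AMDFWFLASH=/opt/amdfwflash/sbin/amdfwflash   # install path used in this guide

# update_commands [VERSION]: prints the IFWI and RMFW update commands.
# VERSION may be mu1, mu2, or mu3; when omitted, no version argument is
# passed and the tool applies its default (Maintenance Update#3).
update_commands() {
    case "${1:-}" in
        ""|mu1|mu2|mu3) ;;
        *) echo "unknown version: $1" >&2; return 1 ;;
    esac
    for flag in --update-ifwi --update-rmfw; do
        # ${1:+ $1} appends " VERSION" only when a version was given.
        echo "sudo $AMDFWFLASH $flag${1:+ $1}"
    done
}
```

For example, `update_commands mu2` prints the two commands from step 6. Rejecting unknown version names before anything is executed helps avoid passing a mistyped argument to the flash tool.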
4.2.2 Rolling Back to the MI200 GA FW Version
1. Log in to the BMC/IPMI interface of the server identified for the FW
rollback.
2. Launch the remote/virtual console on the server.
3. Log in to the server.
4. Run the amdfwflash utility to list the GPU devices.
sudo /opt/amdfwflash/sbin/amdfwflash --list-devices
Note: The output should list all the GPU devices in the system. If the output
does not list all the GPU devices, contact customer care (Customer Care).
5. Execute the amdfwflash command to roll back the IFWI and/or RMFW of all
GPUs to the GA version.
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw
6. Run amdfwflash to roll back the IFWI and/or RMFW of all GPUs from the
Maintenance Update#3 version to the Maintenance Update#2 version.
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi mu2
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw mu2
7. Run amdfwflash to roll back the IFWI and/or RMFW of all GPUs from the
Maintenance Update#2 version to the Maintenance Update#1 version.
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-ifwi mu1
sudo /opt/amdfwflash/sbin/amdfwflash --rollback-rmfw mu1
8. Save the system log and console output to a file.
9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under
/tmp before updating. Archive the generated FW images from the /tmp folder for
later reference.
tar cvf ifwi-backup.tar /tmp/amdfwflash/ifwi/backup
tar cvf rmfw-backup.tar /tmp/amdfwflash/rmfw/backup
10. Reboot the server (an AC power cycle is recommended) to make the FW
rollback effective.
sudo reboot
or
sudo ipmitool power cycle
11. Refer to the section Verifying the AMD Instinct™ MI200 FW Version to
complete the FW rollback. After a successful verification of the FW version,
the server may resume normal operation.
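Steps 8 and 9 (saving the console output and archiving the backup images) can be scripted. The following is a minimal sketch under stated assumptions: the function names run_logged and archive_backups are hypothetical, the timestamped archive names are a convenience not described in this guide, and the backup location is the /tmp/amdfwflash path shown in step 9.

```shell
# run_logged LOGFILE CMD [ARGS...]: runs CMD, showing its output on the
# console while appending it to LOGFILE. Returns CMD's exit status.
run_logged() {
    log=$1; shift
    rcfile="$log.rc"
    echo "### $(date '+%Y-%m-%d %H:%M:%S') $*" >> "$log"
    # The exit status is passed through a file because the status of a
    # pipeline is that of its last command (tee), not of CMD.
    { rc=0; "$@" 2>&1 || rc=$?; echo "$rc" > "$rcfile"; } | tee -a "$log"
    rc=$(cat "$rcfile"); rm -f "$rcfile"
    return "$rc"
}

# archive_backups [DIR]: archives DIR/ifwi/backup and DIR/rmfw/backup into
# timestamped tar files in the current directory, skipping missing folders.
archive_backups() {
    src=${1:-/tmp/amdfwflash}
    stamp=$(date +%Y%m%d-%H%M%S)
    for kind in ifwi rmfw; do
        [ -d "$src/$kind/backup" ] || continue
        tar cf "$kind-backup-$stamp.tar" -C "$src" "$kind/backup"
    done
}
```

A possible use is `run_logged fwflash.log sudo /opt/amdfwflash/sbin/amdfwflash --list-devices` followed by `archive_backups`. Because /tmp is typically cleared on reboot, the archive step belongs before step 10.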
4.3 Verifying the AMD Instinct™ MI200 FW Version
1. Log in to the system.
2. If the AMD ROCm software is installed, run rocm-smi with the --showhw
option to display the firmware version under the VBIOS column. The output
should list all the GPU devices in the system. If the output does not list all
the GPU devices, contact customer care (Customer Care).
/opt/rocm/bin/rocm-smi --showhw
Note: If your environment has blacklisted the amdgpu driver for normal
operation, run the following command to load the driver before executing
rocm-smi.
sudo modprobe amdgpu
3. Run the amdfwflash utility to list all the GPU devices.
sudo /opt/amdfwflash/sbin/amdfwflash --list-devices
Note: Refer to the List Devices section for details on this command.
4. Ensure that all MI200 GPUs have the same updated IFWI and RMFW versions.
Note: In the event of a console output error, contact customer care (Customer
Care).
After a successful verification of the FW update, the server may resume normal
operation.
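The check in step 4 amounts to confirming that every GPU reports the same version string. The following is a minimal sketch of that comparison only. The function name all_same is hypothetical, and because the output layout of rocm-smi and amdfwflash is not reproduced in this guide, it assumes you supply the version strings yourself, one per line.

```shell
# all_same: reads one version string per line on stdin. Succeeds only if
# at least one non-empty line was read and all non-empty lines are identical.
all_same() {
    [ "$(sort -u | grep -c .)" -eq 1 ]
}
```

For example, `printf '%s\n' "$ver_gpu0" "$ver_gpu1" | all_same && echo "versions match"`, where the variables are placeholders for values copied from the VBIOS column. Run the check once for the IFWI versions and once for the RMFW versions.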
4.4 Uninstalling the AMD FW Flash Tool
1. Uninstall the AMD FW Flash (amdfwflash) tool package.
Ubuntu OS
sudo apt remove amdfwflash
RHEL 8 or RHEL 9
sudo yum remove amdfwflash
SLES 15 SP3 or SP4
sudo zypper rm amdfwflash
4.5 Replacing the AMD Instinct™ MI200 GPU (RMA)
The IFWI and RMFW versions of all AMD Instinct™ MI200 Accelerators within a
system must be identical for the system to work properly.
1. When replacing the AMD Instinct™ MI200 Accelerator(s) in a system, the
system must be configured for the AMD Instinct™ MI200 replacement. Refer to
the section Configuring the System for FW Maintenance or AMD Instinct™ MI200
Replacement for steps on how to configure the system.
2. Once the system is configured for the AMD Instinct™ MI200 replacement,
power off the system and replace the AMD Instinct™ MI200 Accelerator(s)
according to the assembly instruction manual.
3. After replacing the AMD Instinct™ MI200 Accelerator, power on the system
and follow the steps in Updating and Rolling Back the AMD Instinct™ MI200 FW
Version to update or roll back the IFWI and/or RMFW on all AMD Instinct™ MI200
Accelerator(s) to a desired version.
Chapter 5 References
For additional information, please refer to the following websites:
· System Administration Guide:
https://documentation.suse.com/sles/15-SP4/html/SLES-all/cha-mod.html
· Knowledge-base site: https://access.redhat.com/solutions/41278
Chapter 6 Customer Care
If you have any questions or need additional information, please contact your
AMD Representative. You may also submit a question at Online Service Request
(https://www.amd.com/en/support/contact-email-form) using the keyword
amdfwflash in the subject line.
Chapter 7 Frequently Asked Questions (FAQ)
1. Q: Can I use the AMD FW Flash tool with the amdgpu driver loaded?
A: Yes. From version 2.00 of the tool onwards, the amdgpu driver can remain
loaded.
2. Q: Can the GPU cards of the same hive (with XGMI/Infinity Fabric) have
different firmware versions?
A: No. This configuration is not supported and may cause undefined behavior.
For more information, please refer to the Instructions.
3. Q: Does the following message indicate an error when the --rollback-ifwi
option is used to update the IFWIs in all GPUs to the GA version?
ERROR: VBIOS image already flashed
A: No. The message does not indicate an error.
4. Q: What is the GA version?
A: The GA version refers to the IFWI and RMFW shipped from the factory.
5. Q: What is Return Merchandise Authorization (RMA)?
A: RMA means adding a new card into a system that already contains existing
cards. This may include field replacements or adding additional GPUs to a
server.
6. Q: Does the following message from the rocm-smi command after the IFWI
update indicate an error?
WARNING: No AMD GPUs specified
A: No. Please ensure that the amdgpu driver is installed for the booted
kernel. Verify that the output of dkms status and uname -a show the same
kernel version. Otherwise, please boot the correct kernel with the amdgpu
driver installed.
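The kernel comparison described in FAQ 6 can be done with a short helper. The following is a minimal sketch with a hypothetical function name, assuming that dkms status prints one line per module that contains the module name, the kernel release, and the word "installed". It uses uname -r, which prints only the kernel release portion of the uname -a output.

```shell
# kernel_in_dkms KERNEL DKMS_TEXT: succeeds if DKMS_TEXT contains an amdgpu
# line that mentions KERNEL and is marked as installed.
kernel_in_dkms() {
    printf '%s\n' "$2" | grep amdgpu | grep -F -- "$1" | grep -q installed
}
```

For example, `kernel_in_dkms "$(uname -r)" "$(dkms status)" || echo "amdgpu not installed for the booted kernel"`. The match is a plain substring comparison, so treat a positive result as a hint and confirm it against the actual dkms status output.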
Appendix A Notices
© Copyright 2024 Advanced Micro Devices, Inc.
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors.
The information contained herein is subject to change and may be rendered
inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. Any computer system has
risks of security vulnerabilities that cannot be completely prevented or
mitigated. AMD assumes no obligation to update or otherwise correct or revise
this information. However, AMD reserves the right to revise this information
and to make changes from time to time to the content hereof without obligation
of AMD to notify any person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR
WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY
FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-
INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO
EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT,
SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY
INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
A.1 Trademarks
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced
Micro Devices, Inc.
Other product names used in this publication are for identification purposes
only and may be trademarks of their respective companies.