Cambricon MLU-X1001 Accelerator Construction Unit of Artificial Intelligence Supercomputing User Manual
- June 12, 2024
- Cambricon
MLU-X1001 Accelerator
Product Manual
V0.9.3
Preface
1.1. Copyright Declaration
Disclaimer
Cambricon Technologies Corporation Limited (hereinafter referred to as
“Cambricon”) makes no representation or warranty (express, implied, or
statutory) regarding the information contained in this document and expressly
disclaims any and all implied warranties of merchantability, title,
non-infringement of intellectual property, or fitness for a particular
purpose. Cambricon does not assume any liability arising from the application
or use of any product or service. Cambricon shall not be liable for any breach
of contract, damages, costs, or problems arising from: (1) any use of
Cambricon products contrary to this guide; or (2) customer product design.
Limitation of liability
In no case shall Cambricon be liable for any damage caused by the use of or
inability to use this guide (including but not limited to loss of profits,
business disruption, and loss of information), even if Cambricon has been
advised that such damage may occur. Notwithstanding any damage the customer
may suffer for any reason, Cambricon's total and cumulative liability to the
customer for the products described in this guide shall be limited in
accordance with Cambricon's terms and conditions of sale.
Accuracy of information
The information provided in this document is owned by Cambricon, and Cambricon
reserves the right to change this information or any products and services
without notice. The information contained in this guide, and all other
information in the Cambricon documents cited in this guide, is provided
“as is”. Cambricon does not guarantee the accuracy or completeness of the
information, text, graphics, links, or other items contained in this guide.
Cambricon may make changes to this guide or to the products described in this
guide without notice, but does not undertake to update this guide.
The performance tests and ratings listed in this guide are measured using a
specific chip, computer system, or component. The results shown in this guide
reflect the general performance of Cambricon products. Any difference in
system hardware or software design or configuration will affect actual
performance. As mentioned above, Cambricon does not represent, warrant, or
guarantee that the products described in this guide are suitable for any
particular purpose. Cambricon does not represent or guarantee that all
parameters of each product have been tested. The customer is solely
responsible for ensuring that the product is suitable for the customer's
planned application and for performing the necessary tests on the application,
so as to avoid failure of the application or product.
Weaknesses in customer product design can affect the quality and
reliability of Cambricon products and lead to additional or different
conditions and/or requirements beyond the scope of this guide.
Notice of Intellectual Property
The Cambricon name and Cambricon symbols are trademarks and/or registered
trademarks of Cambricon Technologies Corporation Limited in the United States
and other countries. Other company and product names are trademarks of
the respective companies with which they are associated.
This guide is copyrighted and protected by copyright laws and treaties
worldwide. This guide may not be reproduced, adapted, modified, published,
uploaded, transmitted, or distributed in any way without the prior written
permission of Cambricon. Except for the customer's right to use the
information in this guide and the products in accordance with this guide,
Cambricon does not grant any other express or implied rights or licenses.
For the avoidance of doubt, Cambricon does not grant the customer any rights
or licenses (express or implied) under any patent, copyright, trademark,
trade secret, or any other Cambricon intellectual property or proprietary right.
Copyright Declaration
© Cambricon Technologies Corporation Limited. All rights reserved.
1.2. Versioning
Table 1.1 Version Record
Document name | MLU-X1001 accelerator Product Manual |
---|---|
Version number | V0.9.3 |
Author | Cambricon |
Date created | 2020.10.30 |
1.3. Update history
V0.2.0
Update time: 2020.07.10
Update:
– Initial version.
V0.9.3
Update time: 2020.10.30
Update:
– Renamed the external interconnect to MLU-LINK, updated the HBM rate,
and added the button battery warning.
Overview
The MLU-X1001 accelerator is a building block for artificial intelligence supercomputing. The unit integrates 4 MLU290-M5 intelligent accelerator cards and provides up to 2 POPS of adaptive-precision computing power. Using Cambricon MLU-LINK direct chip-to-chip interconnect technology, supercomputing systems of 4 to 16 cards can be built, providing a highly agile, highly reliable, and high-performance computing foundation for artificial intelligence computing centers.
Product Specification Overview
3.1 Overview of Product Specification Parameters
The specification parameters of the MLU-X1001 accelerator are as follows:
Table 3.1 MLU-X1001 Specification Parameters
Specification | Value |
---|---|
Model | MLU-X1001 |
Core architecture | Cambricon MLUv02 |
Core frequency | 1.3 GHz |
Supported precisions | INT16, INT8, INT4, FP32, FP16 |
Video decoding | Supported |
Memory capacity | 192 GB |
ECC protection | Yes |
System interface | 2 × PCI Express 4.0 x16 |
MLU-LINK external interface | 8 ports |
MLU-LINK interface bandwidth | 8 × 100 GB/s |
TDP power consumption | 2300 W |
Heat dissipation scheme | Air-cooled, compatible with liquid cooling |
3.2 Overview of structural specifications
The structural specifications of the MLU-X1001 accelerator are as follows:
Table 3.2 Structural Specification for MLU-X1001
Specification | Value |
---|---|
Dimensions | 437 mm × 87 mm × 735 mm |
Weight | 29 kg |
Package dimensions | 1000 mm × 635 mm × 230 mm |
Package weight | 39 kg |
Bending radius of cable:
Table 3.3 Specification for cable bending
Wire diameter | Bending radius L1 (based on the cabinet column) | Bending radius L2 (based on the chassis) |
---|---|---|
30 AWG | 97.45 mm | 78.5 mm |
26 AWG | 121.64 mm | 102.7 mm |
3.3 Overview of electrical specifications
The electrical specifications of the MLU-X1001 accelerator are as follows:
Table 3.4 Electrical Specification for MLU-X1001
Specification | Value |
---|---|
System interface | PCIE Gen4 x16 |
Number of PCIE ports | 2 ports |
PCIE bandwidth | 128 GB/s |
Number of MLU-LINK ports | 8 ports |
MLU-LINK bandwidth | 800 GB/s |
BMC management interface | IPMI V2.0 |
Host management interface | SMBUS |
Input voltage | AC 115-127 V, 14.2 A, 60/50 Hz |
 | AC 200-240 V, 14.9 A, 60/50 Hz |
 | DC 240 V, 16 A (China mainland only) |
3.4 Summary of heat dissipation specifications
The heat dissipation specifications of the MLU-X1001 accelerator are as follows:
Table 3.5 Heat dissipation specifications of MLU-X1001
Specification | Value |
---|---|
Working temperature | 0℃ to 35℃ at altitudes of 900 m or below |
Working humidity | 20%RH to 85%RH |
Storage temperature | -40℃ to 75℃ |
Storage humidity | 5%RH to 95%RH |
Noise | SDP @23℃, sound power ≤7.2 bels |
Working altitude | ≤3000 m (from 900 m to 3000 m, the supported working temperature drops by 1℃ for every 300 m of additional altitude) |
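The altitude derating rule in the table above can be turned into a small calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 1℃ per 300 m reduction applies linearly (continuously) between 900 m and 3000 m, which the table does not state explicitly. A stepwise reading would give slightly different values between the 300 m marks.

```python
def max_working_temp_c(altitude_m: float) -> float:
    """Upper working-temperature limit in degrees C at a given altitude.

    Assumes linear derating of 1 degree C per 300 m above 900 m
    (an interpretation of the table, not a vendor formula).
    """
    base_limit_c = 35.0      # limit at 900 m or below
    derate_start_m = 900.0   # derating begins above this altitude
    max_altitude_m = 3000.0  # maximum working altitude
    derate_step_m = 300.0    # 1 degree C lost per this many metres

    if altitude_m > max_altitude_m:
        raise ValueError("altitude exceeds the 3000 m working limit")
    excess_m = max(0.0, altitude_m - derate_start_m)
    return base_limit_c - excess_m / derate_step_m


for altitude in (500, 900, 1500, 3000):
    print(altitude, "m ->", max_working_temp_c(altitude), "C")
```

Under this assumption the limit stays at 35℃ up to 900 m and falls to 28℃ at the 3000 m ceiling.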
Component Profile
4.1 MLUX-BB1
MLUX-BB1 is the baseboard that carries the MLU290-M5 intelligent accelerator
cards. Each MLUX-BB1 can carry 4 MLU290-M5 cards. The details are shown in the
following figure:
Table 4.1 MLUX-BB1 Description
No. | Description |
---|---|
1 | MLU-LINK-0A & 0B |
2 | MLU-LINK-2A & 2B |
3 | MLU-LINK-1A & 1B |
4 | MLU-LINK-3A & 3B |
5 | PCIE 0 |
6 | PCIE 1 |
7 | IPMI |
8 | UID |
9 | COM HUB0 |
10 | COM HUB1 |
11 | COM HUB2 |
12 | AC INDICATOR |
13 | FRONT PANEL CONN. |
14 | PDB MGT.CONN. |
15 | OAM MODULE 0 |
16 | OAM MODULE 2 |
17 | OAM MODULE 1 |
18 | OAM MODULE 3 |
19 | FAN 4 |
20 | FAN 3 |
21 | FAN 2 |
22 | FAN 1 |
23 | FAN 0 |
24 | PCIE SWITCH 0 |
25 | PCIE SWITCH 1 |
26 | 54V POWER BUSBAR |
27 | HANDLE 0 |
28 | HANDLE 1 |
29 | FRONT PCIE CONN. |
4.2 MLUX-PA4
MLUX-PA4 is a PCIE board that is installed in the host server and provides
Mini SAS HD interfaces for connection to the MLU-X1001. The details are shown
in the following figure:
Table 4.2 MLUX-PA4 Description
No. | Description |
---|---|
1 | mini SAS HD CONN. |
2 | PCIE RETIMER |
3 | mini SAS HD CONN. |
4.3 MLUX-PDB
MLUX-PDB is the power distribution board. The details are shown in the
following figure:
Table 4.3 MLUX-PDB Description
No. | Description |
---|---|
1 | 54V POWER BUSBAR |
2 | PSU CONN.0 |
3 | PSU CONN.1 |
4 | INTRUSION |
5 | SSD POWER CONN.0 |
6 | SSD POWER CONN.1 |
7 | PDB MGT.CONN. |
8 | 12V POWER BUSBAR |
4.4 MLUX-LINKB
MLUX-LINKB is a passive connection board. The details are shown in the
following figure:
Table 4.4 MLUX-LINKB Description
No. | Description |
---|---|
1 | SSD MGT.CONN.0 |
2 | SSD MGT.CONN.1 |
3 | OCULINK 0 |
4 | OCULINK 1 |
5 | OCULINK 2 |
6 | OCULINK 3 |
7 | IBB CONN. |
8 | FRONT PCIE CONN. |
4.5 MLUX-IBB
MLUX-IBB is the backplane for InfiniBand cards. Each MLUX-IBB can hold two
InfiniBand cards. The details are shown in the following figure:
Table 4.5 MLUX-IBB Description
No. | Description |
---|---|
1 | IBB CONN. |
2 | IB SLOT 0 |
3 | IB SLOT 1 |
4.6 Front panel
The front panel of the chassis is shown as follows:
Table 4.6 Description of front panel of chassis
No. | Description |
---|---|
1 | Power button |
2 | UID button |
3 | Reset button |
4 | PSU 0 |
5 | PSU 1 |
6 | SSD 0 |
7 | SSD 1 |
8 | SSD 2 |
9 | SSD 3 |
10 | NIC 0 |
11 | NIC 1 |
4.7 Rear panel
The rear panel of the chassis is shown as follows:
Table 4.7 Description of rear panel of chassis
No. | Description |
---|---|
1 | PCIE 0 |
2 | PCIE 1 |
3 | MLU-LINK-0A |
4 | MLU-LINK-0B |
5 | MLU-LINK-2A |
6 | MLU-LINK-2B |
7 | MLU-LINK-1A |
8 | MLU-LINK-1B |
9 | MLU-LINK-3A |
10 | MLU-LINK-3B |
11 | IPMI |
12 | UID |
13 | COM HUB 0 |
14 | COM HUB 1 |
15 | COM HUB 2 |
16 | AC INDICATOR |
17 | POWER CORD 0 |
18 | POWER CORD 1 |
Electrical specifications
5.1 PCIE topology description
The MLU-X1001 accelerator connects to the host server through 2 miniSAS HD
interfaces, and 2 PCIE switch chips connect the internal PCIE devices. The
PCIE interconnection topology is shown as follows:
The PCIE signal rate is 16 Gbps per lane, and the cable loss should be kept
within 15 dB at 8 GHz. A 1-meter cable with 30 AWG wire is recommended.
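As a quick sanity check, the 128 GB/s PCIE bandwidth listed in Table 3.4 can be related to the signal rate given here. The sketch below is one plausible way to arrive at that figure, not a vendor formula: it assumes the value is the raw line rate summed over both directions of both x16 ports, with no allowance for encoding or protocol overhead. The function name is hypothetical.

```python
LANE_RATE_GBPS = 16   # per-lane signal rate from this section
LANES_PER_PORT = 16   # x16 link width
PCIE_PORTS = 2        # number of PCIE ports on the unit


def raw_bandwidth_gbytes(lane_rate_gbps, lanes, ports, directions=2):
    """Raw aggregate bandwidth in GB/s (bits to bytes: divide by 8).

    Ignores line encoding and protocol overhead, so usable throughput
    will be lower than this number.
    """
    return lane_rate_gbps * lanes * ports * directions / 8


pcie_total = raw_bandwidth_gbytes(LANE_RATE_GBPS, LANES_PER_PORT, PCIE_PORTS)
print(pcie_total)  # 128.0
```

Passing `directions=1` gives the per-direction figure of 64 GB/s under the same assumptions.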
The pins of the miniSAS HD connectors used by the PCIE interfaces are defined as
follows:
Table 5.1 PCIE Interface pin definition
miniSAS HD pin | Description | Pin handling |
---|---|---|
RX [15:0]P/N | PCIE input signal | External AC coupling capacitance |
TX [15:0]P/N | PCIE output signal | External AC coupling capacitance |
SMCLK | SMBUS interface clock signal | 4.7 kΩ pull-up to 3.3 V |
SMDAT | SMBUS interface data signal | 4.7 kΩ pull-up to 3.3 V |
PERST# | Reset signal | |
REFCLK P/N | PCIE clock signal | |
PRESENT | Far-end presence detection signal | 4.7 kΩ pull-up to 3.3 V |
5.2 MLU-LINK interface description
The MLU-X1001 accelerator is equipped with 4 MLU290-M5 intelligent accelerator
cards, each of which has 6 MLU-LINK ports. Of these, 4 ports are used for
internal interconnection and 2 ports are used for external interconnection.
The MLU-LINK interconnection topology between the internal cards is as
follows:
For the MLU-LINK interconnection between MLU-X1001 units, refer to the
following figure:
The signal rate of MLU-LINK is 50 Gbps, and the cable loss should be kept
within 10 dB at 12.5 GHz. A 1-meter cable with 30 AWG wire or a 2-meter cable
with 28 AWG wire is recommended.
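The port counts and signal rate above can be cross-checked against the 8 external ports and 8 × 100 GB/s listed in the specification tables. The sketch below is a rough consistency check under stated assumptions: that 50 Gbps is a per-lane rate, that each QSFP-DD port carries 8 lanes per direction (suggested by the RX [8:1] and TX [8:1] pin groups in Table 5.2), and that the quoted bandwidth counts both directions. The function names are hypothetical.

```python
# Values from this section; lane count and the both-directions
# convention are assumptions made for this sketch.
CARDS = 4
PORTS_PER_CARD = 6
INTERNAL_PORTS_PER_CARD = 4
EXTERNAL_PORTS_PER_CARD = 2
LANES_PER_PORT = 8     # assumed from the RX[8:1]/TX[8:1] pin groups
LANE_RATE_GBPS = 50    # assumed to be a per-lane signal rate


def external_port_count(cards, external_ports_per_card):
    """Total MLU-LINK ports exposed outside the chassis."""
    return cards * external_ports_per_card


def port_bandwidth_gbytes(lanes, lane_rate_gbps, directions=2):
    """Raw per-port bandwidth in GB/s, ignoring encoding overhead."""
    return lanes * lane_rate_gbps * directions / 8


# Internal and external ports must account for every port on a card.
assert INTERNAL_PORTS_PER_CARD + EXTERNAL_PORTS_PER_CARD == PORTS_PER_CARD

external_ports = external_port_count(CARDS, EXTERNAL_PORTS_PER_CARD)
per_port = port_bandwidth_gbytes(LANES_PER_PORT, LANE_RATE_GBPS)
print(external_ports, per_port, external_ports * per_port)  # 8 100.0 800.0
```

Under these assumptions the result matches the 8 ports and 800 GB/s aggregate given in Table 3.4.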
The MLU-LINK interface uses QSFP-DD connectors, whose pins are defined as follows:
Table 5.2 MLU-LINK Interface pin definition
QSFP-DD pin | Description | Pin handling |
---|---|---|
RX [8:1]P/N | SERDES signal input with internal AC coupling capacitance | External AC coupling capacitance is not required |
TX [8:1]P/N | SERDES signal output with internal AC coupling capacitance | External AC coupling capacitance is not required |
SCL | I2C interface clock signal of optical module | 4.7 kΩ pull-up to 3.3 V |
SDA | I2C interface data signal of optical module | 4.7 kΩ pull-up to 3.3 V |
ModPrsL | Optical module presence signal output | 4.7 kΩ pull-up to 3.3 V |
ModSelL | Optical module select signal, pulled up internally by default | 1 kΩ pull-down to GND |
ResetL | Reset signal, active low | 4.7 kΩ pull-up to 3.3 V |
IntL | Interrupt signal of optical module, open-collector output, low level indicates an interrupt | 4.7 kΩ pull-up to 3.3 V |
InitMode | Initialization mode | 1 kΩ pull-down to GND |
VccRx, VccRx1, Vcc1, Vcc2, VccTx, VccTx1 | Power supply | |
5.3 Power Interface Description
The input power requirements of the MLU-X1001 accelerator are as follows:
Table 5.3 MLU-X1001 Input Power Supply Specifications
Input voltage | Max. input current |
---|---|
AC 115-127 V, 60/50 Hz | 14.2 A |
AC 200-240 V, 60/50 Hz | 14.9 A |
DC 240 V (China mainland only) | 16 A |
The MLU-X1001 accelerator can throttle its power consumption in response to
instantaneous power changes that last longer than the µs level. The power
regulator can tolerate power fluctuations at the ms level (e.g. 1.2 × TDP).
Table 5.4 EDPp specifications of MLU-X1001
EDP | Duration |
---|---|
TBD | TBD |
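To put the 1.2 × TDP example in concrete terms, the short calculation below applies that factor to the 2300 W TDP from Table 3.1. It is only a worked illustration: the 1.2 factor is the example quoted in this section, the actual EDPp values are still listed as TBD, and the function name is hypothetical.

```python
TDP_W = 2300            # TDP power consumption from Table 3.1
EXAMPLE_FACTOR = 1.2    # example multiplier quoted in this section


def transient_peak_w(tdp_w, factor):
    """Peak power during a short transient, as a multiple of TDP."""
    return tdp_w * factor


peak_w = transient_peak_w(TDP_W, EXAMPLE_FACTOR)
print(f"{peak_w:.0f} W peak, {peak_w - TDP_W:.0f} W above TDP")
```

With these inputs the transient peak is about 2760 W, roughly 460 W above TDP, which is the headroom the upstream power path would need to ride through at the ms level.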
BMC management system
The BMC management system of the MLU-X1001 is compatible with the IPMI 2.0
server management standard and provides highly reliable hardware monitoring
and management functions.
6.1 BMC function description
The main functions and features of the MLU-X1001 accelerator BMC management
system are as follows:
Table 6.1 BMC Functional description
Function | Description |
---|---|
Remote control | Management through SOL (Serial over LAN) functions |
Information management | Management of equipment model, asset information, and version information |
Status monitoring | Real-time monitoring of power supply, temperature, working status, and other operating state information |
Heat dissipation control | Adjusts fan speed according to ambient temperature, equipment workload, and abnormal conditions |
Alarm management | Reports alarm information in real time and handles it accordingly |
WEB interface management | Provides a visual WEB interface for query and management |
IPMITool management | Supports IPMITool |
Note: A button battery (Panasonic CR2032) powers the RTC. If the battery is replaced incorrectly, there is a risk of explosion.
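Since the BMC supports IPMITool over IPMI 2.0, several of the functions in Table 6.1 can in principle be reached with standard `ipmitool` subcommands. The sketch below only builds and prints the command lines. The BMC address and credentials are hypothetical placeholders, and the mapping from each table row to a subcommand is an assumption, since this manual does not list which IPMI commands the BMC implements.

```python
import shlex


def ipmitool_cmd(host, user, password, *subcommand):
    """Build an ipmitool command line for an IPMI 2.0 (lanplus) session."""
    return ["ipmitool", "-I", "lanplus",
            "-H", host, "-U", user, "-P", password, *subcommand]


# Hypothetical connection details; replace with the real BMC settings.
BMC_HOST = "192.0.2.10"
BMC_USER = "admin"
BMC_PASSWORD = "changeme"

# Assumed mapping from Table 6.1 functions to standard ipmitool subcommands.
commands = {
    "Status monitoring": ipmitool_cmd(BMC_HOST, BMC_USER, BMC_PASSWORD, "sdr", "list"),
    "Information management": ipmitool_cmd(BMC_HOST, BMC_USER, BMC_PASSWORD, "fru", "print"),
    "Alarm management": ipmitool_cmd(BMC_HOST, BMC_USER, BMC_PASSWORD, "sel", "list"),
    "Remote control (SOL)": ipmitool_cmd(BMC_HOST, BMC_USER, BMC_PASSWORD, "sol", "activate"),
}

for function, cmd in commands.items():
    print(f"{function}: {shlex.join(cmd)}")
```

To execute one of these against a reachable BMC, pass the list to `subprocess.run(cmd, check=True)` on a machine with `ipmitool` installed.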
Heat dissipation specifications
7.1 Description of the heat dissipation environment
The working environment of MLU-X1001 is as follows:
Table 7.1 Working environment of MLU-X1001
Items | Specification parameters |
---|---|
Working environment temperature | 0~35℃ |
Relative humidity | 20%~85%, non-condensing |
Noise | 62~88 dBA |
Note: The unit produces 62~88 dBA of noise during normal operation. Please take
adequate sound insulation measures in advance.
MLU-X1001 air volume description:
- MLU-X1001 can provide up to 360 CFM of air volume
- Do not block the front and rear ventilation areas of the chassis while MLU-X1001 is operating
- When installing MLU-X1001, minimize airflow obstructions around the inlet and outlet of the chassis
- Route cables according to the instructions to minimize air resistance in the air duct
- Install the chassis cover before using MLU-X1001. If MLU-X1001 is used without the chassis cover, the components may be damaged.
- If you need to replace a fan, complete the replacement within 25 s to avoid overheating the system.
7.2 Wind resistance curve of MLU-X1001
The system wind resistance curve of MLU-X1001 is shown below:
Table 7.2 Air Volume vs. Pressure Drop of MLU-X1001
Air volume (CFM) | Air pressure (Pa) |
---|---|
400 | 1737 |
360 | 1408 |
310 | 1044 |
260 | 735 |
0 | 0 |
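The table gives only a few points on the curve. To estimate the pressure drop at other air volumes, one common approach is to model system resistance as a quadratic law, P = k·Q². The sketch below fits k to the table values by least squares through the origin. The quadratic model is an assumption introduced here, not something this manual states, though the tabulated points follow it closely. The function names are hypothetical.

```python
# (air volume in CFM, pressure drop in Pa) from Table 7.2
CURVE = [(400, 1737), (360, 1408), (310, 1044), (260, 735), (0, 0)]


def fit_quadratic_coefficient(points):
    """Least-squares k for the assumed model P = k * Q**2."""
    numerator = sum(p * q ** 2 for q, p in points)
    denominator = sum(q ** 4 for q, _ in points)
    return numerator / denominator


def pressure_drop_pa(air_volume_cfm, k):
    """Estimated pressure drop at a given air volume under the model."""
    return k * air_volume_cfm ** 2


k = fit_quadratic_coefficient(CURVE)
print(f"k = {k:.5f} Pa/CFM^2")
for q, measured in CURVE:
    print(q, measured, round(pressure_drop_pa(q, k), 1))
```

The fitted coefficient comes out at roughly 0.0109 Pa/CFM², and the model reproduces each tabulated point to within about 1 Pa, so it is a reasonable interpolation aid inside the 0 to 400 CFM range. Extrapolating beyond the table is not supported by the data.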
Optional components
8.1 PCIE High Speed Cable
MLU-X1001 uses miniSAS HD high-speed cables for PCIE Gen4
interconnection. Compatible cable models are as follows:
Table 8.1 MLU-X1001 PCIE Compatible Cable
Manufacturer | Model | Specifications |
---|---|---|
Molex | 2040431030 | 1 m, 30 AWG |
8.2 MLU-LINK High Speed Cable
MLU-X1001 uses QSFP-DD high-speed cables for MLU-LINK
interconnection. Compatible cable models are as follows:
Table 8.2 MLU-X1001 MLU-LINK Compatible Cable
Manufacturer | Model | Specifications |
---|---|---|
Molex | 2015911012 | 1 m, 30 AWG |
Molex | 2015913020 | 2 m, 28 AWG |
TE | 2366016-4 | 1 m, 30 AWG |
TE | 2366101-3 | 2 m, 28 AWG |
8.3 Network
MLU-X1001 can use InfiniBand or RoCE network cards for cluster
interconnection.
Compatible network card models are as follows:
Table 8.3 Network Card Compatibility
Manufacturer | Model | Specifications |
---|---|---|
Mellanox | MCX653105A-HDAT | Half-height, half-length, single-port, PCIE 4.0 |
8.4 Hard disk
Compatible NVMe hard disk models for MLU-X1001 are as follows:
Table 8.4 NVMe Hard Disk Compatibility
Manufacturer | Model | Specifications |
---|---|---|
HGST | HUSMR7619BHP301 | NVMe, 1.92 TB |
Cambricon NeuWare development environment
NeuWare supports various mainstream programming frameworks (e.g.
TensorFlow, Caffe, PyTorch, and MXNet). With these frameworks, users can
easily develop and deploy deep learning applications on the Cambricon
MLU290-M5. NeuWare also provides a complete runtime system and driver
software to facilitate rapid system integration.
NeuWare further provides a range of tools for application development,
function debugging, and performance tuning. The application development tools
include a machine learning library, a runtime library, a compiler, a model
retraining tool, and domain-specific SDKs (such as for video analysis). The
function debugging tools meet debugging requirements at different levels,
such as the programming framework and the function library. The performance
tuning tools include performance profiling tools and system monitoring tools.
Compliance
The MLU-X1001 accelerator complies with the regulations listed in this
chapter. The compliance marks can be found on the label of each device.
FCC statement
This device complies with Part 15 of the FCC Rules.
Operation is subject to the following two conditions: (1) This device may not
cause harmful interference, and (2) this device must accept any interference
received, including interference that may cause undesired operation.
This equipment has been tested and found to comply with the limits for a Class
A digital device, pursuant to part 15 of the FCC Rules. These limits are
designed to provide reasonable protection against harmful interference when
the equipment is operated in a commercial environment. This equipment
generates, uses, and can radiate radio frequency energy and, if not installed
and used in accordance with the instruction manual, may cause harmful
interference to radio communications. Operation of this equipment in a
residential area is likely to cause harmful interference in which case the
user will be required to correct the interference at his own expense.
Caution: Any changes or modifications not expressly approved by the party
responsible for compliance could void the user’s authority to operate this
equipment.
Underwriters Laboratories (UL)
UL Listed Product Logo for MLU-X1001 Accelerator, model name MLU-X1001.
Copyright © 2020 Cambricon Corporation