Lenovo SR650 V3 Server Product Owner’s Manual

September 8, 2024
Lenovo

Table of Contents

Lenovo SR650 V3 Server Product
Product Usage Instructions
VMware Private AI with Intel
Llama2 LLM Inference Performance with 4th Gen Intel Xeon Scalable Processors
Llama2 7B Performance Results with/without Intel AMX
Llama2 13B Performance Results with/without Intel AMX
Bill of Materials for ThinkAgile VX650 V3
References

Lenovo SR650 V3 Server Product

Product Specifications

Brand: Lenovo
Model: ThinkAgile VX V3 and ThinkSystem V3
Processor: Intel 4th and 5th Gen Scalable Processors
Server Models: SR650 V3 2U, SR630 V3 1U, VX650 V3 2U, VX630 V3 1U
Cores per Socket: Up to 64 cores with 5th Gen Intel Xeon processors, up to 60 cores with 4th Gen processors

Product Usage Instructions

Intel Optimized AI Libraries & Frameworks
The product integrates Intel AMX on 4th and 5th Gen Intel Xeon Scalable processors, allowing AI inference and fine-tuning workloads, including Generative AI use cases, to run efficiently on the CPU.

Intel AI Software and Optimization Libraries
The product supports various Intel AI optimization software and development tools to enhance performance. Developers can use tools such as Cnvrg.io (MLOps), SigOpt (AI experimentation), Intel Extension for PyTorch, and Intel Distribution for Python.

VMware Cloud Foundation
VMware Cloud Foundation is a multi-cloud platform supporting virtual machines and containerization of workloads. It includes VMware Aria Suite for private/hybrid cloud management and VMware Tanzu for Kubernetes workloads.

VMware Private AI with Intel
VMware Private AI with Intel enables Generative AI at scale. It requires vSphere 8.0 and above with VMs using virtual HW version 20 and above. The guest OS running Linux should use kernel 5.16 or later.

Frequently Asked Questions (FAQ)

Q: What are the key features of the Lenovo ThinkAgile VX V3 and ThinkSystem V3?
A: The key features include support for Intel 4th and 5th Gen Scalable Processors, optimized for AI workloads, up to 64 cores per socket, and integration with Intel AMX for enhanced AI operations.
Q: Which software tools are provided for AI optimization?
A: The product supports tools such as Cnvrg.io (MLOps), SigOpt (AI experimentation), Intel Extension for PyTorch, and Intel Distribution for Python to optimize AI performance.
Q: What are the system requirements for VMware Private AI with Intel?
A: VMware Private AI with Intel requires vSphere 8.0 or later, VMs with virtual hardware version 20 or later, and Linux kernel 5.16 or later for both the guest OS and, if used, the Tanzu Kubernetes worker nodes.

VMware Private AI with Intel on Lenovo ThinkAgile VX V3 and ThinkSystem V3
Solution Brief
Lenovo V3 Systems with Intel 4th and 5th Gen Scalable Processors
Lenovo ThinkSystem SR650 V3 2U and SR630 V3 1U servers and ThinkAgile VX650 V3 2U and VX630 V3 1U hyperconverged solutions with VMware vSAN, powered by 4th and 5th Gen Intel Xeon Scalable processors, are optimized for AI workloads and are part of the Accelerated by Intel offerings. Lenovo V3 systems support up to 64 cores per socket with 5th Gen Intel Xeon processors and up to 60 cores per socket with 4th Gen processors.

ThinkAgile VX V3 systems are factory-integrated, pre-configured, ready-to-go integrated systems built on proven and reliable Lenovo ThinkSystem servers. They provide compute power for a variety of workloads and applications and are powered by industry-leading hyperconverged infrastructure software from VMware. They offer a quick and convenient path to implementing a hyperconverged solution powered by the VMware Cloud Foundation (VCF) or VMware vSphere Foundation (VVF) software stacks, with Lenovo acting as a “one-stop shop” and single point of contact for purchasing, deploying, and supporting the solution.

Intel Optimized AI Libraries & Frameworks
Intel provides a comprehensive portfolio of AI development software covering data preparation, model development, training, inference, deployment, and scaling. Using optimized AI software and developer tools can significantly improve AI workload performance and developer productivity while reducing compute resource costs. Intel® oneAPI libraries enable the AI ecosystem with optimized software, libraries, and frameworks. Software optimizations include leveraging accelerators, parallelizing operations, and maximizing core usage.

Intel® Advanced Matrix Extensions (Intel® AMX)
Intel® AMX is a set of instructions designed to operate on matrices, enabling AI fine-tuning and inference workloads to run on the CPU. Its architecture supports the bfloat16 (training and inference) and int8 (inference) data types, and Intel provides tools and guides for implementing and deploying Intel AMX. The Intel AMX architecture consists of two components:

Tiles: These consist of eight two-dimensional registers, each 1 kilobyte in size, that store large chunks of data.
Tile Matrix Multiplication (TMUL): TMUL is an accelerator engine attached to the tiles that performs matrix-multiply computations for AI.
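The tile geometry above can be made concrete with a small sketch. The constants follow the description (eight tiles of 1 KB each, organized as at most 16 rows of 64 bytes), and the function shows the blocked multiply-accumulate pattern that TMUL applies to tiles. This is a plain-Python illustration under those assumptions: real Intel AMX code goes through compiler intrinsics or libraries such as oneDNN and also expects the B operand in a packed layout, which is omitted here. The function names are hypothetical.

```python
# Conceptual emulation of Intel AMX tile geometry and a tiled matrix multiply.
# Plain Python for illustration only; it does not issue AMX instructions.

TILE_COUNT = 8            # eight two-dimensional tile registers
TILE_MAX_ROWS = 16        # each tile holds at most 16 rows...
TILE_MAX_ROW_BYTES = 64   # ...of at most 64 bytes each, i.e. 1 KB per tile


def elements_per_tile_row(bytes_per_element):
    """How many elements of a given size fit in one 64-byte tile row."""
    return TILE_MAX_ROW_BYTES // bytes_per_element


def tiled_matmul(a, b, tile_m=16, tile_n=16, tile_k=32):
    """Compute C = A x B block by block, mimicking how TMUL consumes tiles.

    a is an M x K list of lists and b is a K x N list of lists.
    Defaults assume bf16 inputs (32 x 2 bytes = one 64-byte A row) and
    fp32 accumulators (16 x 4 bytes = one 64-byte C row).
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            for k0 in range(0, k, tile_k):
                # Multiply-accumulate one block; edge blocks may be smaller than a full tile.
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile_k, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

With these constants, a tile row holds 32 bfloat16 values or 64 int8 values, which is why int8 inference can process twice as many elements per instruction as bfloat16.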

For more information about Intel AMX, refer to Intel's documentation.
With Intel AMX integrated into 4th and 5th Gen Intel Xeon Scalable processors, many AI inference and fine-tuning workloads, including many Generative AI use cases, can run optimally on the CPU.

Lenovo, Intel, and VMware solutions together enable Generative AI at scale
Intel AI software and optimization libraries provide scalable performance using Intel CPUs and GPUs. Many of the libraries and framework extensions are designed to leverage the CPU to provide optimal performance for machine learning and inference workloads. Developers looking to use these tools can download them from the Intel AI Tools Selector.

Table 1: Intel AI optimization software and development tools

Software/Solution | Details
---|---
Intel® oneAPI Library | Intel® oneAPI Deep Neural Network Library (oneDNN); Intel® oneAPI Data Analytics Library (oneDAL); Intel® oneAPI Math Kernel Library (oneMKL); Intel® oneAPI Collective Communications Library (oneCCL)
MLOps | Cnvrg.io is a platform to build and deploy AI models at scale
AI Experimentation | SigOpt is a guided platform to design experiments, explore parameter space, and optimize hyperparameters and metrics
Intel® Extension for PyTorch | Intel Extension for PyTorch extends PyTorch with the latest performance optimizations for Intel hardware and also takes advantage of Intel AMX
Intel Distribution for Python | Optimized core Python libraries (scikit-learn, Pandas, XGBoost); Data Parallel Extensions for Python; extensions for TensorFlow, PyTorch, PaddlePaddle, DGL, Apache Spark, and machine learning; NumPy, SciPy, Numba, and numba-dpex
Intel® Neural Compressor | This open-source library provides a framework-independent API to perform model compression techniques such as quantization, pruning, and knowledge distillation to reduce model size and speed up inference.
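Quantization, one of the compression techniques listed for Intel Neural Compressor, maps floating-point weights to low-precision integers such as the int8 type that Intel AMX accelerates. The sketch below shows generic symmetric per-tensor int8 quantization in plain Python. It illustrates the technique only and is not Intel Neural Compressor's API, which handles calibration and model-level details not shown here; the function names are hypothetical.

```python
# Generic symmetric per-tensor int8 quantization (illustrative sketch).
# Not Intel Neural Compressor's API.


def quantize_int8(values):
    """Map floats to int8 codes using one scale derived from the largest magnitude."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # A single scale maps [-max_abs, max_abs] onto [-127, 127].
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [q * scale for q in codes]
```

Because each value is rounded to the nearest code, the round-trip error per element is at most half the scale, which is the accuracy trade-off that quantization tools try to manage.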

VMware Cloud Foundation
VMware Cloud Foundation (VCF) is a multi-cloud platform supporting virtual machines and containerization of workloads on common virtualized infrastructure built on top of vSphere, vSAN, and NSX. The suite includes VMware Aria Suite for private/hybrid cloud management and VMware Tanzu for Kubernetes workloads. For more details, refer to the VMware Cloud Foundation reference design.

VMware Private AI with Intel

The VMware Private AI with Intel solution enables enterprises to develop and deploy classical machine learning models and generative AI applications on infrastructure powered by Intel AI software and built-in accelerators and managed by VMware Cloud Foundation. VCF provides integrated security capabilities to secure AI, making it an ideal platform for training and running private LLMs across business functions in an enterprise. The Intel AI software suite and VMware Cloud Foundation are validated on Lenovo ThinkSystem and ThinkAgile servers with 4th and 5th Gen Intel Xeon Scalable processors, so private LLMs or generative AI models can be deployed at scale alongside other AI use cases.

Intel AMX instructions are supported on vSphere 8.0 and later for VMs using virtual hardware version 20 or later. A Linux guest OS should use kernel 5.16 or later, and if Tanzu Kubernetes is used, the worker nodes should also run Linux kernel 5.16 or later.
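The AMX requirements above can be checked from inside a Linux guest. The sketch below assumes that the kernel reports AMX support through the `amx_tile`, `amx_bf16`, and `amx_int8` flags in /proc/cpuinfo (the flag names used by the Linux kernel) and compares a `uname -r` style release string against the 5.16 guideline. The CPU flags are the more direct signal, because they reflect what the running kernel and virtual hardware actually expose. All helper names are hypothetical.

```python
import re

AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags_present(cpuinfo_text):
    """Return the AMX feature flags reported on the 'flags' lines of /proc/cpuinfo."""
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            found.update(f for f in line.split(":", 1)[1].split() if f in AMX_FLAGS)
    return found


def kernel_at_least(release, minimum=(5, 16)):
    """Compare a release string such as '6.5.0-41-generic' to a (major, minor) minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def vm_meets_guidelines(vsphere_version, vm_hw_version):
    """vSphere 8.0 or later and virtual hardware version 20 or later."""
    return vsphere_version >= (8, 0) and vm_hw_version >= 20


if __name__ == "__main__":
    import os
    import platform

    # Only meaningful inside a Linux guest.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("AMX flags:", sorted(amx_flags_present(f.read())))
        print("kernel >= 5.16:", kernel_at_least(platform.release()))
```

If the flags are missing, the VM hardware version and the vSphere host version are the first things to verify.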

Figure 1. VMware Private AI with Intel on Lenovo ThinkAgile VX and ThinkSystem Servers

Llama2 LLM Inference Performance with 4th Gen Intel Xeon Scalable Processors

Intel tested Generative AI inference with the Llama2 7B and 13B models on a ThinkAgile VX650 V3 server with 4th Gen Intel Xeon Scalable processors on May 14, 2024. The tests used input token sizes of 32, 256, 1024, and 2048 with batch sizes from 1 to 16 to simulate concurrent requests, and a fixed output token size of 256. The objective was to validate the performance of each scenario against an acceptable next-token latency of less than 100 ms and to compare the results with and without Intel AMX. Testing and inference serving were performed on a single node with local storage running ESXi 8.0 U2 and two Ubuntu 22.04.4 guest virtual machines. Model performance can be scaled out across multiple nodes, but that is not in the scope of the current version.
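The test matrix described above can be enumerated programmatically. The sketch below builds every input-token and batch-size combination (batch sizes 1, 2, 4, 8, and 16, as listed in the results that follow) with the fixed 256-token output, and filters measured 2nd-token latencies against the 100 ms threshold. The function names and the dictionary layout for results are hypothetical.

```python
from itertools import product

INPUT_TOKENS = (32, 256, 1024, 2048)
BATCH_SIZES = (1, 2, 4, 8, 16)
OUTPUT_TOKENS = 256
LATENCY_THRESHOLD_MS = 100.0


def scenarios():
    """Every (input_tokens, batch_size, output_tokens) combination in the test plan."""
    return [(i, b, OUTPUT_TOKENS) for i, b in product(INPUT_TOKENS, BATCH_SIZES)]


def within_threshold(next_token_latency_ms):
    """Return the (input_tokens, batch_size) keys whose 2nd-token latency is under 100 ms.

    next_token_latency_ms maps (input_tokens, batch_size) -> measured latency in ms.
    """
    return sorted(k for k, v in next_token_latency_ms.items() if v < LATENCY_THRESHOLD_MS)
```

The plan covers 4 input sizes × 5 batch sizes, or 20 scenarios per model and per AMX setting.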

Table 2. Test Hardware Configuration

Server | Lenovo ThinkAgile VX650 V3 CN
Processor | 2x Intel Xeon Gold 6448H processors, 2x32C, 2.4 GHz
Memory | 1024GB (16x64GB DDR5 4800 MT/s [4800 MT/s])
NIC | 1x ThinkSystem Mellanox ConnectX-6 Lx 10/25GbE SFP28 2-Port PCIe Ethernet Adapter
Disk | 1x ThinkSystem M.2 7450 PRO 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD; 8x ThinkSystem 2.5″ U.3 7450 MAX 6.4TB Mixed Use NVMe PCIe 4.0 x4 HS SSD
Hyperthreading | Intel® Hyper-Threading Technology Enabled
Turbo | Intel® Turbo Boost Technology Enabled
NUMA Nodes | 2
BIOS | 2.14
Microcode | 0x2b000461
Hypervisor | VMware ESXi 8.0 U2 22380479
BIOS Settings | Performance (BIOS and ESXi profile), Max C-State = C0/C1
Guest VM | Ubuntu 22.04.4 LTS, 5.15.0-105-generic
VM HW Version | VM vHardware gen 21: Intel AMX available for guest OS; VM vHardware gen 17: Intel AMX not available for guest OS, Intel® Advanced Vector Extensions 512 (Intel® AVX-512) VNNI available
VM Configuration | 60 vCPU (reservation), 400GB RAM (reservation), vmxnet3; Latency sensitivity mode: high; multi-socket scenario (30 cores per AI instance)

Table 3. Test Configuration

Workload | LLM Inference
Application | Intel Extension for PyTorch (IPEX) with DeepSpeed
Libraries | IPEX 2.2 with DeepSpeed 0.13; PyTorch 2.2 (public releases)
Script | https://github.com/intel/intel-extension-for-pytorch/tree/v2.2.0%2Bcpu/examples/cpu/inference/python/llm
Test Run settings | warm-up steps = 5; steps = 50; -a flag (maximum number of threads; should align with OMP_NUM_THREADS) = 60; e (number of inter threads: e=1 runs one thread per physical core, e=2 runs two threads per physical core) = 1
Number of instances | 2
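The thread settings in Table 3 can be translated into per-instance CPU affinity. This sketch assumes the 60-thread budget is split evenly across the two instances with one thread per core (e=1), which matches the 30 cores per AI instance listed in Table 2, and that vCPUs are numbered contiguously. The helper is hypothetical and is not part of the IPEX example scripts.

```python
def plan_instances(total_threads=60, instances=2, first_cpu=0):
    """Split a thread budget evenly across instances, one thread per core (e=1).

    Returns one dict per instance with a contiguous CPU range in the
    'first-last' list format accepted by `taskset -c` and `numactl -C`,
    plus the matching OMP_NUM_THREADS value.
    """
    if total_threads % instances:
        raise ValueError("total_threads must divide evenly across instances")
    per_instance = total_threads // instances
    plans = []
    for n in range(instances):
        start = first_cpu + n * per_instance
        plans.append({
            "cpus": f"{start}-{start + per_instance - 1}",
            "OMP_NUM_THREADS": str(per_instance),
        })
    return plans
```

Each instance could then be launched with, for example, `OMP_NUM_THREADS=30 numactl -C 0-29` in front of its inference command. On a multi-socket VM, check the vNUMA layout first so that each range stays within one NUMA node.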

Llama2 7B Performance Results with/without Intel AMX

Figure 2 shows the average 2nd token latency with Intel AMX on 4th Gen Intel Xeon Scalable processors for the Llama2 7B model, and Figure 3 shows the results without Intel AMX. The test with Intel AMX shows up to a 42% decrease in 2nd token latency for the scenario with an input/output token size of 32/256. The 2nd token latency for the different concurrent-request scenarios (batch sizes 1/2/4/8/16) with input/output token sizes of 32/256 and 256/256 is within the acceptable threshold of 100 milliseconds, showing that a significant throughput increase can be achieved with Intel AMX. The results without Intel AMX show that all scenarios with batch size 8 or 16 exceeded the 100 millisecond threshold.
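The latency and throughput statements in these results follow from two simple relations: the percentage reduction in 2nd token latency, and an approximate decode throughput of batch size divided by per-step latency, since each decode step emits one token per sequence in the batch. The sketch assumes that approximation (it ignores first-token, or prefill, time); the function names are hypothetical and the inputs in the checks are illustrative values, not readings from Figures 2 to 5.

```python
def latency_reduction_pct(without_amx_ms, with_amx_ms):
    """Percent decrease in 2nd token latency when Intel AMX is enabled."""
    return 100.0 * (without_amx_ms - with_amx_ms) / without_amx_ms


def decode_throughput_tps(batch_size, next_token_latency_ms):
    """Approximate generated tokens per second during the decode phase.

    Each decode step emits one token per sequence in the batch, so
    throughput ~= batch_size / step latency. Prefill time is ignored.
    """
    return batch_size * 1000.0 / next_token_latency_ms
```

For example, a batch of 8 at 80 ms per step corresponds to roughly 100 tokens per second, which is why keeping larger batches under the 100 ms threshold translates into higher throughput.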

Llama2 13B Performance Results with/without Intel AMX

Figure 4 shows the average 2nd token latency with Intel AMX on 4th Gen Intel Xeon Scalable processors for the Llama2 13B model, and Figure 5 shows the results without Intel AMX. The test with Intel AMX shows up to an 18% decrease in 2nd token latency for the scenario with an input/output token size of 32/256. The 2nd token latency for the different concurrent-user scenarios (batch sizes 1/2/4/8) with input token size of 32/256 is within the acceptable threshold of 100 milliseconds, showing that a considerable throughput increase can be achieved with Intel AMX. The results without Intel AMX show that most scenarios with batch size 4, 8, or 16 exceeded the 100 ms next-token latency threshold.

Bill of Materials for ThinkAgile VX650 V3

Table 4. Bill of Materials

Part number | Product Description | Quantity
---|---|---
7D6WCTO1WW | Server: Lenovo ThinkAgile VX650 V3 Integrated System | 1
BRY9 | ThinkAgile VX V3 2U 24×2.5″ Chassis | 1
B0W3 | XClarity Pro | 1
BZAK | Customer has VMware by Broadcom Software License | 1
BN8K | ThinkAgile VX Remote Deployment | 1
BPQD | Intel Xeon Gold 6448Y 32C 225W 2.1GHz Processor | 2
BNF9 | ThinkSystem 64GB TruDDR5 4800MHz (2Rx4) 10×4 RDIMM | 16
5977 | Select Storage devices – no configured RAID required | 1
B8P1 | ThinkSystem 440-16i SAS/SATA PCIe Gen4 12Gb Internal HBA | 1
BT2G | vSAN ESA | 1
BYRN | AF-2 | 1
BNEH | ThinkSystem 2.5″ U.2 P5620 3.2TB Mixed Use NVMe PCIe 4.0 x4 HS SSD | 6
B8LU | ThinkSystem 2U 8×2.5″ SAS/SATA Backplane | 1
BH8B | ThinkSystem 2U/4U 8×2.5″ AnyBay Backplane | 1
B8P9 | ThinkSystem M.2 NVMe 2-Bay RAID Adapter | 1
BTTY | M.2 NVMe | 1
BKSR | ThinkSystem M.2 7450 PRO 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD | 2
BLA3 | SW stack for ThinkAgile VX Appliance | 1
BN2T | ThinkSystem Broadcom 57414 10/25GbE SFP28 2-Port OCP Ethernet Adapter | 2
BPK9 | ThinkSystem 1800W 230V Titanium Hot-Swap Gen2 Power Supply | 2
6400 | 2.8m, 13A/100-250V, C13 to C14 Jumper Cord | 2
BLL6 | ThinkSystem 2U V3 Performance Fan Module | 6
BRPJ | XCC Platinum | 1
BTSL | ThinkAgile VX650 V3 IS | 1
BQQ6 | ThinkSystem 2U V3 EIA right with FIO | 1
BM8T | ThinkSystem SR650 V3 Firmware and Root of Trust Security Module | 1
BP46 | ThinkSystem 2U Main Air Duct | 1
BLL3 | ThinkSystem SR650 V3 PSU Duct | 1
BSWK | ThinkAgile SR650 V3 Agency Label – No CCC | 1
BPDR | ThinkSystem SR850 V3/SR650 V3 Standard Heatsink w/ Heatpipes | 2
BMPF | ThinkSystem V3 2U Power Cable from MB to Front 2.5″ BP v2 | 2
BS6Y | ThinkSystem 2U V3 M.2 Signal & Power Cable, SLx4 with 2X10/1X6 Sideband, 330/267/267mm | 1
BACB | ThinkSystem V3 2U SAS/SATA Y Cable from CFF C0,C1/ C2,C3 to Front 8×2.5″ BP | 2
BSYM | ThinkSystem SR650 V3,PCIe4 Cable,Swift8x-SL8x,2in1,PCIe 6/5(MB) to BP1/BP2 | 1
BMP2 | ThinkSystem V3 2U Power Cable from MB to CFF / Exp v2 | 1
BRPV | ThinkSystem SR650 V3,PCIe Gen4 CBL,SLx8-Swift,CFF IN-PCIe4 | 1
BPE3 | ThinkSystem SR650 V3 MCIO8x to SL8x CBL, PCIe4, 8×2.5AnyBay, 200mm | 2
BE0E | N+N Redundancy With Over-Subscription | 1
BK15 | High voltage (200V+) | 1
BQ11 | G4 x16/x8/x8 PCIe Riser BLKL for Riser 1 Placement | 1
BLKL | ThinkSystem V3 2U x16/x8/x8 PCIe Gen4 Riser1 or 2 | 1
5641PX3 | XClarity Pro, Per Endpoint w/3 Yr SW S&S | 1
1340 | Lenovo XClarity Pro, Per Managed Endpoint w/3 Yr SW S&S | 1
B8Q8 | ThinkSystem 440-16i SAS/SATA PCIe Gen4 12Gb Internal HBA Placement | 1
5PS7B73066 | Premier Advanced ThinkAgile IS – 3Yr 24×7 6Hr CSR + YDYD VX650 V3 | 1
5AS7B15971 | Hardware Installation (Business Hours) for VX650 V3 | 1
5MS7A87711 | ThinkAgile VX Remote Deployment (up to 4 node cluster) |
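The key capacities in the bill of materials reduce to simple arithmetic: 2 processors × 32 cores, 16 DIMMs × 64 GB, and 6 U.2 NVMe drives × 3.2 TB. The sketch below parses those figures from an abridged copy of the rows. The regular expressions assume the description formats shown in this table, the capacity is raw (before any vSAN overhead), and the helper names are hypothetical.

```python
import re

# Abridged (part number, description, quantity) rows from the bill of materials.
BOM_ROWS = [
    ("BPQD", "Intel Xeon Gold 6448Y 32C 225W 2.1GHz Processor", 2),
    ("BNF9", "ThinkSystem 64GB TruDDR5 4800MHz (2Rx4) 10x4 RDIMM", 16),
    ("BNEH", 'ThinkSystem 2.5" U.2 P5620 3.2TB Mixed Use NVMe PCIe 4.0 x4 HS SSD', 6),
    ("BKSR", "ThinkSystem M.2 7450 PRO 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD", 2),
]


def per_unit(description, pattern):
    """Extract the first number captured by `pattern` from a BOM description."""
    match = re.search(pattern, description)
    if match is None:
        raise ValueError(f"no match for {pattern!r} in {description!r}")
    return float(match.group(1))


def raw_totals(rows):
    """Sum cores, memory, and raw U.2 NVMe capacity across the given rows."""
    totals = {"cores": 0, "memory_gb": 0.0, "nvme_raw_tb": 0.0}
    for _part, desc, qty in rows:
        if desc.endswith("Processor"):
            totals["cores"] += int(per_unit(desc, r"\b(\d+)C\b")) * qty
        elif "RDIMM" in desc:
            totals["memory_gb"] += per_unit(desc, r"(\d+)GB") * qty
        elif " U.2 " in desc:
            # Data drives only; the M.2 boot drives (BKSR) are skipped.
            totals["nvme_raw_tb"] += per_unit(desc, r"(\d+(?:\.\d+)?)TB") * qty
    return totals
```

For this configuration the totals come to 64 cores, 1024 GB of memory, and about 19.2 TB of raw NVMe capacity.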

Accelerated by Intel

To deliver the best experience possible, Lenovo and Intel have optimized this solution to leverage Intel capabilities like processor accelerators not available in other systems. Accelerated by Intel means enhanced performance to help you achieve new innovations and insight that can give your company an edge.

For More Information
To learn more about this Lenovo solution, contact your Lenovo Business Partner or visit: https://www.lenovo.com/us/en/servers-storage/solutions/database/

References:

Lenovo ThinkAgile VX650 V3 2U Integrated System and VX650 V3 2U Certified Node
ThinkAgile VX630 V3 1U Integrated System and Certified Node
Intel AI Development Software

Related product families
Product families related to this document are the following:

ThinkAgile VX Series for VMware
ThinkSystem SR630 V3 Server
ThinkSystem SR650 V3 Server
VMware Alliance

Notices
Lenovo may not offer the products, services, or features discussed in this document in all countries. Consult your local Lenovo representative for information on the products and services currently available in your area. Any reference to a Lenovo product, program, or service is not intended to state or imply that only that Lenovo product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any Lenovo intellectual property right may be used instead. However, it is the user’s responsibility to evaluate and verify the operation of any other product, program, or service. Lenovo may have patents or pending patent applications covering subject matter described in this document.

The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:
Lenovo (United States), Inc.
8001 Development Drive
Morrisville, NC 27560
U.S.A.

Attention: Lenovo Director of Licensing
LENOVO PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. Lenovo may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

The products described in this document are not intended for use in implantation or other life support applications where malfunction may result in injury or death to persons. The information contained in this document does not affect or change Lenovo product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of Lenovo or third parties. All information contained in this document was obtained in specific environments and is presented as an illustration. The result obtained in other operating environments may vary. Lenovo may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any references in this publication to non-Lenovo Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this Lenovo product, and use of those Web sites is at your own risk. Any performance data contained herein was determined in a controlled environment. Therefore, the result obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

© Copyright Lenovo 2024. All rights reserved.
This document, LP1988, was created or updated on July 16, 2024.

Send us your comments in one of the following ways:

Use the online Contact us review form found at: https://lenovopress.lenovo.com/LP1988
Send your comments in an e-mail to: comments@lenovopress.com

This document is available online at https://lenovopress.lenovo.com/LP1988.

Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.

The following terms are trademarks of Lenovo in the United States, other countries, or both:

Lenovo®
AnyBay®
ThinkAgile®
ThinkSystem®
XClarity®

The following terms are trademarks of other companies:
Intel® and Xeon® are trademarks of Intel Corporation or its subsidiaries.
Linux® is the trademark of Linus Torvalds in the U.S. and other countries.
Other company, product, or service names may be trademarks or service marks of others.