Lenovo ThinkSystem SD665-N V3 Neptune DWC Server User Manual
- June 12, 2024
- Lenovo
Lenovo ThinkSystem SD665-N V3 Neptune DWC Server
Product Information
- The Lenovo ThinkSystem SD665-N V3 is a high-performance server based on the fifth generation Lenovo Neptune™ direct water cooling platform. It features two 4th Generation AMD EPYC™ Processors with NVIDIA HGX™ H100 4-GPU acceleration and NVIDIA NDR InfiniBand networking. The server is designed to accelerate applications for HPC, AI training, and inference workloads.
- The SD665-N V3 is equipped with four NVIDIA H100 Tensor Core GPUs interconnected through NVLink, delivering significant performance improvements. In line with the Lenovo HPC philosophy, it can scale efficiently to thousands of GPUs, or each H100 can be partitioned into up to seven GPU instances using NVIDIA Multi-Instance GPU (MIG) technology. The server is suitable for various workloads, including chemistry, finite element analysis, fluid dynamics, molecular dynamics, weather, and climate simulations.
- Lenovo Neptune™ is the leading water cooling technology used in the SD665-N V3. With a focus on low pressure drop and high-quality materials, Lenovo ensures best-in-class reliability. The server leverages copper and brazed connections to guarantee leak-free operation, even at high pressure. The water cooling process starts at manufacturing and includes Helium and Nitrogen pressure tests to ensure consistent quality without the need for hazardous antifreeze components.
- The Lenovo ThinkSystem SD665-N V3 is delivered as a fully integrated Lenovo Scalable Infrastructure (LeSI) solution. LeSI provides interoperability testing and pre-integration of hardware, software, and firmware components. The server comes pre-cabled and pre-loaded with the best recipe and optionally an OS image. It undergoes rack-level testing in manufacturing to ensure reliable delivery and minimize installation time in customer data centers.
- The server is enabled with the Lenovo HPC & AI Software Stack, which supports multiple users and scaling within a single cluster environment. It provides a fully tested and supported open-source software stack for efficient consumption of Lenovo Supercomputing capabilities. The Confluent management system and Lenovo Intelligent Computing Orchestration (LiCO) Web portal abstract the complexity of HPC cluster orchestration and AI workload management, making open-source HPC software accessible to every customer. The LiCO Web portal offers workflows for both AI and HPC, supporting multiple AI frameworks and enabling diverse workload requirements on a single cluster.
Product Usage Instructions
- Installation: Follow the installation guide provided with the server to set up the Lenovo ThinkSystem SD665-N V3 in your data center. Ensure that all cables are properly connected and the server is securely installed.
- Power On: Connect the power supply to the server and turn on the power switch. Wait for the server to boot up and initialize the operating system.
- Configuration: Use the Lenovo Intelligent Computing Orchestration (LiCO) Web portal to configure the server according to your workload requirements. Follow the provided workflows for AI or HPC tasks and select the appropriate settings and parameters.
- Application Acceleration: Take advantage of the NVIDIA HGX™ H100 4-GPU acceleration and NVIDIA NDR InfiniBand networking capabilities to accelerate your applications. Utilize the NVIDIA H100 Tensor Core GPUs interconnected through NVLink for improved performance in HPC, AI training, and inference workloads.
- Water Cooling Maintenance: The Lenovo ThinkSystem SD665-N V3 utilizes water cooling technology. Ensure that the water cooling system is properly maintained to guarantee optimal performance and reliability. Refer to the user manual for instructions on maintaining the water cooling system.
- Software Stack Management: Utilize the Lenovo HPC & AI Software Stack to manage and optimize your software environment. Leverage the Confluent management system and the LiCO Web portal to simplify HPC cluster orchestration and AI workload management.
- Scaling: If required, scale out by adding more Lenovo ThinkSystem SD665-N V3 units. Follow the provided guidelines and best practices for scaling your infrastructure.
- Troubleshooting: In case of any issues or errors, refer to the troubleshooting section of the user manual or contact Lenovo support for assistance.
Lenovo ThinkSystem SD665-N V3
Exascale technology made available at every scale
Lenovo Neptune accelerated
- Lenovo ThinkSystem SD665-N V3 is based on our fifth generation Lenovo Neptune™ direct water cooling platform and on two 4th Generation AMD EPYC™ Processors with NVIDIA HGX™ H100 4-GPU acceleration and NVIDIA NDR InfiniBand networking.
- The combination of market-leading NVIDIA acceleration technology with the market-leading water cooling solution from Lenovo results in extreme performance in extremely dense packaging. A single rack of Lenovo ThinkSystem SD665-N V3 nodes more than doubles the performance of the previous generation, with up to 5.8 PetaFLOPS High Performance Computing (HPC) or almost 200 PetaFLOPS Artificial Intelligence (AI) peak performance, while occupying only 0.72 m² (less than 8 ft²) of data center floor space.
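The floor-space and density figures above can be sanity-checked with a few lines of arithmetic. The following is only an illustrative sketch: the 5.8 PetaFLOPS, 200 PetaFLOPS (quoted as "almost 200", so treated here as an upper bound) and 0.72 m² values come from the text, while the function and constant names are hypothetical.

```python
# Illustrative check of the rack-level figures quoted above.
# All names are hypothetical; the numbers are taken from the datasheet text.

SQ_FT_PER_SQ_M = 10.7639   # standard conversion: 1 m^2 is about 10.7639 ft^2

RACK_FOOTPRINT_M2 = 0.72   # floor space per rack
HPC_PEAK_PFLOPS = 5.8      # "up to 5.8 PetaFLOPS" HPC peak
AI_PEAK_PFLOPS = 200.0     # "almost 200 PetaFLOPS" AI peak, used as an upper bound


def to_square_feet(area_m2: float) -> float:
    """Convert an area from square metres to square feet."""
    return area_m2 * SQ_FT_PER_SQ_M


def density(pflops: float, area_m2: float) -> float:
    """Peak PetaFLOPS per square metre of floor space."""
    return pflops / area_m2


if __name__ == "__main__":
    print(f"Footprint: {to_square_feet(RACK_FOOTPRINT_M2):.2f} ft^2")
    print(f"HPC density: {density(HPC_PEAK_PFLOPS, RACK_FOOTPRINT_M2):.2f} PFLOPS/m^2")
    print(f"AI density (upper bound): {density(AI_PEAK_PFLOPS, RACK_FOOTPRINT_M2):.2f} PFLOPS/m^2")
```

These are peak figures for a full rack, so the densities are best read as upper bounds rather than expected application performance.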
Accelerating your applications
On the SD665-N V3, four NVIDIA H100 Tensor Core GPUs are interconnected through
NVLink, delivering substantial performance improvements for HPC, AI training,
and inference workloads. The H100 supports the Lenovo HPC philosophy of taking
customers From Exascale to Everyscale™. Together with NVIDIA InfiniBand
networking, it scales efficiently to thousands of GPUs or, with NVIDIA
Multi-Instance GPU (MIG) technology, each GPU can be partitioned into up to
seven GPU instances to accelerate smaller workloads.
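As a rough planning aid, the upper bound on MIG instances follows directly from the two numbers in this paragraph: four GPUs per node and seven instances per GPU. The sketch below assumes every GPU is split into the maximum number of instances; the helper name and its parameters are hypothetical and not part of any NVIDIA or Lenovo tool.

```python
# Hypothetical helper: upper bound on MIG instances across a set of nodes.
# Assumes every GPU is partitioned into the maximum number of instances.

GPUS_PER_NODE = 4               # four H100 GPUs per SD665-N V3 node (from the text)
MAX_MIG_INSTANCES_PER_GPU = 7   # seven GPU instances per H100 (from the text)


def max_mig_instances(nodes: int,
                      gpus_per_node: int = GPUS_PER_NODE,
                      per_gpu: int = MAX_MIG_INSTANCES_PER_GPU) -> int:
    """Return the largest number of MIG instances `nodes` nodes could expose."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * gpus_per_node * per_gpu


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(f"{n} node(s): up to {max_mig_instances(n)} MIG instances")
```

A single node therefore tops out at 28 instances. Actual counts depend on the MIG profiles chosen, since larger profiles yield fewer instances per GPU.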
NVIDIA® CUDA®, the most widely used parallel computing platform and
programming model for GPUs, is available free of charge to help you accelerate
the more than 700 supported HPC applications and every major deep learning
framework. Supported HPC applications include, for example:
- Chemistry like Gaussian and GROMACS
- Finite Elements like LS-DYNA and Simulia Abaqus
- Fluid Dynamics like OpenFOAM and ANSYS Fluent
- Molecular Dynamics like NAMD and AMBER
- Weather and Climate like WRF and ICON
The Lenovo ThinkSystem SD665-N V3 also supports NVIDIA® NGC™, providing pre-trained models, training scripts, optimized framework containers and inference engines for popular deep learning models.
Lenovo Neptune™: Leading water cooling technology
- A decade of experience in direct water cooling sets Lenovo apart. With a meticulous focus on low pressure drop and the highest-quality materials, Lenovo achieves best-in-class reliability.
- The SD665-N V3 leverages copper and brazed connections guaranteeing leak-free operations at extreme scale, even at high pressure.
- Another important differentiator is the superior water loop design, which enables inlet temperatures of up to 45 °C for the highest energy reuse efficiency. The new water loop design optimizes performance with increased frequency while ensuring temperature uniformity, preventing thermal jitter for consistent application performance.
- Water cooling is an end-to-end process that starts at manufacturing. Through Helium and Nitrogen pressure tests from the node to the completed rack, the SD665-N V3 provides consistent quality at the highest standards. This approach also allows Lenovo to ship the systems pressurized without needing to send hazardous antifreeze components to customers.
Solutions That Scale
- Lenovo ThinkSystem SD665-N V3 is delivered as a fully integrated Lenovo Scalable Infrastructure (LeSI) solution.
- LeSI provides Best Recipe guides to warrant interoperability of hardware, software and firmware among a variety of Lenovo and third-party components.
- In addition to interoperability testing, LeSI hardware is pre-integrated, pre-cabled, and pre-loaded with the best recipe and, optionally, an OS image. It is then tested at the rack level in manufacturing, to ensure a reliable delivery and minimize installation time in the customer data center.
- Lenovo ThinkSystem SD665-N V3 is enabled with Lenovo HPC & AI Software Stack, where you can support multiple users and scale within a single cluster environment.
- Lenovo HPC & AI Software Stack provides our HPC customers with a fully tested and supported open-source software stack to enable their administrators and users for the most effective and environmentally sustainable consumption of Lenovo Supercomputing capabilities.
- Our Confluent management system and Lenovo Intelligent Computing Orchestration (LiCO) Web portal provide an interface designed to shield users from the complexity of HPC cluster orchestration and AI workload management, making open-source HPC software consumable for every customer.
- The LiCO Web portal provides workflows for both AI and HPC, and supports multiple AI frameworks, allowing you to leverage a single cluster for diverse workload requirements.
Data Center Reliability Leader
- At Lenovo, we take a customer-centric approach, which is why ThinkSystem servers consistently rank #1 in reliability.
- Also, Lenovo is the leading provider of Supercomputers in the TOP500. The ThinkSystem SD665-N V3 provides the latest in performance and reliability in a scalable solution for enterprise and research.
Specifications
Form Factor | Full-wide 1U tray; 1 node + GPUs per tray |
---|---|
Chassis | DW612S Enclosure (6U) |
Processor | 1x or 2x 4th Generation AMD EPYC™ Processors per node |
Memory | Up to 3.0TB using 24x 128GB 4800 MHz TruDDR5 RDIMM slots per tray |
I/O Expansion | NVIDIA ConnectX-7 4-chip VPI PCIe Gen5 Mezz Board for GPUDirect I/O |
Acceleration | NVIDIA HGX™ H100 4-GPU with 4x NVLink connected SXM5 GPUs |
Storage | Up to 2x 2.5″ NVMe SSDs (7mm height) or 1x 2.5″ NVMe SSD (15mm height) per node; up to 1x liquid-cooled M.2 NVMe SSD for both operating system boot and storage functions |
RAID Support | OS Software RAID |
Network Interfaces | Two onboard Ethernet interfaces: 2x 25GbE SFP28 LOM (1Gb, 10Gb or 25Gb capable; supports NC-SI) and 1x 1GbE RJ45 (supports NC-SI) |
Power Management | Rack-level power capping and management via open-source management software Confluent and application-level energy optimization through Energy Aware Runtime (EAR) |
Systems Management | Systems management using Lenovo HPC & AI Software Stack with Lenovo Intelligent Computing Orchestration (LiCO) portal and XClarity Controller (XCC). Supports TPM 2.0 for advanced cryptographic functionality. SMM management module in the enclosure supports daisy chaining to reduce cabling requirements |
Front access | All adapters and drives are accessible from the front of the server. Front ports include KVM breakout connector and External Diagnostics Handset port for local management. |
Rear access | 2x RJ45 on the SMM management module in the enclosure for XCC with daisy chain support; USB 2.0 for SMM FFDC log collection |
Power Supply | Up to 9x hot-swap air-cooled power supplies (2400W Platinum, 2600W Titanium), or up to 3x hot-swap direct-water-cooled power supplies (7200W Titanium); supports up to N+1 redundancy |
Cooling Design | Direct Water Cooling at the heat source with up to 45°C inlet water temperature |
OS Support | Red Hat, SUSE, Rocky Linux (with LeSI support); visit lenovopress.com/osig for more information. |
Limited Warranty | 3-year customer replaceable unit and onsite limited warranty, next business day 9×5, service upgrades available |
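Two of the specification rows can be cross-checked with simple arithmetic: the maximum memory per tray and the total rated output of the power supply options. The sketch below is illustrative only. Its function names are hypothetical, it uses the 1 TB = 1024 GB convention, and it models N+1 redundancy as one supply held in reserve, which is an assumption; the enclosure's actual power policies are not described here.

```python
# Illustrative arithmetic on the specification table; all names are hypothetical.

DIMM_SLOTS = 24      # RDIMM slots per tray (from the table)
DIMM_SIZE_GB = 128   # largest listed DIMM size in GB (from the table)


def max_memory_tb(slots: int = DIMM_SLOTS, dimm_gb: int = DIMM_SIZE_GB) -> float:
    """Maximum memory per tray in TB, using 1 TB = 1024 GB."""
    return slots * dimm_gb / 1024


def psu_capacity_w(count: int, watts_each: int, spares: int = 0) -> int:
    """Rated output in watts after reserving `spares` supplies for redundancy.

    Assumption: N+1 is modelled as spares=1 (one supply held in reserve).
    """
    if not 0 <= spares < count:
        raise ValueError("spares must be between 0 and count - 1")
    return (count - spares) * watts_each


if __name__ == "__main__":
    print(f"Max memory per tray: {max_memory_tb():.1f} TB")
    options = [("9x 2400W air-cooled", 9, 2400),
               ("9x 2600W air-cooled", 9, 2600),
               ("3x 7200W water-cooled", 3, 7200)]
    for label, count, watts in options:
        total = psu_capacity_w(count, watts)
        n_plus_1 = psu_capacity_w(count, watts, spares=1)
        print(f"{label}: {total} W total, {n_plus_1} W with one spare")
```

The memory result matches the 3.0TB figure in the table. The power figures are rated totals for the maximum supply counts listed, not measured consumption.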
About Lenovo
Lenovo (HKSE: 992) (ADR: LNVGY) is a US$70 billion revenue global technology powerhouse, ranked #159 in the Fortune Global 500, employing 75,000 people around the world, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver smarter technology for all, Lenovo is expanding into new growth areas of infrastructure, mobile, solutions and services. This transformation is building a more inclusive, trustworthy, and sustainable digital society for everyone, everywhere.
For More Information
To learn more about the Lenovo ThinkSystem SD665-N V3, contact your Lenovo representative or Business Partner, or visit lenovo.com/thinksystem. For detailed specifications, consult the SD665-N V3 product guide.
NEED STORAGE? Learn more about Lenovo Storage
lenovo.com/systems/storage
NEED SERVICES? Learn more about Lenovo Services
lenovo.com/systems/services
© 2022 Lenovo. All rights reserved.
Availability: Offers, prices, specifications and availability may change
without notice. Lenovo is not responsible for photographic or typographic
errors. Warranty: For a copy of applicable warranties, write to: Lenovo
Warranty Information, 1009 Think Place, Morrisville, NC, 27560. Lenovo makes
no representation or warranty regarding third-party products or services.
Trademarks: Lenovo, the Lenovo logo, From Exascale to Everyscale, Lenovo
Neptune®, ThinkSystem®, and XClarity® are trademarks or registered trademarks
of Lenovo. Linux® is the trademark of Linus Torvalds in the U.S. and other
countries. Dynamics is a trademark of Microsoft Corporation in the United
States, other countries, or both. Other company, product, or service names may
be trademarks or service marks of others. Document number DS0153, published
November 10, 2022. For the latest version, go to
lenovopress.lenovo.com/ds0153.