ZPE Isolated Management Infrastructure Software User Guide
- June 8, 2024
- ZPE
ZPE Isolated Management Infrastructure Software
Specifications
- Isolated Management Infrastructure
- Out-of-band management
- Edge-native automation
- Zero trust principles
- Full-stack routing with resilience features
- Nodegrid resilience platform
- Gen 3 hardware
- Linux-based Nodegrid OS
Product Usage Instructions
Setting Up Isolated Management Infrastructure:
To set up the isolated management infrastructure using ZPE Systems’ Nodegrid:
- Ensure you have the required management devices per site as listed in the manual.
- Connect the WAN Router, Ethernet Switch, Jumpbox, Dual 4G/5G, Serial Switch, Compute, Firewall, and Storage devices to establish the infrastructure.
- Configure the Nodegrid resilience platform according to your network requirements and security policies.
Implementing Out-of-Band Management:
Follow these steps to implement out-of-band management for enhanced security and remote access:
- Connect the out-of-band management interfaces to management devices using zero-trust principles.
- Ensure the management interfaces are isolated from the internet and production gear.
- Configure full-stack routing with resilience features to prevent unauthorized access and attacks.
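One part of the isolation step above can be checked mechanically: the management address ranges should not overlap the production ranges. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the address plan are hypothetical and are not taken from Nodegrid or from this guide. Non-overlapping address space is only a necessary condition; routing and firewall policy still need to be reviewed separately.

```python
import ipaddress


def find_isolation_violations(management_subnets, production_subnets):
    """Return (management, production) CIDR pairs whose address ranges overlap."""
    violations = []
    for mgmt in management_subnets:
        mgmt_net = ipaddress.ip_network(mgmt)
        for prod in production_subnets:
            prod_net = ipaddress.ip_network(prod)
            if mgmt_net.overlaps(prod_net):
                violations.append((mgmt, prod))
    return violations


# Hypothetical address plan, for illustration only.
management = ["10.255.0.0/24", "10.10.5.0/24"]
production = ["10.10.0.0/16", "192.168.1.0/24"]

violations = find_isolation_violations(management, production)
for mgmt, prod in violations:
    print(f"{mgmt} overlaps production range {prod}")
```

In this made-up plan, the second management subnet sits inside a production range, so it is reported as a violation.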
Utilizing Edge-Native Automation:
Utilize edge-native automation for configuration and change management:
- Implement automation tools like Salt or Ansible to automate configuration changes.
- Ensure systems can continue operating in case of outages or change errors by following best practices.
ZPE Systems’ Nodegrid Resilience Platform:
Make the most of ZPE Systems’ Nodegrid resilience platform:
- Deploy Gen 3 hardware with Linux-based Nodegrid OS for comprehensive IMI functionalities.
- Benefit from the drop-in resilience platform that combines all necessary functions for modern network resilience.
Frequently Asked Questions
Q: How many management devices are required per site for creating Isolated Management Infrastructure?
A: The architecture would require nine or more management devices per site, including WAN Router, Ethernet Switch, Jumpbox, Dual 4G/5G, Serial Switch, Compute, Firewall, Storage, and Sensor Compatibility devices.
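As a rough planning aid, the functions named in this answer can be treated as a checklist and compared against what a site already has. This is a sketch only. The function names come from the answer above, while the helper and the sample site inventory are hypothetical.

```python
# Management functions named in the answer above.
REQUIRED_FUNCTIONS = {
    "WAN Router", "Ethernet Switch", "Jumpbox", "Dual 4G/5G",
    "Serial Switch", "Compute", "Firewall", "Storage",
    "Sensor Compatibility",
}


def missing_functions(site_inventory):
    """Return the required functions the site does not yet cover, sorted by name."""
    return sorted(REQUIRED_FUNCTIONS - set(site_inventory))


# Hypothetical site inventory, for illustration only.
site = ["WAN Router", "Ethernet Switch", "Firewall", "Serial Switch"]
gaps = missing_functions(site)
print(gaps)
```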
Q: What are the key components of ZPE Systems’ Nodegrid resilience platform?
A: The key components include Gen 3 hardware and Linux-based Nodegrid OS that combine all functions required for Isolated Management Infrastructure.
Executive Summary
Business relies on network resilience. But network management now requires
more than a ‘break/fix’ approach. Decentralized and growing architectures are
spread across cloud, premises, and large IoT footprints. For shrinking IT
teams, this presents resilience problems, as many are unaware of the best
practices for looking after a vast digital estate, maintaining complex and
delicate architecture, and recovering quickly after an attack. This gap means
the network is a source of anxiety.
Big Tech solved these problems 10+ years ago, by doubling down on their
management architectures. This best practice, which is now recommended by
CISA, involves fully separating management networks from production networks
into what’s called Isolated Management Infrastructure. IMI goes far beyond
serial console and out-of-band access, providing the management and service
delivery capabilities teams need to reduce on-site upkeep, stabilize delicate
architecture, and accelerate ransomware recovery.
Big Tech’s network resilience secret lies in ZPE Systems’ Nodegrid hardware
and software. Nodegrid is the only network resilience platform, delivering the
Gen 3 capabilities that are required to build IMI. Now, organizations in every
industry benefit from the best practices that have been trusted to run the
public cloud for over a decade.
Problem: IT and OT are Widespread, Delicate, and Vulnerable
The network is at the center of how most organizations serve their customers.
Twenty years ago, IT teams managed a centralized architecture. Achieving
resilience was as simple as going on-site or remoting-in via serial console to
fix issues at the data center.
Then in the mid-2000s, the advent of the cloud decentralized infrastructure, data, and computing. Architectures became geographically and virtually distributed, a complex mix of on-prem and cloud solutions. This allows companies to serve today’s customers, who demand 24/7 reliability and on-demand services for work, school, and leisure.
But behind the scenes, this explosion of architecture created three resilience problems:
- Too Much Work: Infrastructure, data, and computing are widely distributed. Systems inevitably break and require work, but teams don’t have the staff to keep up.
- Too Much Complexity: Pairing cloud and box-based stacks creates complex networks. Teams leave systems outdated, because they don’t want to break this delicate architecture.
- Too Much Risk: Unpatched, outdated systems are prime targets for packaged attacks that move at machine speed. Defense requires recovery tools that teams don’t have.
Here are a few real-world examples of teams battling these problems in 2023:
- Federal Aviation Administration: An overworked contractor unintentionally deleted files, which delayed flights nationwide for an entire day.
- Southwest Airlines: A firewall configuration change caused 16,000 flight cancellations and cost the company $1 billion.
- MGM Resorts: IT lacked recovery systems, which allowed an attack to persist for weeks and cause millions in losses per day.
Gap: IT Teams Lack Modern Best Practices
These problems exist because teams lack the best practices for modern network
resilience. They must leverage technologies that extend their fleet management
capabilities, automate infrastructure changes, and boost ransomware recovery
efforts. But when it comes to actually implementing any of this, teams face a
mind-boggling question: “How?”
The answer lies in the best practices that Big Tech has trusted to run the public cloud for 10+ years. These best practices are called Isolated Management Infrastructure.
Solution: IMI and the Resilience Platform Approach
Traditional out-of-band serial consoles were designed to solve the problems of 20 years ago. These Gen 1 and Gen 2 devices offered simple remote troubleshooting and limited scripting. That was not enough for Big Tech companies, which had many global sites, automation-heavy environments, and numerous threat vectors. Their teams still needed to answer practical questions, like:
- “How can we guarantee access to fix stuff that breaks, without rolling trucks?”
- “Can we automate change management, without fear of breaking the network?”
- “Attacks are inevitable — How do we stop hackers from cutting off our access?”
They understood that modern network resilience must account for system breakages, human error, and fast-moving attacks. This requires a platform approach to resilience. They addressed these concerns by creating the blueprint for IMI, which requires:
- Out-of-band management that is fully isolated from production gear and connects to every type of management interface, to guarantee remote access
- Edge-native automation for config and change management, allowing systems to continue operating in case of outages or change errors
- Isolated management interfaces that use zero trust principles, are not open to the internet or production gear, and have full-stack routing with resilience features to ensure access and prevent attackers from reaching the management network
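But creating this architecture would require nine or more management devices per site, covering functions such as WAN router, Ethernet switch, jumpbox, dual 4G/5G, serial switch, compute, firewall, storage, and sensor compatibility.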
This is where ZPE Systems’ Nodegrid resilience platform comes in.
ZPE Systems’ Nodegrid is the Drop-In Resilience Platform
Big Tech was introduced to ZPE Systems during the initial stages of building
IMI. Working together, they further developed the blueprint and the next
generation of out-of-band management. Now, Big Tech and 400+ other
organizations benefit from ZPE’s Nodegrid resilience platform. ZPE’s Gen 3
hardware and Linux-based Nodegrid OS combine all of the functions required for
IMI.
Benefits of IMI using Nodegrid
Instantly fix breakages
Nodegrid’s modular devices connect to serial, Ethernet, USB, and all types of
management interfaces to provide a full virtual presence. This eliminates
truck rolls and prolonged service calls. Teams gain instant remote access to
any device in their fleet, where they can cycle power, re-image the OS, or
fully rebuild production environments.
“We’ve quadrupled business, but this solution is actually shrinking our
workload.” —Blake Johnson, Network Architect, Living Spaces Furniture
Automate without anxiety
The open Nodegrid OS lets teams use any third-party or custom automation
tools. Nodegrid devices have features of content delivery networks and can
pull scripts, patches, firmware, and virtual appliances from ZPE Cloud and
store them locally. This allows them to serve as the automation and deployment
engine for every connected device — including unsupported legacy gear. Since
it’s all done out-of-band, teams can verify automation integrity before
pushing to production, and if there’s a mistake, press the ‘undo’ button to
restore the last good configuration.
“Now, we deploy in an hour using one box
instead of five, and it fits into our CI/CD pipeline so well that we can do
proper lights-out management of all our sites.” —Frank Basso, EVP Engineering,
Vapor IO
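The verify-then-undo workflow described under "Automate without anxiety" follows a general snapshot, apply, verify, and roll back pattern. The sketch below illustrates that pattern in plain Python. It is not the Nodegrid API, and this guide does not describe how Nodegrid implements its 'undo' feature. The `ManagedDevice` class, the `apply_with_rollback` helper, and the sample checks are all hypothetical.

```python
import copy


class ManagedDevice:
    """Hypothetical stand-in for a device reached over the management network."""

    def __init__(self, name, config):
        self.name = name
        self.config = dict(config)


def apply_with_rollback(device, change, verify):
    """Apply `change`; keep it only if `verify` passes, otherwise restore the snapshot."""
    last_good = copy.deepcopy(device.config)  # snapshot taken before any change
    try:
        device.config.update(change)
        if verify(device.config):
            return True
    except Exception:
        pass  # any error while applying or verifying counts as a failed change
    device.config = last_good  # the "undo" step: restore the last good configuration
    return False


def verify(config):
    # Hypothetical health check: the management VLAN must stay defined
    # and the MTU must stay within a sane range.
    return config.get("mgmt_vlan") is not None and 1 <= config.get("mtu", 0) <= 9000


switch = ManagedDevice("edge-switch-1", {"mgmt_vlan": 99, "mtu": 1500})
ok = apply_with_rollback(switch, {"mtu": 9000}, verify)         # passes verification
bad = apply_with_rollback(switch, {"mgmt_vlan": None}, verify)  # fails and is rolled back
print(ok, bad, switch.config)
```

The second change would remove the management VLAN, so verification fails and the device keeps the configuration left by the first, successful change.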
Stop ransomware in its tracks
Nodegrid’s IMI shields management interfaces from the Internet. If a
production attack succeeds, IT teams retain management control. The Nodegrid
resilience platform runs VMs, apps, and services of choice, so teams can
deploy a Gartner-recommended Isolated Recovery Environment. Using the IRE,
they can isolate infected gear, cleanse, and restore quickly without the risk
of reinfection.
“ZPE has addressed all the aspects of maintaining our uptime to near 100%.”
—Aaron Lott, Network Engineer, DigiCert
For a step-by-step guide on building your Isolated Management Infrastructure,
download our Network Automation Blueprint.
To learn more about ZPE’s customers, IMI use cases, and Nodegrid resilience
products, download the corporate brochure.