Onboarding an NVIDIA NVIS-Deployed GPU Topology with aarna.ml GPU Cloud Management Software

We at aarna.ml recently collaborated with the NVIDIA Infrastructure Specialist (NVIS) team to onboard and validate a complex metadata topology, deployed by NVIS, into the aarna.ml GPU Cloud Management Software (CMS). This exercise demonstrates how the aarna.ml GPU CMS can take over an NVIS-deployed GPU topology and then perform Day 1 and Day 2 activities such as discovery, dynamic multi-tenancy, observability, fault management, and more.

The Challenge

The initial deployment and configuration of GPU hardware for NVIDIA Cloud Partners (NCPs), and its subsequent management, are often handled by different entities. For example, Day 0 tasks in NCP GPU environments are often performed by NVIS. After NVIS hands the cluster over to the NCP, effectively as a single tenant, the aarna.ml GPU CMS can take on multi-tenancy and the other Day 1 and Day 2 tasks. In other words, there is a hand-off from an NVIS-deployed topology to our GPU CMS. NCPs and GPU-as-a-service providers need a robust, automated way to:

  • Onboard a topology created by NVIS onto the aarna.ml GPU CMS
  • Validate that the metadata topology files created by the NVIDIA NVIS team after deploying the hardware are correctly and completely onboarded
  • Efficiently provision underlay and overlay network configurations for onboarding infrastructure tenants

Validation & Onboarding with aarna.ml GPU CMS

We validated the successful hand-off of a 16-SU (scalable unit) topology deployed by NVIS to the aarna.ml GPU CMS. This validation was performed on NVIDIA Air. Details follow.

Step-by-Step Workflow:

  1. Metadata Onboarding: Imported the NVIS metadata topology file into aarna.ml GPU CMS.
  2. RA Compliance Validation: Automatically validated the metadata against NVIDIA Reference Architecture (RA) compliance rules and immediately reported any non-compliance to the user with actionable feedback.
  3. Topology Discovery: Dynamically discovered all underlying topology nodes (compute, network, and storage) referenced in the metadata.
  4. Underlay Configuration: Configured network underlay settings for discovered nodes, ensuring base connectivity across the infrastructure.
  5. Tenant Overlay Creation: Built tenant-specific overlay networks, enabling scalable multi-tenant operations on top of the validated infrastructure.
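The validation and feedback steps above can be pictured as a rule check over the parsed metadata that collects every problem it finds. The sketch below is purely illustrative: the `Node` record, its fields (`role`, `su`), the allowed roles, the compute-nodes-per-SU rule, and the `validate_topology` function are all hypothetical assumptions. They do not reflect the actual NVIS metadata format, the NVIDIA RA rules, or the aarna.ml GPU CMS implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical node record; the real NVIS metadata schema is not described in this post.
@dataclass
class Node:
    name: str
    role: str  # assumed roles: "compute", "network", "storage"
    su: int    # assumed field: scalable unit (SU) the node belongs to

# Hypothetical rule: only these roles are accepted.
ALLOWED_ROLES = {"compute", "network", "storage"}


def validate_topology(nodes: List[Node], expected_sus: int, compute_per_su: int) -> List[str]:
    """Hypothetical compliance check.

    Returns human-readable, actionable findings; an empty list means the
    metadata passed every rule in this sketch.
    """
    findings: List[str] = []
    seen = set()
    # Count compute nodes per SU so each SU can be checked against the expected size.
    compute_counts = {su: 0 for su in range(1, expected_sus + 1)}

    for node in nodes:
        if node.name in seen:
            findings.append(f"{node.name}: duplicate node name; remove or rename the duplicate entry")
        seen.add(node.name)

        if node.role not in ALLOWED_ROLES:
            findings.append(f"{node.name}: unknown role '{node.role}'; expected one of {sorted(ALLOWED_ROLES)}")

        if node.su not in compute_counts:
            findings.append(f"{node.name}: SU {node.su} is outside 1..{expected_sus}")
        elif node.role == "compute":
            compute_counts[node.su] += 1

    for su, count in compute_counts.items():
        if count != compute_per_su:
            findings.append(f"SU {su}: found {count} compute nodes, expected {compute_per_su}")

    return findings


if __name__ == "__main__":
    # Small made-up topology with a duplicate name, a bad role, and an undersized SU.
    sample = [
        Node("gpu-001", "compute", 1),
        Node("gpu-002", "compute", 1),
        Node("leaf-01", "network", 1),
        Node("gpu-003", "compute", 2),
        Node("gpu-003", "gpu", 2),
    ]
    for finding in validate_topology(sample, expected_sus=2, compute_per_su=2):
        print(finding)
```

Returning the full list of findings, rather than stopping at the first error, is a design choice for this sketch: it lets a user correct several metadata issues in a single iteration of the feedback loop.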

Value Add Highlights

Feature | Value Delivered
Accurate Onboarding of NVIS-Deployed Topology | Successful hand-off from an NVIS-deployed topology to aarna.ml GPU CMS
Automated RA Compliance | Reduced manual errors and ensured standards adherence
Validation on NVIDIA Air | Simulated real-world deployments before going live
Topology Discovery | Instant visibility into infrastructure components
Underlay + Overlay Provisioning | End-to-end automation of network configuration
Feedback Loop | Fast iterations on fixing metadata issues

Impact

By automating and validating the metadata topology through the aarna.ml GPU CMS, NCPs can achieve:

  • Clean hand-off from NVIS to aarna.ml GPU CMS
  • Faster deployment readiness
  • Improved reliability of infrastructure metadata
  • Streamlined compliance checks, reducing engineering effort

This use case illustrates how the aarna.ml GPU CMS can successfully onboard a GPU topology deployed by NVIS. This validation matters to NCPs because they need a clean, non-disruptive hand-off from Day 0 to Day 1 and Day 2 activities. If you are an NCP whose Day 0 tasks have been completed by NVIS and you are now looking for GPU Cloud Management Software, let’s talk!