We at aarna.ml recently collaborated with the NVIDIA Infrastructure Specialists (NVIS) team to onboard and validate a complex metadata topology, deployed by NVIS, into the aarna.ml GPU Cloud Management Software (CMS). This activity demonstrates how the aarna.ml GPU CMS can take over an NVIS-deployed GPU topology and then perform Day 1 and Day 2 activities such as discovery, dynamic multi-tenancy, observability, and fault management.
The Challenge
The initial deployment and configuration of GPU hardware for NVIDIA Cloud Partners (NCPs) and its subsequent management are often handled by different entities. For example, NVIS often performs the Day 0 tasks for NCP GPU environments. Once NVIS hands the cluster over to the NCP, effectively as a single tenant, the aarna.ml GPU CMS can take on multi-tenancy and the other Day 1 and Day 2 tasks. In other words, there is a hand-off from an NVIS-deployed topology to our GPU CMS. NCPs and GPU-as-a-service providers need a robust, automated way to:
- Onboard a topology created by NVIS onto the aarna.ml GPU CMS
- Validate that the metadata topology files created by the NVIDIA NVIS team after deploying the hardware are correctly and completely onboarded
- Efficiently provision underlay and overlay network configurations for onboarding infrastructure tenants
Validation & Onboarding with aarna.ml GPU CMS
We validated the successful hand-off of a 16-SU (scalable unit) topology from NVIS to the aarna.ml GPU CMS. The validation was performed on NVIDIA Air, NVIDIA's data center simulation platform. The steps are detailed below.
Step-by-Step Workflow:
- Metadata Onboarding: Imported the NVIS metadata topology file into aarna.ml GPU CMS.
- RA Compliance Validation: Automatically validated the metadata against Reference Architecture (RA) compliance rules, with any non-compliance reported to the user immediately along with actionable guidance.
- Topology Discovery: Dynamically discovered all underlying topology nodes (compute, network, and storage) referenced in the metadata.
- Underlay Configuration: Configured network underlay settings for discovered nodes, ensuring base connectivity across the infrastructure.
- Tenant Overlay Creation: Built tenant-specific overlay networks, enabling scalable multi-tenant operations on top of the validated infrastructure.
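To make the RA compliance validation step more concrete, here is a minimal Python sketch of the general idea: walk the onboarded metadata, check it against a set of rules, and return findings the user can act on. Everything in it is a hypothetical assumption for illustration. The schema (`scalable_units`, `compute_nodes`, `links`, `switches`), the toy rules, and the functions `validate_topology` and `make_toy_topology` are not the NVIS metadata format, NVIDIA's RA rules, or the aarna.ml GPU CMS API.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One non-compliance item: where it was found and what is wrong."""
    location: str
    message: str


def validate_topology(meta: dict, expected_sus: int = 16, nodes_per_su: int = 4) -> list[Finding]:
    """Check a hypothetical metadata dict against toy RA-style rules.

    Rules checked (illustrative assumptions only, not NVIDIA's RA):
      1. The topology contains exactly `expected_sus` scalable units.
      2. Each SU contains exactly `nodes_per_su` compute nodes.
      3. Every link on every compute node points at a switch declared in the file.
    """
    findings: list[Finding] = []
    sus = meta.get("scalable_units", [])
    if len(sus) != expected_sus:
        findings.append(Finding("topology", f"expected {expected_sus} SUs, found {len(sus)}"))

    declared_switches = set(meta.get("switches", []))
    for su in sus:
        su_id = su.get("id", "<unnamed SU>")
        nodes = su.get("compute_nodes", [])
        if len(nodes) != nodes_per_su:
            findings.append(Finding(su_id, f"expected {nodes_per_su} compute nodes, found {len(nodes)}"))
        for node in nodes:
            name = node.get("name", "<unnamed node>")
            # A link to a switch missing from the file means the metadata is incomplete.
            for port, switch in node.get("links", {}).items():
                if switch not in declared_switches:
                    findings.append(
                        Finding(f"{su_id}/{name}", f"port {port} links to undeclared switch '{switch}'")
                    )
    return findings


def make_toy_topology(num_sus: int = 16, nodes_per_su: int = 4) -> dict:
    """Build a compliant toy topology: one leaf switch per SU, one link per node."""
    sus, switches = [], []
    for s in range(num_sus):
        leaf = f"leaf-{s:02d}"
        switches.append(leaf)
        nodes = [{"name": f"gpu-{s:02d}-{n}", "links": {"eth0": leaf}} for n in range(nodes_per_su)]
        sus.append({"id": f"su-{s:02d}", "compute_nodes": nodes})
    return {"scalable_units": sus, "switches": switches}


if __name__ == "__main__":
    meta = make_toy_topology()
    print(validate_topology(meta))  # prints [] for the compliant toy topology
    # Introduce a dangling link to show what a non-compliance finding looks like.
    meta["scalable_units"][3]["compute_nodes"][1]["links"]["eth0"] = "leaf-99"
    for f in validate_topology(meta):
        print(f"{f.location}: {f.message}")
```

Returning a list of findings, rather than stopping at the first error, mirrors the idea of giving the user complete, actionable feedback in one pass. A real rule set would cover far more, such as cabling, rail alignment, and storage and management networks.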
Impact
By automating onboarding and validation of the metadata topology through the aarna.ml GPU CMS, NCPs can achieve:
- Clean hand-off from NVIS to aarna.ml GPU CMS
- Faster deployment readiness
- Improved reliability of infrastructure metadata
- Streamlined compliance checks, reducing engineering effort
This use case illustrates how the aarna.ml GPU CMS can successfully onboard a GPU topology deployed by NVIS. This validation matters to NCPs because they need a clean hand-off from Day 0 to Day 1 and Day 2 activities, without disruption. If you are an NCP whose Day 0 tasks have been completed by NVIS and you are now looking for GPU Cloud Management Software, let’s talk!