Beyond the Barebones: The aarna.ml GPU CMS RA 2.0 for Full-Stack AI Cloud Services

In our previous post, we laid out the foundational blueprint for building a true, on-demand multi-tenant AI cloud. We detailed how the aarna.ml GPU Cloud Management Software (CMS) automates the complex orchestration of infrastructure (IaaS) and platforms (PaaS) to create a hyperscaler-grade service. Today, we’re taking the next crucial step.

We are excited to announce our Reference Architecture v2.0, a significant evolution focused on two key areas our customers have identified as critical: empowering tenants with ultimate network control and providing a comprehensive framework for maximizing the ROI of every single GPU.

Empowering Tenants with Hyperscaler-Grade Networking

True multi-tenancy requires more than just isolated compute; it demands that tenants have control over their own secure network environments, just as they would in their own data center. Our latest RA details how the aarna.ml GPU CMS platform now empowers tenants to:

  • Create their own Virtual Private Clouds (VPCs) and subnets on-demand through our self-service portal.
  • Configure isolated overlay networks using technologies like VXLAN, ensuring their resources are completely segregated from those of other tenants.
  • Manage external connectivity through secure border leaf gateways, with the CMS automating the complex underlying BGP-EVPN and NAT configurations (a sketch of this flow follows the list).
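
To make the self-service flow concrete, here is a minimal sketch of what a tenant-facing session could look like. The base URL, endpoint paths, and payload fields below are hypothetical illustrations, not the actual aarna.ml GPU CMS API.

```python
import requests

# Hypothetical base URL and token for a tenant's self-service session;
# the real aarna.ml GPU CMS endpoints and schemas may differ.
BASE = "https://cms.example.com/api/v1"
HEADERS = {"Authorization": "Bearer <tenant-token>"}

# 1. Create an isolated VPC; behind this call the CMS would allocate
#    a dedicated VXLAN VNI for the tenant's overlay network.
vpc = requests.post(f"{BASE}/vpcs", headers=HEADERS, json={
    "name": "training-vpc",
    "cidr": "10.20.0.0/16",
}).json()

# 2. Carve a subnet out of the VPC for a GPU worker pool.
subnet = requests.post(f"{BASE}/vpcs/{vpc['id']}/subnets", headers=HEADERS, json={
    "name": "gpu-workers",
    "cidr": "10.20.1.0/24",
}).json()

# 3. Request external connectivity; the CMS automates the border-leaf
#    BGP-EVPN and NAT configuration that backs this single call.
gateway = requests.post(f"{BASE}/vpcs/{vpc['id']}/gateways", headers=HEADERS, json={
    "type": "nat",
    "subnet_id": subnet["id"],
}).json()

print(vpc["id"], subnet["id"], gateway["id"])
```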

This gives your customers a true hyperscaler-like experience, allowing them to build complex application topologies securely and independently, while your operations remain fully automated.

The Framework for Maximizing GPU ROI

The primary objective of our platform is to empower NVIDIA Cloud Partners and AI Neocloud providers to achieve the maximum possible return on their significant hardware investments. Our framework addresses this with a multi-pronged strategy:

  1. Driving Maximum GPU Utilization: A key challenge is that tenants often request more resources than they actively use, leaving valuable GPUs sitting idle. Our platform solves this by enabling GPU virtualization and oversubscription. Using technologies like NVIDIA MIG and advanced time-slicing, a single physical GPU can be partitioned to serve multiple smaller workloads, dramatically increasing its utilization rate. Our CMS acts as the policy engine: it translates business priorities into pod specifications that our integrated, first-party scheduler, based on the open-source KAI Scheduler (derived from NVIDIA Run:ai), uses to enforce fair-share and preemption rules, ensuring every cycle is used efficiently (see the first sketch after this list).
  2. Moving Up the Value Chain to AI PaaS: The highest margins come from offering high-value platform services. The aarna.ml GPU CMS provides the tools to build a rich PaaS offering, including "Slurm-as-a-Service" for HPC workloads, Fine-Tuning-as-a-Service (FTaaS), and Model-Inference-as-a-Service (MIaaS) with one-click deployment of NVIDIA NIMs and Hugging Face models (see the second sketch after this list).

  3. Monetizing Idle Capacity with External Marketplaces: Even in a well-managed cluster, there will be unused capacity. Our platform allows you to turn this idle hardware into a revenue stream. With a few clicks, an administrator can securely partition a set of GPU nodes, isolate them from primary tenant infrastructure, and register them with third-party marketplaces like NVIDIA Cloud Functions (NVCF) and NVIDIA DGX Cloud Lepton. This is only possible because of our platform's foundational hard isolation, which guarantees that marketplace workloads are completely segregated from your primary tenants.
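
To illustrate the scheduling half of this strategy, here is a minimal sketch, using the Kubernetes Python client, of the kind of pod specification the CMS could emit: it requests a single A100 MIG slice and hands the pod to the KAI scheduler under a tenant queue. The queue name, namespace, and image are illustrative; the `kai.scheduler/queue` label and `kai-scheduler` scheduler name follow the open-source KAI Scheduler's documented conventions, and `nvidia.com/mig-1g.10gb` is the NVIDIA device plugin's resource name for that MIG profile.

```python
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="tenant-a-inference",
        # Illustrative tenant queue; KAI enforces fair-share and
        # preemption across queues like this one.
        labels={"kai.scheduler/queue": "tenant-a"},
    ),
    spec=client.V1PodSpec(
        scheduler_name="kai-scheduler",  # route past the default scheduler
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="worker",
                image="nvcr.io/nvidia/pytorch:24.05-py3",
                resources=client.V1ResourceRequirements(
                    # One 1g.10gb MIG slice of a physical A100,
                    # leaving the rest of the GPU for other tenants.
                    limits={"nvidia.com/mig-1g.10gb": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="tenant-a", body=pod)
```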
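
For the MIaaS tier, the one-click experience ultimately reduces to launching an NVIDIA NIM container on a GPU-backed node. Below is a hedged sketch of such a deployment, again with the Kubernetes Python client; the namespace, secret name (`ngc-api`), and image tag are assumptions, while `NGC_API_KEY` and port 8000 are the NIM container's documented authentication variable and serving port.

```python
from kubernetes import client, config

config.load_kube_config()

labels = {"app": "llama31-8b-nim"}

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="llama31-8b-nim"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="nim",
                        # Illustrative NIM image; any NIM from nvcr.io works the same way.
                        image="nvcr.io/nim/meta/llama-3.1-8b-instruct:latest",
                        ports=[client.V1ContainerPort(container_port=8000)],
                        env=[client.V1EnvVar(
                            # NIM containers authenticate to NGC with NGC_API_KEY,
                            # pulled here from an assumed pre-created secret.
                            name="NGC_API_KEY",
                            value_from=client.V1EnvVarSource(
                                secret_key_ref=client.V1SecretKeySelector(
                                    name="ngc-api", key="NGC_API_KEY")),
                        )],
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "1"}),
                    )
                ],
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="tenant-a", body=deployment)
```

Once the pod is ready, the tenant can call the OpenAI-compatible API the NIM serves on port 8000.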

With these powerful new capabilities, the aarna.ml GPU CMS provides not just a blueprint for building an AI cloud, but a comprehensive platform for operating and monetizing it at scale.

Read the new Reference Architecture v2.0 to learn more!