Are you a data center provider, telco, NVIDIA Cloud Partner (NCP), or startup that has decided to offer a GPU-as-a-Service (GPUaaS) AI cloud? You need to decide quickly what your offering will look like. With multiple technical options, the final decision depends on your customers' requirements, the competition you face, and your desired differentiation in an increasingly commoditized market.
The first-level decision is whether your offering will be Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), or Software-as-a-Service (SaaS). These are not mutually exclusive; you may choose to offer a combination. Let's dig into the details.
IaaS
IaaS largely means offering compute instances with GPUs to end users. This is probably the most common offering today. The sizing of these instances will vary based on the GPU capability, vCPU count, memory and storage sizing, and network throughput. Even with IaaS, there are some sub-options:
- BMaaS, or Bare-Metal-as-a-Service: a server such as an NVIDIA HGX or MGX system is offered as a service with a minimal operating system. The benefit for users is that they can obtain the instance on demand through a self-service mechanism, retain full control of the bare-metal server, and release it when done, without incurring any CAPEX.
- VM: if your customers need instances smaller than a single bare-metal server (e.g., for inference), you will need virtualization. With virtual machines, you can offer fractional servers. With VMware's cost increases and OpenStack increasingly becoming a legacy technology, your realistic choice is Kubernetes, paired with a VM-backed runtime such as Kata Containers for hard isolation.
- Clustered instances: if your customers are interested in model training, they will need multiple GPUs clustered into a single instance. For example, multiple HGX servers will have to be clustered together and offered as a single instance to your customers.
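The three sub-options above are really three instance "flavors" in a catalog. A minimal sketch of such a catalog in Python (all flavor names and sizings here are hypothetical, not any vendor's actual SKUs):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flavor:
    """One GPUaaS instance size; all sizings here are illustrative."""
    name: str
    gpus: int        # GPUs exposed to the tenant
    vcpus: int
    memory_gib: int
    nodes: float     # physical servers backing the instance (<1 = fractional VM)

# Hypothetical catalog covering the three sub-options: a fractional VM,
# a whole bare-metal HGX-class server, and a clustered training instance.
CATALOG = {
    "vm.1gpu":       Flavor("vm.1gpu", 1, 16, 128, 0.125),
    "bm.8gpu":       Flavor("bm.8gpu", 8, 224, 2048, 1),
    "cluster.32gpu": Flavor("cluster.32gpu", 32, 896, 8192, 4),
}

def gpus_consumed(flavor_name: str, count: int) -> int:
    """Total GPUs a tenant consumes with `count` instances of a flavor."""
    return CATALOG[flavor_name].gpus * count
```

Defining flavors up front keeps scheduling and billing simple: every request maps to a known bundle of GPU, CPU, memory, and network resources.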
Of course, with IaaS you will encounter challenges such as multi-tenancy and isolation, self-service APIs, and on-demand billing, all of which must be solved before you can offer a complete solution to customers.
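Of those challenges, on-demand billing is the most mechanical to describe: meter each instance's lifetime and charge per unit of time. A minimal sketch, using hypothetical hourly rates and a bill-per-started-hour policy:

```python
import math
from datetime import datetime, timedelta

# Hypothetical per-hour rates; real pricing would live in your billing system.
HOURLY_RATE = {"vm.1gpu": 2.50, "bm.8gpu": 24.00}

def usage_charge(flavor: str, started: datetime, released: datetime) -> float:
    """Charge for one on-demand instance, billed per started hour."""
    hours = (released - started) / timedelta(hours=1)
    return math.ceil(hours) * HOURLY_RATE[flavor]
```

For example, a `vm.1gpu` instance held for 2.5 hours is billed for 3 hours. Real billing systems add proration, credits, and invoicing on top, but the core is this metering loop.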
PaaS
With PaaS, the complexities of the underlying infrastructure are hidden and the offering is a higher-level abstraction. The options range from a GPU-based Kubernetes cluster optimized to run NVIDIA NIM, to LLMOps/MLOps, fine-tuning-as-a-Service, vector-database-as-a-Service, and GPU spot instances (to sell excess unused capacity), among other services. Moving from IaaS to PaaS instantly creates more value around your offering, but it requires additional technical sophistication and instrumentation.
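Of the PaaS options listed, spot instances are easy to illustrate: unsold GPUs are offered at a discount but can be reclaimed when an on-demand request arrives. A minimal sketch of one possible admission policy (this is a hypothetical policy for illustration, not AMCOP's implementation):

```python
def spot_capacity(total_gpus: int, on_demand_gpus: int, reserve: int = 2) -> int:
    """GPUs that can be sold as preemptible spot capacity.

    Hypothetical policy: everything not committed to on-demand tenants,
    minus a small reserve kept free for immediate on-demand requests.
    """
    return max(0, total_gpus - on_demand_gpus - reserve)

def gpus_to_preempt(total_gpus: int, on_demand_gpus: int, spot_gpus: int,
                    new_on_demand: int) -> int:
    """Number of spot GPUs to reclaim to admit a new on-demand request."""
    free = total_gpus - on_demand_gpus - spot_gpus
    return max(0, new_on_demand - free)
```

For example, a 64-GPU pool with 40 GPUs committed on-demand can sell 22 as spot; if a new 4-GPU on-demand request then arrives, 2 spot GPUs must be preempted. The business value is turning idle capacity into revenue without compromising on-demand SLAs.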
SaaS
The next level of sophistication is to offer managed software directly to users in the form of SaaS. This could include LLM-as-a-Service (similar to what OpenAI and the hyperscalers provide), RAG-as-a-Service, and more. This layer adds even more value than IaaS or PaaS.
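At the SaaS layer, metering typically shifts from instance-hours to usage units such as tokens, as popularized by hosted LLM APIs. A minimal sketch with hypothetical per-1K-token prices:

```python
# Hypothetical per-1K-token prices for an LLM-as-a-Service endpoint.
PRICE_PER_1K = {"prompt": 0.002, "completion": 0.006}

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of a single API call under the hypothetical token pricing above."""
    return (prompt_tokens / 1000) * PRICE_PER_1K["prompt"] \
         + (completion_tokens / 1000) * PRICE_PER_1K["completion"]
```

Token-based pricing decouples what the customer pays for (model output) from the underlying GPU-hours, which is precisely where the added margin of the SaaS layer comes from.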
To compete, you will need to move up the value chain, leaving the low-level "boring" infrastructure orchestration and management to Aarna.ml so that you can focus on building your differentiation. The Aarna Multi Cluster Orchestration Platform (AMCOP) orchestrates and manages low-level infrastructure: network isolation, InfiniBand isolation, GPU/CPU configuration, OS and Kubernetes orchestration, storage configuration, and more. Once the initial orchestration is complete, AMCOP continues to monitor and manage the infrastructure. If you would like to slash your time-to-market and build a differentiated, sustainable GPUaaS, please get in touch with us for an initial 3-day architecture and strategy assessment.