Hugging Face has announced HUGS, an alternative to Nvidia’s Inference Microservices (NIMs) that lets users deploy and run AI models on a wide range of hardware.
HUGS, short for Hugging Face Generative AI Services, is based on the open source Text Generation Inference (TGI) and Transformers frameworks. This makes the containers compatible with a variety of hardware, including Nvidia and AMD GPUs. In the future, that may include specialized AI accelerators such as Amazon Inferentia or Google’s TPUs, according to The Register.
HUGS works much like Nvidia’s NIMs: it provides preconfigured container images that are easy to deploy via Docker or Kubernetes. Once running, the containers expose an OpenAI-compatible API.
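For illustration, a call to a deployed HUGS container might look like the minimal sketch below, which uses the official OpenAI Python client pointed at a locally running container. The endpoint URL, port, and model identifier here are assumptions for the example, not values documented by Hugging Face; check your own deployment for the actual settings.

```python
# Minimal sketch: querying a HUGS container through its OpenAI-compatible API.
# The base_url and model name are hypothetical placeholders for this example.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local container endpoint
    api_key="-",  # a locally hosted container typically needs no real key
)

response = client.chat.completions.create(
    model="tgi",  # placeholder identifier; use your deployment's model name
    messages=[
        {"role": "user", "content": "Summarize what HUGS is in one sentence."}
    ],
    max_tokens=100,
)

print(response.choices[0].message.content)
```

Because the interface is OpenAI-compatible, existing applications written against the OpenAI API can in principle be pointed at a HUGS container by changing only the base URL.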
Although HUGS is built on open source technologies, the service itself is not free. When deployed on AWS or Google Cloud, HUGS costs around one dollar per container per hour. By comparison, Nvidia charges one dollar per hour per GPU for NIMs in the cloud, or $4,500 per year per GPU for on-premises use. Support for multiple hardware platforms does give customers more flexibility, however.
Flexibility for smaller users
For smaller deployments, HUGS containers will be available through DigitalOcean at no additional cost for the software; the compute power still has to be paid for. DigitalOcean recently began offering GPU-based VMs built on Nvidia’s H100 accelerators, with prices ranging from $2.50 to $6.74 per hour, depending on the number of GPUs and the contract term.
Hugging Face will also make the new service available to its Enterprise Hub subscribers. These users pay $20 per month per user and can deploy HUGS on their own infrastructure.
As for supported models, Hugging Face is targeting these open models for now: Meta’s Llama 3.1, Mistral’s Mixtral, Alibaba’s Qwen 2.5, and Google’s Gemma 2. The company expects to add further models in the future, including Microsoft’s Phi series.