Snowflake lowers inference costs with SwiftKV

Snowflake reduces inference costs for Meta’s Llama models by up to 75 percent. This is possible thanks to SwiftKV: a new optimization technology within Snowflake Cortex AI that makes AI inference data retrieval more efficient.

Snowflake claims the technique, which it calls SwiftKV and has built into Cortex AI, can cut inference time and cost for Llama models by 50 to 75 percent.

Speed boost

SwiftKV is a storage and lookup technology that speeds up AI inference within Snowflake. According to the company, it reduces the computing power required to run large language models (LLMs) such as Meta’s Llama 2 and Llama 3, which significantly lowers the cost of AI inference.

SwiftKV works by caching frequently used model data more intelligently and making it faster to access. This shortens the response times of AI models and reduces the load on the underlying infrastructure. Companies using Llama models on the Snowflake platform can thus build faster and cheaper AI applications.
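The article does not detail SwiftKV’s internals, but the general idea it describes, reusing cached model data during inference rather than recomputing it, is the key-value (KV) cache used in transformer decoding. The toy sketch below (all names and dimensions are illustrative, not Snowflake code) shows why caching helps: with a cache, each decoding step projects only the newest token into keys and values, instead of re-projecting the entire prefix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                               # hidden size of the toy model
Wk = rng.standard_normal((d, d))    # toy key-projection weights
Wv = rng.standard_normal((d, d))    # toy value-projection weights

def attend(q, K, V):
    """Single-head scaled dot-product attention over cached keys/values."""
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Autoregressive decoding with a KV cache: each step projects only the
# newest token and appends it to the cache.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
projections = 0
for step in range(5):
    x = rng.standard_normal(d)              # stand-in for the new token's hidden state
    K_cache = np.vstack([K_cache, x @ Wk])  # project and cache the new key
    V_cache = np.vstack([V_cache, x @ Wv])  # project and cache the new value
    projections += 1                        # one K/V projection per step, not per prefix token
    out = attend(x, K_cache, V_cache)

# Without the cache, step t would redo t projections: 1+2+3+4+5 = 15 total.
print(projections)  # 5 with caching
```

The win compounds with sequence length: recomputing keys and values at every step costs quadratic work over the sequence, while caching keeps it linear, which is the kind of saving that translates into lower inference bills.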

With the Llama 3.3 70B model, Snowflake says SwiftKV halves inference costs. For Llama 3.1 405B, the reduction is as much as 75 percent.

Essential to the platform

Snowflake keeps expanding its AI functionality, including through Cortex AI. This is necessary: the company positions itself as a data cloud partner and wants to handle customers’ data within its own platform. Those who have their data in order today, however, expect to be able to run AI workloads on it, and Snowflake needs to meet that demand as efficiently as possible. Cortex AI lets companies use machine learning and generative AI within Snowflake without having to manage complex infrastructure.

By adding SwiftKV to Cortex AI, Snowflake is responding to the growing demand for efficient AI solutions. The optimizations for Meta’s Llama models make the platform more attractive to organizations looking to integrate AI into their data environments. Snowflake consistently tries to make solutions on the platform run more efficiently, and has previously developed optimizations for the Llama models it hosts.