Snowflake Runs Apache Spark Code without a Cluster

The new connector lets Spark workloads run directly in Snowflake.

Snowflake has introduced Snowpark Connect, enabling users to run Apache Spark code directly in the cloud data warehouse without setting up a separate cluster.

Faster and Cheaper

The feature uses Spark Connect, a client-server architecture that lets client applications talk to remote Spark clusters. Chris Child, VP of Product Management at Snowflake, told The Register that the solution performs on average 5.6 times faster and cuts costs by roughly 40 percent compared to traditional Spark environments, running the same code on the same data.
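Spark Connect decouples the client from the cluster: the client builds an unresolved query plan and ships it over gRPC to whatever server implements the protocol. Below is a minimal sketch in PySpark of that model; the endpoint URL is a placeholder, not Snowflake's actual connection setup, which will differ in practice.

```python
from pyspark.sql import SparkSession

# Instead of launching a local cluster, the Spark Connect client
# serializes query plans and sends them to a remote server over gRPC.
spark = (
    SparkSession.builder
    .remote("sc://example-endpoint:15002")  # hypothetical endpoint
    .getOrCreate()
)

df = spark.range(10).filter("id % 2 = 0")
df.show()  # the plan executes remotely; only results come back
```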

Thanks to Snowflake’s vectorized engine, users do not have to worry about dependencies, versions, or upgrades. Modern Spark DataFrame, Spark SQL, and user-defined function (UDF) code is all supported.
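In practice, that means existing PySpark code should run as-is. Here is a short sketch touching all three supported surfaces, again assuming a placeholder endpoint rather than Snowflake's real connection details:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = (
    SparkSession.builder
    .remote("sc://example-endpoint:15002")  # hypothetical endpoint
    .getOrCreate()
)

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# DataFrame API
evens = df.filter(df.id % 2 == 0)

# Spark SQL
df.createOrReplaceTempView("people")
sql_result = spark.sql("SELECT upper(name) AS name FROM people")

# User-defined function
shout = udf(lambda s: s.upper() + "!", StringType())
udf_result = df.select(shout(df.name).alias("shouted"))

udf_result.show()
```

Because only query plans cross the wire, the client stays a thin library with no local Spark cluster configuration to maintain.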

Data Warehouses and Data Lakes in One

The move brings data lake and data warehouse platforms closer together. Competitor Databricks, originally built around Spark, pursues the same convergence with its “lakehouse” concept, while Snowflake increasingly adds data lake functionality.

“We have invested in Snowpark Connect to let people use code as they wish,” said Child. Recently, the company introduced a revamped analytics approach with Cortex AISQL and SnowConvert AI. It also aims to make Snowflake AI more accessible for data scientists.