Cloudflare has announced new upgrades to its serverless AI platform Workers AI. The platform now supports larger AI models, faster inference and improved vector database functionality, making it easier to create and scale AI applications.
Workers AI now runs on more powerful GPUs in more than 180 cities worldwide. This lowers network latency, which matters especially when working with large language models (LLMs). The GPU expansion also enables faster processing of larger models, such as Llama 3.1 and the Llama 3.2 series, so AI apps can take on more complex tasks while staying responsive for end users.
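As an illustration, here is a minimal sketch of what calling one of these models from a Worker looks like, assuming an AI binding named `AI` is configured in wrangler.toml; the model identifier is one of Cloudflare's published Llama 3.1 IDs, and a Llama 3.2 ID could be swapped in:

```typescript
// Minimal Cloudflare Worker that calls Workers AI through the AI binding.
// Assumes an [ai] binding named AI in wrangler.toml; the Ai and Response
// types come from @cloudflare/workers-types.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Run a Llama 3.1 instruct model hosted on Workers AI.
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Summarize what Workers AI does." },
      ],
    });
    return Response.json(result);
  },
};
```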

Cloudflare argues that network speed is becoming critical as AI becomes more prevalent in everyday life. The broad geographic spread of GPUs improves the accessibility and performance of AI applications for users everywhere, while faster response times and larger context windows make interactions with AI more fluid.
Improved control and cheaper queries
Cloudflare has also made improvements in managing and optimizing AI apps. Persistent logs in AI Gateway let developers analyze prompts and model responses to optimize performance. Since AI Gateway's launch, more than two billion requests have been processed through it.
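For reference, a hedged sketch of routing the same Workers AI call through AI Gateway so the prompt and response land in those logs; the gateway ID is a hypothetical placeholder for a gateway created in the Cloudflare dashboard, and the third `run()` argument follows Cloudflare's documented gateway options:

```typescript
// Same Worker setup as the earlier sketch, but the request is routed
// through AI Gateway so it appears in the gateway's persistent logs.
// "my-gateway" is a placeholder, not a real gateway ID.
export default {
  async fetch(request: Request, env: { AI: Ai }): Promise<Response> {
    const result = await env.AI.run(
      "@cf/meta/llama-3.1-8b-instruct",
      { prompt: "Explain vector search in one sentence." },
      { gateway: { id: "my-gateway" } }, // associates the call with the gateway
    );
    return Response.json(result);
  },
};
```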
In addition, the vector database Vectorize is now generally available, with support for indexes of up to five million vectors, and median query latency has dropped from 549 milliseconds to 31 milliseconds. These optimizations make AI applications more efficient and cheaper to run.
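A minimal sketch of what querying Vectorize from a Worker looks like, assuming a [[vectorize]] binding named `VECTORIZE` in wrangler.toml that points at an index whose dimensions match the embedding model used here:

```typescript
// Embed a query with a Workers AI embedding model, then look up the
// nearest neighbors in a Vectorize index. Assumes AI and VECTORIZE
// bindings in wrangler.toml; types come from @cloudflare/workers-types.
export interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Turn the query text into an embedding vector.
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: ["What did Cloudflare announce for Workers AI?"],
    });

    // Retrieve the five closest vectors from the index.
    const matches = await env.VECTORIZE.query(data[0], { topK: 5 });
    return Response.json(matches);
  },
};
```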
