Meta Launches Llama 4: Team of Experts for Greater Efficiency

Meta launches Llama 4 with two models: Maverick and Scout. The new LLMs, like their predecessors, are designed to power smart chatbots and generative AI assistants.

Meta is expanding its Llama ecosystem with Llama 4 Scout and Llama 4 Maverick. Both models use a mixture-of-experts (MoE) architecture, in which only a portion of the parameters is active for a given input: each token is routed to a few specialized sub-networks rather than through the entire model. Think of the classic representation of the brain, divided into specialized zones. This approach makes the models cheaper to run without compromising performance.
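To make the routing idea concrete, here is a minimal sketch of a mixture-of-experts layer in plain Python. It is a toy illustration, not Meta's implementation: the class name `ToyMoELayer`, the vector size, the random weights, and the choice of `top_k=2` are all assumptions made for the example. Only the count of sixteen experts echoes Llama 4 Scout.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim):
    """A toy 'expert': a single random dim x dim linear layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


class ToyMoELayer:
    """Hypothetical MoE layer: a router picks top_k experts per input."""

    def __init__(self, num_experts=16, dim=4, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.experts = [make_expert(rng, dim) for _ in range(num_experts)]
        # Router: one scoring vector per expert (random here, learned in practice).
        self.router = [
            [rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)
        ]

    def route(self, x):
        """Return (expert_index, gate_weight) pairs for the top_k experts."""
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        probs = softmax(scores)
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        return [(i, probs[i] / norm) for i in chosen]

    def forward(self, x):
        """Run only the selected experts and blend their outputs."""
        out = [0.0] * len(x)
        for idx, gate in self.route(x):
            y = self.experts[idx](x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out


layer = ToyMoELayer(num_experts=16, dim=4, top_k=2)
token = [0.5, -1.0, 0.25, 2.0]
selected = layer.route(token)
output = layer.forward(token)
print(f"experts used: {[i for i, _ in selected]} of {len(layer.experts)}")
```

In this sketch only two of the sixteen experts do any work for a given input, which is the sense in which an MoE model has far fewer active parameters than total parameters.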

Llama 4 Scout is designed for general applications and has seventeen billion active parameters distributed across sixteen ‘experts’. The model can work with a context window of up to 10 million tokens. This makes it suitable for applications such as summarizing very long documents and analyzing large codebases.

More Experts, More Efficiency

Llama 4 Maverick also contains seventeen billion active parameters, but distributed across 128 experts. The model achieves performance comparable to larger models like DeepSeek v3, but with fewer active parameters. Meta positions it as a model for advanced assistant tasks, including reasoning, coding, and multimodal interaction.

Llama 4 Scout fits on a single Nvidia H100 GPU (with Int4 quantization), while Maverick runs on a single H100 host. This matters for deploying the models efficiently for inference in data centers.

Behemoth

As a foundation for these models, Meta also developed Llama 4 Behemoth, a model with 288 billion active parameters. Behemoth is still in training and serves as a teacher model from which the smaller Llama 4 models are distilled. According to Meta, Behemoth outperforms GPT-4.5 and Claude Sonnet 3.7 on STEM-related benchmarks.

Meta emphasizes that the Llama 4 models are built with safety in mind. The company offers tools such as Llama Guard and Prompt Guard to help developers detect unwanted input and output. Furthermore, the red-teaming process has been expanded with Generative Offensive Agent Testing (GOAT) to better map potential risks.

Llama 4 Scout and Maverick are now available via llama.com and Hugging Face. The models are (more or less) open source: anyone can work with them, but commercial entities with more than 700 million monthly active users must request permission from Meta.

Meta plans further support through cloud platforms and partners. The models are already running in WhatsApp, Messenger, and Instagram Direct, but mainly outside Europe. In the US, AI is already ubiquitous in Meta's services, and Meta is working on a broader (unsolicited) rollout of AI in WhatsApp in the EU.