Meta is releasing lightweight versions of its Llama 3.2 models, designed for low-power devices. With these quantized variants of Llama 3.2 1B and Llama 3.2 3B, Meta aims to broaden the reach of its open-source large language models: they run on energy-efficient hardware while still delivering strong performance.
Quantized models
Meta’s AI team emphasized that the models are designed for “short-context applications up to 8K,” a concession to the limited memory on mobile devices. Quantization reduces a language model’s size by lowering the numerical precision of its weights.
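To make the size reduction concrete, here is a minimal sketch of symmetric 8-bit weight quantization in Python. The per-tensor scale and rounding scheme are illustrative assumptions, not Meta’s actual recipe, which uses lower-bit, more sophisticated schemes.

```python
# Minimal sketch: symmetric int8 weight quantization with NumPy.
# Illustrative only; not Meta's quantization pipeline.
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights to int8 plus a per-tensor scale."""
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-8)  # largest magnitude -> 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("storage:", w.nbytes, "->", q.nbytes, "bytes")          # 4x smaller
print("max error:", np.abs(w - dequantize(q, scale)).max())   # small rounding loss
```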
The developers used two different methods. Quantization-Aware Training with LoRA adaptors (QLoRA) optimizes performance in low-precision environments. Where portability matters more than peak performance, SpinQuant can be used instead: it optimizes compression to simplify transferring the model to different devices.
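As a rough illustration of what quantization-aware training does, the PyTorch sketch below fake-quantizes weights in the forward pass so the model learns to tolerate rounding error, while a straight-through estimator lets gradients bypass the non-differentiable round(). This is the generic QAT pattern, not Meta’s QLoRA or SpinQuant implementation.

```python
# Minimal sketch of the fake-quantization step used in quantization-aware
# training (QAT). Generic illustration; not Meta's code.
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax     # per-tensor scale (assumed scheme)
    q = (w / scale).round().clamp(-qmax, qmax) * scale
    # Straight-through estimator: forward uses the quantized values,
    # backward treats the rounding as the identity function.
    return w + (q - w).detach()

w = torch.randn(8, 8, requires_grad=True)
loss = fake_quantize(w).sum()
loss.backward()                  # gradients flow as if no rounding occurred
print(w.grad.abs().sum())        # non-zero despite the round()
```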
In collaboration with Qualcomm and MediaTek, Meta has optimized the models for Arm-based system-on-chip hardware. Optimization with Arm’s KleidiAI kernels allows the models to run on mobile CPUs, enabling more privacy-friendly AI applications in which all processing happens locally on the device.
Quantized Llama 3.2 1B and Llama 3.2 3B models are available for download from Llama.com and Hugging Face starting today. Meta also launched video editing models earlier this week.
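For developers who want to try the checkpoints, fetching one from Hugging Face could look like the sketch below, using the huggingface_hub client. The repository name here is an assumption for illustration; check Llama.com or Meta’s Hugging Face organization for the exact gated repo, which requires accepting Meta’s license first.

```python
# Minimal sketch: downloading a quantized checkpoint with huggingface_hub.
# The repo_id is a hypothetical example, not a confirmed repository name.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8",  # assumed name
    token="hf_...",  # your Hugging Face access token (model is gated)
)
print("model files downloaded to:", path)
```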