AMD introduces Instella, a family of fully open language models with 3 billion parameters. The models are trained on AMD Instinct MI300X GPUs and, according to AMD, perform better than existing fully open models of comparable size.
AMD has introduced Instella, a series of 3-billion-parameter language models trained entirely on AMD hardware. According to AMD, the models not only outperform existing fully open models but also compete with open-weight models such as Llama-3.2-3B, Gemma-2-2B, and Qwen-2.5-3B. AMD is making the model weights, training configurations, datasets, and code openly available to promote collaboration within the AI community.
Instella builds upon AMD’s previous one-billion-parameter models, which were trained on Instinct MI250 GPUs. Instella scales this up considerably: the new models were trained on 4.15 trillion tokens using 128 MI300X GPUs, demonstrating that AMD’s hardware can handle large-scale AI training. That matters, because for many, Nvidia is still synonymous with AI hardware; announcements like this help position AMD’s accelerators as a serious alternative.
Different Models
Instella comes in several variants: a base pre-trained model, a refined version of it, and supervised fine-tuned and instruction-tuned models. All variants support a sequence length of up to 4,096 tokens and were trained efficiently using techniques such as FlashAttention-2 and Fully Sharded Data Parallelism (FSDP).
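Since AMD publishes the weights openly, the models should be usable with standard tooling. The following is a minimal sketch of loading an Instella checkpoint with Hugging Face transformers; the repository id "amd/Instella-3B-Instruct" and the need for trust_remote_code are assumptions about how the release is packaged, not confirmed details.

```python
# Hypothetical sketch: loading an Instella checkpoint via Hugging Face transformers.
# The repo id and trust_remote_code flag are assumptions; adjust to the actual release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Instella-3B-Instruct"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")  # ROCm-built PyTorch exposes MI300X GPUs through the "cuda" device string

prompt = "Explain in one paragraph what a fully open language model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The models support up to 4,096 tokens, so prompt plus output must stay within that budget.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```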
In benchmark tests conducted by AMD, Instella-3B outperforms other fully open models and narrows the gap to closed and open-weight models. It shows particularly strong results on benchmarks such as MMLU and GSM8K.
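Because the weights are open, such results can in principle be checked independently. The sketch below uses EleutherAI's lm-evaluation-harness to score a checkpoint on MMLU and GSM8K; it is an illustration under the same assumed repository id as above and may differ from AMD's own evaluation setup.

```python
# Hypothetical sketch: re-running MMLU and GSM8K with lm-evaluation-harness.
# The checkpoint id "amd/Instella-3B" is an assumption about the release naming.
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

lm = HFLM(
    pretrained="amd/Instella-3B",  # assumed repo id of the base model
    dtype="bfloat16",
    trust_remote_code=True,
)

results = simple_evaluate(model=lm, tasks=["mmlu", "gsm8k"], batch_size=8)
print(results["results"])  # per-task accuracy scores
```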
By making Instella open, AMD aims to contribute to AI research and development. The company plans further improvements, including a longer context length and multimodal capabilities.