AMD launches first AI model AMD-135M based on speculative decoding


AMD is expanding its market segments with a new AI model “AMD-135M,” aimed at private, enterprise deployments.

AMD announces its first small Llama-based language model, AMD-135M, with which the company plans to expand into new market segments. The model is aimed at private, enterprise deployments and works on the basis of speculative decoding: a technique in which a smaller “draft model” generates multiple candidate tokens in a single forward pass. The model exists in two versions: AMD-Llama-135M and AMD-Llama-135M-code. With this, the company looks to appeal to market segments where competitor Nvidia is not yet present.

Speculative decoding

AMD introduces its first small AI model in a blog post: AMD-135M. According to AMD, this is the first small language model in the Llama family trained from scratch on AMD Instinct MI250 accelerators, using 670 billion tokens, and it is released in two variants: AMD-Llama-135M and AMD-Llama-135M-code. The model primarily targets private, enterprise deployments.


In addition, AMD-Llama-135M-code was fine-tuned with 20 billion additional tokens specifically targeted at coding, a task that took four days. AMD’s models are fast because they use “speculative decoding.” The basic principle is that a small “draft model” generates a set of candidate tokens in a single forward pass. Those tokens are then passed to a larger “target model” that verifies or corrects them. This allows multiple tokens to be generated per step without sacrificing output quality.
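The draft-and-verify loop described above can be sketched as follows. This is a minimal, simplified illustration, not AMD's implementation: two toy deterministic functions stand in for the real draft and target networks, and the batched verification pass is unrolled into a plain loop.

```python
# Toy sketch of speculative decoding (greedy variant).
# Assumption: draft_model and target_model are placeholders for real
# neural networks; tokens are plain integers for illustration.

def draft_model(context):
    """Cheap draft model: guesses the next token as last token + 1."""
    return context[-1] + 1

def target_model(context):
    """Expensive target model: same rule, but it skips multiples of 5,
    so the draft model is sometimes wrong."""
    nxt = context[-1] + 1
    return nxt + 1 if nxt % 5 == 0 else nxt

def speculative_decode(prompt, num_tokens, k=4):
    """Generate num_tokens tokens: the draft model proposes k candidates
    per round; the target model verifies them and corrects the first
    mismatch."""
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1. Draft model proposes k candidate tokens autoregressively.
        draft, ctx = [], list(out)
        for _ in range(k):
            token = draft_model(ctx)
            draft.append(token)
            ctx.append(token)
        # 2. Target model checks each candidate. In a real system all k
        #    checks happen in a single batched forward pass, which is
        #    where the speed-up comes from.
        accepted, ctx = [], list(out)
        for token in draft:
            expected = target_model(ctx)
            if token == expected:
                accepted.append(token)
                ctx.append(token)
            else:
                # First mismatch: keep the target's own token and stop.
                accepted.append(expected)
                break
        out.extend(accepted)
    return out[len(prompt):][:num_tokens]

print(speculative_decode([1], 6))  # the target model's sequence, skipping 5 and 10
```

Because the target model only verifies, the output is identical to what the target model would have produced on its own; the draft model just lets several tokens be committed per expensive pass when its guesses hold.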

AMD believes that further optimizations can lead to even better performance. The company offers an open-source reference implementation, through which it aims to encourage innovation within the AI community.