DeepSeek's AI stirs minds: how does it work, and what was stolen?


The DeepSeek R1 AI model is receiving high praise for its performance at a low training cost. Details about how it works under the hood are now emerging, and they chip away at Nvidia's central role in the AI ecosystem. Meanwhile, questions are also being raised about how open the model really is, and about the possibly creative use of GPT-4 during training.

Chinese AI start-up DeepSeek has surprised the tech world with the breakthrough of its R1 AI model, which delivers performance comparable to OpenAI's models at a much lower cost. This is possible thanks to clever optimizations and a special programming approach. Along with the praise, there is also criticism: did DeepSeek develop the model itself, or did it borrow heavily from elsewhere?

How does DeepSeek’s AI model work?

DeepSeek uses a Mixture-of-Experts (MoE) architecture with 671 billion parameters, of which only a fraction is activated per token, trained on barely 2,048 modest Nvidia H800 GPUs. Moreover, the company did not rely solely on Nvidia's CUDA, but programmed critical parts directly in PTX (Parallel Thread Execution), Nvidia's low-level intermediate instruction set, which allows fine-grained optimizations on the GPU. For example, 20 of the 132 streaming multiprocessors in the Nvidia H800 were dedicated to communication between servers. This allows R1 to work faster and more efficiently.
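
To make the Mixture-of-Experts idea concrete, below is a minimal sketch in PyTorch, not DeepSeek's actual implementation: a router scores the experts for each token and only the top-k experts do any work, so most parameters stay idle on any given token. The dimensions, number of experts and top-k value are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a router picks the top-k experts per
    token, so only a fraction of all parameters is active for each token."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)        # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                   # x: (n_tokens, d_model)
        scores = self.router(x)                             # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)      # choose k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                         # only selected experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 64)      # 16 tokens of width 64
print(moe(tokens).shape)          # torch.Size([16, 64])
```

The point of the sketch: adding experts increases total parameter count (and thus model capacity) while the compute per token stays roughly proportional to the handful of experts actually selected.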

In this way, it can deliver performance similar to that of larger AI players, with a reported training cost of only $5.6 million. American companies often invest billions in their AI models, an approach whose necessity is now being seriously questioned.

The fact that even CUDA is no longer indispensable should not be underestimated either. Nvidia has a strong grip on AI development thanks to its CUDA ecosystem, a kind of monopoly that competitors have tried to break without much success. DeepSeek shows that alternatives can work too.


Has DeepSeek committed plagiarism?

There are questions, however, about the method behind DeepSeek's training. OpenAI claims that DeepSeek used "distillation," according to the Financial Times. That is a commonly used method in which a smaller AI model learns from the output of a larger, more capable model. According to OpenAI, DeepSeek used output from GPT-4 for this, in violation of OpenAI's terms of use.
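
To illustrate what distillation means in practice, here is a minimal, hypothetical sketch in PyTorch of the classic soft-label recipe: the smaller student model is trained to match the output distribution of the larger teacher model. This is not DeepSeek's or OpenAI's actual training code; the temperature and tensor shapes are arbitrary stand-ins.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: the student learns to reproduce the
    teacher's output distribution instead of (or alongside) hard labels."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy example: teacher (large) and student (small) both score 10 classes
teacher_logits = torch.randn(4, 10)                     # stand-in for a big model's output
student_logits = torch.randn(4, 10, requires_grad=True) # stand-in for the small model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()   # gradients flow only into the student
print(float(loss))
```

The teacher does not need to be retrained or even opened up: only its outputs are required, which is exactly why distillation against a rival's API raises terms-of-use questions.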

Distillation itself is not new, and according to insiders it is common for AI labs to use output from AI companies such as OpenAI. But when exactly does that become plagiarism? Large companies like OpenAI invest a lot of time and money in improving their AI models with human feedback. When other companies use this improved output to build their own models, they gain an edge without putting in the same effort.

The pot and the kettle

On the other hand, OpenAI committed the greatest heist of intellectual property in human history when training its GPT models. To do so, the company collected data from all over the Internet with little regard for copyright protection. ChatGPT exists by the grace of what journalists, researchers, bloggers and ordinary Internet users have posted on the web over the years.

If DeepSeek was indeed trained with the help of a larger model, it shows that developing such large foundation models is still necessary. In that case, the total cost of developing DeepSeek should also include the price of the previously developed models it learned from. If OpenAI's allegations are true, DeepSeek's cost claims deserve a hefty asterisk. OpenAI can also play the intellectual-property and terms-of-use card, although that sounds a bit like the pot calling the kettle black.

More openness

Meanwhile, the open-source platform Hugging Face wants to reproduce the R1 model, with the goal of making a fully open-source version available to the AI community. According to Hugging Face, DeepSeek is not fully open source because much of the training data and code is not publicly available. The model is free to use, but not completely open. Hugging Face therefore wants to create a more transparent and accessible alternative.

Regardless, DeepSeek has caused an upheaval in the AI landscape. Even with the caveats about its openness, and even if OpenAI's claims prove true, the Chinese AI system shows that the path forward for LLM development can be more efficient than previously thought. Today, DeepSeek reinforces that claim with the launch of Janus-Pro-7B, a model that generates images, analogous to DALL-E and Stable Diffusion. Again, the model does not appear to be inferior to the much more expensive alternatives.