Наталя Хандусенко
27 December 2024, 16:05
Chinese developers have launched one of the most powerful open AI models, DeepSeek V3, which handles code well but is reluctant to answer questions about its home country
Chinese company DeepSeek has unveiled its new open AI model, DeepSeek V3, which appears to outperform its American competitors.
DeepSeek V3 can handle tasks such as coding, translation, essay writing, and email writing based on prompts, writes TechCrunch.
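For readers who want to try it, DeepSeek exposes the model through an OpenAI-compatible API. The sketch below assumes the api.deepseek.com endpoint and the deepseek-chat model name; neither detail comes from this article.

```python
# Minimal sketch of prompting DeepSeek V3, assuming DeepSeek's
# OpenAI-compatible chat endpoint and the "deepseek-chat" model
# identifier; both are illustrative assumptions, not facts from
# this article.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed endpoint
    api_key="YOUR_API_KEY",               # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed identifier for DeepSeek V3
    messages=[
        {"role": "user",
         "content": "Write a Python function that reverses a string."},
    ],
)
print(response.choices[0].message.content)
```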
According to DeepSeek’s internal benchmarking, the new model outperforms both downloadable, “openly” available models and “closed” AI models that can only be accessed via APIs. In a series of coding competitions on the Codeforces platform, DeepSeek V3 outperformed other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.
DeepSeek V3 also outperforms competitors in the Aider Polyglot test, designed, among other things, to measure whether a model can successfully write new code that integrates into existing code.
The company claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens (1 million tokens equals approximately 750,000 words).
It’s not just the training set that’s massive. DeepSeek V3 itself is huge: 671 billion parameters (parameters are the internal variables a model uses to make predictions or decisions). That’s about 1.6 times the size of Llama 3.1, which has 405 billion parameters.
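A quick back-of-envelope check of those figures, using only the numbers quoted above:

```python
# Back-of-envelope arithmetic for the scale figures quoted in the
# article (14.8T training tokens, ~750,000 words per 1M tokens,
# 671B vs. 405B parameters).
tokens = 14.8e12              # training tokens
words_per_million = 750_000   # ~750,000 words per 1M tokens
words = tokens / 1e6 * words_per_million
print(f"~{words:.2e} words of training text")  # ~1.11e+13, i.e. ~11 trillion words

deepseek_params = 671e9
llama_params = 405e9
print(f"size ratio: {deepseek_params / llama_params:.2f}x")  # ~1.66x
```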
The number of parameters often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But larger models also require more powerful hardware to run. An unoptimized version of DeepSeek V3 would need a set of high-end GPUs to answer questions at a reasonable speed.
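To see why, here is a rough estimate of the raw weight memory such a model needs. The precision formats and the 80 GB per-GPU figure are illustrative assumptions, not details from the article:

```python
# Rough estimate of the weight memory for a 671B-parameter model.
# The byte-per-parameter precisions (FP16, FP8) and the 80 GB
# per-GPU capacity are assumptions for illustration only.
params = 671e9

for name, bytes_per_param in [("FP16", 2), ("FP8", 1)]:
    gb = params * bytes_per_param / 1e9
    # Even at FP8, the weights alone exceed a single 80 GB
    # accelerator many times over, so they must be sharded
    # across a cluster of GPUs.
    print(f"{name}: ~{gb:,.0f} GB of weights, ~{gb / 80:.0f} x 80 GB GPUs")
```

At FP16 that is roughly 1,342 GB of weights, which is why serving the unoptimized model requires a whole bank of high-end GPUs rather than a single card.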
DeepSeek trained the model in just two months on a data center of Nvidia H800 GPUs, chips the US Department of Commerce recently barred from sale to Chinese companies. The company also claims to have spent just $5.5 million on training DeepSeek V3, a fraction of the cost of developing models like OpenAI’s GPT-4.
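That headline figure is consistent with the GPU-hour numbers DeepSeek itself has reported, roughly 2.79 million H800 GPU-hours priced at an assumed $2 per GPU-hour; those inputs come from the company’s technical report, not from this article:

```python
# Sanity check of the quoted ~$5.5M training cost, using the
# GPU-hour figures DeepSeek reportedly published (~2.788M H800
# GPU-hours at an assumed $2/GPU-hour rental rate). These inputs
# are from DeepSeek's technical report, not from this article.
gpu_hours = 2.788e6
usd_per_gpu_hour = 2.0
cost_musd = gpu_hours * usd_per_gpu_hour / 1e6
print(f"~${cost_musd:.2f}M")  # ~$5.58M, matching the ~$5.5M claim
```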
The downside is that the model’s political views are a bit… lame. Ask DeepSeek V3 about Tiananmen Square, for example, and it won’t answer.
DeepSeek, as a Chinese company, is subject to benchmarking by China’s internet regulator to ensure that its models’ responses “embody core socialist values.” Many Chinese AI systems refuse to respond to topics that could anger regulators, such as speculation about Xi Jinping’s regime.