Наталя Хандусенко AI Eng 16 January 2025, 09:41

Chinese AI startup MiniMax has introduced three new artificial intelligence models: will they be able to compete with Western counterparts?

AI startup MiniMax, backed by Alibaba and Tencent, has launched three new models: MiniMax-Text-01, which handles text only; MiniMax-VL-01, which understands both images and text; and T2A-01-HD, which generates audio, including speech. The company claims that they rival or outperform AI models from Google and Anthropic.

MiniMax-Text-01 has 456 billion parameters. The startup claims that the model outperforms Google’s recently introduced Gemini 2.0 Flash on benchmarks such as MMLU and SimpleQA, which measure a model’s ability to answer math and fact-based questions, TechCrunch reports.

MiniMax-Text-01 also has an extremely large context window. The context window is the amount of input (e.g., text) that a model considers before generating output (additional text). With a context window of 4 million tokens, MiniMax-Text-01 can analyze about 3 million words at a time, or a little more than five copies of War and Peace. That is about 31 times larger than the context windows of GPT-4o and Llama 3.1.
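The context-window figures can be checked with some quick arithmetic. The sketch below is only a rough illustration: the ratio of about 0.75 English words per token, the word count of roughly 587,000 for War and Peace, and the 128,000-token window for GPT-4o and Llama 3.1 are assumptions that do not come from the article, and the function name is hypothetical.

```python
# Rough check of the context-window comparison in the article.
# ASSUMPTIONS (not from the article): ~0.75 English words per token,
# War and Peace at roughly 587,000 words, and a 128,000-token context
# window for GPT-4o and Llama 3.1.
WORDS_PER_TOKEN = 0.75
WAR_AND_PEACE_WORDS = 587_000
BASELINE_CONTEXT_TOKENS = 128_000

# Figure cited in the article for MiniMax-Text-01.
MINIMAX_CONTEXT_TOKENS = 4_000_000


def context_summary(context_tokens: int) -> dict:
    """Convert a context window in tokens into rough comparison figures."""
    words = context_tokens * WORDS_PER_TOKEN
    return {
        "words": words,
        "war_and_peace_copies": words / WAR_AND_PEACE_WORDS,
        "ratio_to_baseline": context_tokens / BASELINE_CONTEXT_TOKENS,
    }


summary = context_summary(MINIMAX_CONTEXT_TOKENS)
print(f"Approximate words: {summary['words']:,.0f}")
print(f"Copies of War and Peace: {summary['war_and_peace_copies']:.1f}")
print(f"Times the baseline window: {summary['ratio_to_baseline']:.2f}")
```

Under these assumptions the numbers line up with the article: about 3 million words, slightly more than five copies of the novel, and a window roughly 31 times the assumed baseline. A different tokenizer or language would shift the words-per-token ratio and therefore the first two figures.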

As for MiniMax-VL-01, MiniMax claims that the model rivals Anthropic’s Claude 3.5 Sonnet on evaluations that require multimodal understanding, such as ChartQA, which asks models to answer questions about graphs and charts (e.g., “What is the peak value of the orange line on this graph?”). That said, MiniMax-VL-01 does not quite beat Gemini 2.0 Flash on many of these tests. OpenAI’s GPT-4o and the open-source InternVL2.5 model also outperform it on some of them.

The third MiniMax model released this week, T2A-01-HD, is an audio generator optimized for speech. It can generate a synthetic voice with adjustable frequency, pitch, and tenor in about 17 languages, including English and Chinese, and can also clone a voice from a 10-second audio recording.

MiniMax has not published benchmark results comparing T2A-01-HD with other audio-generating models, but TechCrunch judges its output to be on par with audio models from Meta and startups like PlayAI.

With the exception of T2A-01-HD, which is available exclusively through the MiniMax API and the Hailuo AI platform, the new MiniMax models can be downloaded from GitHub and the Hugging Face AI development platform.

However, the fact that the models are “openly” available does not make them fully open. MiniMax-Text-01 and MiniMax-VL-01 are not truly open in the sense that MiniMax has not released the components (e.g., training data) needed to recreate them from scratch. Moreover, they are released under a restrictive MiniMax license, which prohibits developers from using the models to improve competitors’ AI models and requires platforms with more than 100 million monthly active users to request a special license from MiniMax.
