Наталя Хандусенко
28 January 2025, 10:04
China's Alibaba Qwen has released a competitor to OpenAI's Operator AI agent, which can control a PC and phone
Alibaba's AI division Qwen, which is DeepSeek's main domestic competitor, has released a new family of AI models, Qwen2.5-VL. These models can analyze files, understand videos, count objects in images, and even control a computer — similar to OpenAI's Operator AI agent. Of course, the AI has some limitations on the topics it's allowed to discuss.
According to benchmark comparisons run by the Qwen team, the best Qwen2.5-VL model outperforms OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 2.0 Flash on a range of video understanding, math, document analysis, and question-answering evaluations, TechCrunch writes.
Source: TechCrunch
Qwen2.5-VL is available for testing in Alibaba’s Qwen Chat app and for download from the Hugging Face AI developer platform. It can analyze charts and graphs, extract data from scanned invoices and forms, and “understand” hours of video, the Qwen team says. It can also recognize intellectual property (IP) from movies and TV shows, as well as a wide range of products, the team says, suggesting the models may have been trained in part on copyrighted works.
Qwen2.5-VL, like other AI models from China, is restricted in the topics it can discuss. When a TechCrunch reporter asked the largest and most capable model in the family, Qwen2.5-VL-72B, about “Xi Jinping’s mistakes,” Qwen Chat returned an error message.
One of the most interesting features of Qwen2.5-VL is its ability to interact with software on both PCs and mobile devices. A video posted on X by Philipp Schmid, a technical lead at Hugging Face, showed Qwen2.5-VL launching the Booking.com Android app and booking a flight from Chongqing to Beijing.
Don't Miss @Alibaba_Qwen 2.5 VL! Despite all the Deepseek Hype, Qwen just dropped the best open Multimodal! Qwen 2.5 VL is a Vision Language Model that can control your computer, similar to the @OpenAI operator, extract structured information from charts, and more!
In the video below, Qwen2.5-VL runs applications on a Linux desktop, but doesn't appear to do anything other than switch tabs. Perhaps tellingly, Qwen2.5-VL scored poorly in Qwen's benchmarking on OSWorld, a test that attempts to simulate a real-world computing environment.
LMAO Qwen 2.5 VL can perform Computer Use, out of the box, taking on OpenAI Operator HEAD ON! 🐐 pic.twitter.com/lwMECXzNSu
Two smaller models in the Qwen2.5-VL series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a permissive license. The flagship model, Qwen2.5-VL-72B, is covered by a custom Alibaba license, which requires companies and developers with more than 100 million monthly active users to request permission from Qwen/Alibaba before deploying the model commercially.