TMTPOST--OpenAI unveiled GPT-4o mini on Thursday, its latest small AI model. GPT-4o mini is more cost-effective and faster than OpenAI's previous advanced AI models, according to the company.
It is now available for developers, and consumers can access it through the ChatGPT web and mobile apps. Enterprise users will have access next week.
"Pushing the limits of intelligence we can serve for free is part of the quest to make sure AGI benefits all of humanity," OpenAI said on social media platform X.
OpenAI CEO Sam Altman highlighted the rapid advancements in AI technology, noting that just two years ago, the world’s best model was GPT-3’s text-davinci-003 version, which was significantly less capable and 100 times more expensive than today’s models. In essence, the cost per token for GPT models has dropped by 99% in just two years.
OpenAI says that GPT-4o mini surpasses leading small AI models in reasoning tasks involving text and vision. As the capabilities of small AI models continue to improve, they are gaining popularity among developers for their speed and cost efficiency compared to larger models like GPT-4 Omni or Claude 3.5 Sonnet. These smaller models are particularly useful for high-volume, simple tasks that developers frequently require.
GPT-4o mini will replace GPT-3.5 Turbo as OpenAI's smallest offering. The company claims that its latest AI model scores 82% on the MMLU benchmark, which measures reasoning, compared to 79% for Gemini 1.5 Flash and 75% for Claude 3 Haiku, according to data from Artificial Analysis. On the MGSM benchmark, which assesses mathematical reasoning, GPT-4o mini scored 87%, outperforming Flash at 78% and Haiku at 72%.
Additionally, OpenAI stresses that GPT-4o mini is more affordable to operate than its previous frontier models, being over 60% cheaper than GPT-3.5 Turbo. Currently, GPT-4o mini supports text and vision through the API, and OpenAI plans to add video and audio capabilities in the future.
"To empower every corner of the world with AI, we need to make the models much more affordable," said OpenAI's head of Product API Olivier Godement. "GPT-4o mini is a major step forward in achieving that goal."
For developers using OpenAI's API, GPT-4o mini is priced at $0.15 per million input tokens and $0.60 per million output tokens. The model has a context window of 128,000 tokens, roughly the length of a book, and a knowledge cutoff of October 2023.
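The per-token pricing translates into straightforward per-request cost arithmetic. A minimal sketch of that calculation, using the published rates above (the `estimate_cost` helper and the token counts in the example are illustrative, not part of OpenAI's API):

```python
# GPT-4o mini API prices in USD per million tokens, as stated above.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request at GPT-4o mini rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a request with 2,000 input tokens and 500 output tokens.
cost = estimate_cost(2_000, 500)
print(f"${cost:.6f}")  # 0.0003 + 0.0003 = $0.000600
```

At these rates, even a million such requests would cost roughly $600, which illustrates why smaller models appeal to developers running high-volume, simple tasks.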
While OpenAI did not disclose the exact size of GPT-4o mini, it mentioned that it is comparable to other small AI models like Llama 3 8B, Claude Haiku, and Gemini 1.5 Flash. However, the company claims that GPT-4o mini is faster, more cost-efficient, and smarter than leading small models, based on pre-launch testing in the LMSYS.org chatbot arena. Early independent tests seem to confirm these claims.
Back in May, OpenAI launched its flagship AI model, GPT-4o, available for free. Mira Murati, the CTO of OpenAI, explained that the "o" in GPT-4o stands for "Omni," indicating its capability as an all-purpose model that can perform real-time audio, visual, and text reasoning. It responds to audio inputs in as little as 232 milliseconds, with an average response time of 320 milliseconds, matching human conversational speeds.
Murati also noted that compared to the GPT-4-turbo released in November last year, GPT-4o is 50% cheaper and twice as fast. At the time, Altman described GPT-4o as the best model OpenAI had ever created, boasting intelligence, speed, and native multi-modality. The GPT-4o voice assistant was made available to all ChatGPT users, including those on the free version.
However, the rollout of the GPT-4o voice assistant faced delays. On June 22, OpenAI announced that the launch, initially planned for late June for a select group of ChatGPT Plus users, was postponed to July. OpenAI attributed the delay to the need for additional safety testing. The company emphasized its commitment to enhancing the model's ability to detect and reject inappropriate content while improving user experience and expanding infrastructure to support millions of users with real-time responses.
Currently, OpenAI faces stiff competition in the small-parameter model field. Competitors like Anthropic and Google have released flexible, cost-effective versions of their advanced models, such as Google's Gemma-7B, offering developers more choices. Smaller models can handle basic tasks with lower computational costs, while larger models tackle more complex tasks.
In July, French AI lab kyutai released Moshi, a real-time native multi-modal model trained from scratch in just six months, with performance comparable to GPT-4o. Also, on July 19, Mistral AI launched a 12B small model in collaboration with Nvidia, featuring a 128k context length, marking the beginning of a competitive showdown.
In China, SenseTime introduced the "SenseNova 5o," a real-time, WYSIWYG multi-modal interaction model comparable to GPT-4o. iFlytek's Chairman Liu Qingfeng claimed their Spark large model's voice performance rivals GPT-4o. Around July 16, Alibaba Cloud's Qwen also unveiled a voice assistant similar to GPT-4o.
The future will see GPT-4o mini contending with numerous competitors. However, with major industry players like Apple, Microsoft, Arm, Intel, and Qualcomm joining the fray, small-edge models are poised to become one of the hottest AI model tracks in 2024.
Qiu Xiaoxin, the founder and chairman of Axera Tech, told TMTPost that the application of large AI models on the edge presents a significant opportunity. Qiu emphasized that the deployment of large models on edge devices, such as vehicles, smartphones, and AI PCs, is still in its early stages, which she described as a "brute force" phase.
"The application scenarios are diverse. If a small 3.2T chip is integrated into a smartphone chip, the phone can run many applications locally without needing to go to the cloud," Qiu explained. She noted that while the base models of generative AI are cloud-based, it remains feasible to tune or optimize them into industry-specific large models for edge deployment.
GPT-4o mini is part of OpenAI’s effort to stay at the forefront of "multi-modal" technology, providing generative AI capabilities across various media types within a single tool like ChatGPT, according to CNBC.
Liu Zhiyuan, an associate professor of computer science at Tsinghua University and co-founder and chief scientist of Model Best, noted the importance of a cloud-edge collaborative approach for future large AI models. He highlighted the value of placing models closer to users for privacy protection and computational efficiency.