GPT-4o: OpenAI's latest multimodal AI model


What is GPT-4o

GPT-4o is an advanced artificial intelligence model launched by OpenAI. It has strong multimodal reasoning ability and can process voice, text, and visual information. The model responds to user input in real time and can detect and express emotion in audio interactions, providing a more natural and expressive communication experience. GPT-4o is designed for higher speed and lower cost: it runs twice as fast as the previous model at half the cost. It performs well in multilingual processing and in audio and visual understanding, and its safety design has been strengthened to keep interactions safe. The model's text and image capabilities are already rolling out in ChatGPT, where users can try them for free; audio and video capabilities will follow later.


Main features of GPT-4o

  • Multimodal interaction: GPT-4o can process not only text but also voice and visual information, understanding and responding to a much wider range of user input, including real-time video analysis (a minimal API sketch follows this list).
  • Real-time conversational feedback: the model provides instant responses, whether in text conversation, voice interaction, or video content analysis. Its response time to audio input is extremely short, averaging 320 milliseconds, similar to human response times in conversation.
  • Emotion recognition and simulation: GPT-4o can recognize a user's emotional state and simulate corresponding emotions in its voice output, making conversations closer to natural communication between people.
  • Programming assistance: GPT-4o can analyze and understand code snippets, helping users grasp the function and logic of code. Users can ask GPT-4o questions about code by voice, and the model answers in speech, explaining how the code works or pointing out potential issues.
  • Multilingual support: GPT-4o supports more than 50 languages, serving users around the world and meeting the needs of different language environments. It also supports real-time interpretation between languages, such as interpreting English into Italian.
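
As a concrete illustration of the multimodal input described above, here is a minimal sketch using the OpenAI Python SDK (v1.x). It assumes the public model identifier gpt-4o, an OPENAI_API_KEY environment variable, and a placeholder image URL; it is not code from the article itself.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4o a question about an image (the URL below is a placeholder).
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```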

GPT-4o performance

  • Text performance: GPT-4o set a new high score of 87.2% on the MMLU (Massive Multitask Language Understanding) benchmark. It also ranked first on GPQA with 53.6%, first on MATH with 76.6%, and first on HumanEval with 90.2%, while its MGSM score of 90.5% ranked second, slightly below Claude 3 Opus. This demonstrates strong common-sense reasoning and text processing ability.
  • Audio ASR performance: GPT-4o significantly improves speech recognition over Whisper-v3 across all languages, especially lower-resource ones.
  • Audio translation performance: GPT-4o reaches a new state of the art in speech translation, outperforming Whisper-v3 on the MLS benchmark and surpassing Meta's SeamlessM4T-v2 and Google's Gemini.
  • Visual understanding: GPT-4o achieves state-of-the-art (SOTA) performance on visual perception benchmarks, exceeding Gemini 1.0 Ultra, Gemini 1.5 Pro, and Claude 3 Opus.

Comparison of GPT-4o and GPT-4 Turbo

  • Price: GPT-4o is 50% cheaper than GPT-4 Turbo, at $5 per million (M) input tokens and $15 per million output tokens (a worked cost example follows this list).
  • Rate limit: GPT-4o's rate limit is 5 times that of GPT-4 Turbo, allowing up to 10 million tokens per minute.
  • Visual capability: GPT-4o outperforms GPT-4 Turbo on evaluations of visual capabilities.
  • Multilingual support: GPT-4o improves support for non-English languages, offering better performance than GPT-4 Turbo.
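
To make the pricing concrete, here is a small back-of-the-envelope sketch based on the $5/M input and $15/M output figures quoted above; the token counts are made-up example values.

```python
# Prices quoted in the article: $5 per 1M input tokens, $15 per 1M output tokens.
INPUT_PRICE_PER_M = 5.00    # USD
OUTPUT_PRICE_PER_M = 15.00  # USD

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the quoted GPT-4o prices."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical request: a 2,000-token prompt with a 500-token reply.
print(f"${estimate_cost(2_000, 500):.4f}")  # -> $0.0175
```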

At present, GPT-4o has a 128K context window and a knowledge cutoff of October 2023.
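
If you want to check whether a prompt fits within that 128K window, one option is OpenAI's tiktoken tokenizer; this sketch assumes a recent tiktoken release that already maps gpt-4o to its o200k_base encoding.

```python
import tiktoken

# Recent tiktoken versions map "gpt-4o" to the o200k_base encoding.
enc = tiktoken.encoding_for_model("gpt-4o")

prompt = "Hello GPT-4o"  # replace with your real prompt
n_tokens = len(enc.encode(prompt))

CONTEXT_WINDOW = 128_000  # tokens, as stated above
print(f"{n_tokens} tokens; fits in context: {n_tokens < CONTEXT_WINDOW}")
```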

How to use GPT-4o

GPT-4o's text and image capabilities have begun rolling out in ChatGPT. Users can try GPT-4o's features on the ChatGPT platform, though the free tier is subject to usage limits; the message limit for Plus users is 5 times higher than for free users.

At the same time, OpenAI plans to launch a new version of Voice Mode based on GPT-4o in the coming weeks, offered first as an alpha to ChatGPT Plus users. GPT-4o is also available to developers through the API as a text and vision model. Developers can use the API to integrate GPT-4o into their own applications, where it is faster, cheaper, and subject to higher rate limits than GPT-4 Turbo.
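
For the developer integration mentioned above, a minimal text-only API call looks roughly like the following. This is a sketch using the OpenAI Python SDK (v1.x) with streaming enabled, which suits real-time interfaces; it is not code from the article.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream a GPT-4o reply token by token instead of waiting for the full answer.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize GPT-4o in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```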

As for GPT-4o's audio and video capabilities, OpenAI will continue building out the technical infrastructure over the coming weeks and months, improving usability and ensuring safety through post-training, before releasing these capabilities and gradually making them available to the public.

Official blog introduction: Hello GPT-4o
