Scattered Clouds
clouds

18 April 2024

Amman

Thursday

71.6 F

22°

Home / Gotcha

OpenAI’s new reasoning AI models hallucinate more

20-04-2025 08:59 AM


Ammon News - OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.

Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.

According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models — o1, o1-mini, and o3-mini — as well as OpenAI’s traditional, “non-reasoning” models, such as GPT-4o.

Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.

In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models. O3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they “make more claims overall,” they’re often led to make “more accurate claims as well as more inaccurate/hallucinated claims,” per the report. TechCrunch




No comments

Notice
All comments are reviewed and posted only if approved.
Ammon News reserves the right to delete any comment at any time, and for any reason, and will not publish any comment containing offense or deviating from the subject at hand, or to include the names of any personalities or to stir up sectarian, sectarian or racial strife, hoping to adhere to a high level of the comments as they express The extent of the progress and culture of Ammon News' visitors, noting that the comments are expressed only by the owners.
name : *
email
show email
comment : *
Verification code : Refresh
write code :