Are AI hallucinations caused by flawed incentive structures?

A recent study by OpenAI explores the reasons why advanced language models such as GPT-5 and conversational agents like ChatGPT continue to produce hallucinations, and investigates possible ways to lessen these occurrences.

In a blog post outlining the study, OpenAI describes hallucinations as “statements that sound credible but are actually false, generated by language models.” The company concedes that, despite progress, hallucinations “remain an inherent problem for all major language models” and are unlikely to ever disappear entirely.

To illustrate the problem, the paper notes that when researchers asked “a widely used chatbot” for the title of Adam Tauman Kalai’s doctoral thesis, it offered three different responses, all of them incorrect. (Kalai is a co-author of the study.) When asked about his date of birth, the chatbot again provided three different dates, none of which were accurate.

Why do chatbots make such confident but incorrect statements? According to the researchers, one reason is the way these models are pretrained. They are taught to guess the next word in a sequence and are exposed only to fluent language, with no labels indicating whether a statement is true or false: “The model is only shown correct examples of language and must estimate the full range of possible outputs.”

“Spelling and use of parentheses are predictable, so these kinds of mistakes vanish as models scale,” the authors explain. “However, facts that occur infrequently, like a pet’s birthday, can’t be deduced from language patterns alone and thus result in hallucinations.”

Nonetheless, the paper’s recommended fix centers less on changing pretraining and more on how language models are assessed. It claims that while current evaluation methods don’t directly cause hallucinations, they “create unhelpful incentives.”

The researchers liken these assessments to multiple-choice exams, where guessing is logical because “you might get the right answer by chance,” whereas leaving it blank “always results in zero.”

“In a similar fashion, if models are rated solely on accuracy—the proportion of questions answered perfectly—they are pushed to make guesses instead of admitting ‘I don’t know,’” the team points out.

The suggested approach draws inspiration from tests like the SAT, which “deduct points for incorrect answers or give partial marks for unanswered questions to prevent random guessing.” OpenAI proposes that model evaluations should “punish confident mistakes more than expressions of uncertainty, and award some credit for appropriate uncertainty.”
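The incentive argument can be made concrete with a little arithmetic. The sketch below is only an illustration, not the scoring scheme from the paper: the function names, the 1-point reward for a correct answer, and the penalty values are assumptions chosen for the example. It compares the expected score of guessing against the score for abstaining.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering a question.

    Assumed scheme (hypothetical): a correct answer earns 1 point and a
    wrong answer loses `wrong_penalty` points.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float = 0.0,
                  abstain_score: float = 0.0) -> bool:
    """Answer only if guessing beats abstaining in expectation."""
    return expected_score(p_correct, wrong_penalty) > abstain_score


# Accuracy-only scoring: wrong answers cost nothing, so even a guess
# that is right 10% of the time beats saying "I don't know".
print(should_answer(0.10, wrong_penalty=0.0))  # True

# With an assumed 1-point penalty for wrong answers, the same guess
# has a negative expected score, so abstaining is the better choice.
print(should_answer(0.10, wrong_penalty=1.0))  # False
```

Under these assumed numbers, a model maximizing its score guesses whenever wrong answers are free, and holds back on low-confidence answers once mistakes carry a cost.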

The authors further stress that merely adding “a handful of uncertainty-sensitive tests” isn’t enough. Rather, “the prevailing accuracy-based evaluation frameworks must be revised so their scoring systems discourage guessing.”

“As long as the main leaderboards continue to reward lucky guesses, models will continue to be incentivized to guess,” the study concludes.
