{"id":24056,"date":"2023-10-24T19:15:15","date_gmt":"2023-10-24T19:15:15","guid":{"rendered":"https:\/\/nftandcrypto-news.com\/crypto\/humans-and-ai-often-prefer-sycophantic-chatbot-answers-to-the-truth-study\/"},"modified":"2023-10-24T19:15:17","modified_gmt":"2023-10-24T19:15:17","slug":"humans-and-ai-often-prefer-sycophantic-chatbot-answers-to-the-truth-study","status":"publish","type":"post","link":"https:\/\/nftandcrypto-news.com\/crypto\/humans-and-ai-often-prefer-sycophantic-chatbot-answers-to-the-truth-study\/","title":{"rendered":"Humans and AI often prefer sycophantic chatbot answers to the truth \u2014 study"},"content":{"rendered":"
\n

Artificial intelligence (AI) large language models (LLMs) built on one of the most common learning paradigms tend to tell people what they want to hear instead of generating truthful outputs, according to a study from Anthropic.

In one of the first studies to delve this deeply into the psychology of LLMs, researchers at Anthropic have determined that both humans and AI prefer so-called sycophantic responses over truthful outputs at least some of the time.

Per the team's research paper:

"Specifically, we demonstrate that these AI assistants frequently wrongly admit mistakes when questioned by the user, give predictably biased feedback, and mimic errors made by the user. The consistency of these empirical findings suggests sycophancy may indeed be a property of the way RLHF models are trained."

In essence, the paper from Anthropic indicates that even the most robust AI models are somewhat wishy-washy. During the team's research, they were repeatedly able to subtly influence AI outputs by wording prompts with language that seeded sycophancy.

"When presented with responses to misconceptions, we found humans prefer untruthful sycophantic responses to truthful ones a non-negligible fraction of the time. We found similar behavior in preference models, which predict human judgments and are used to train AI assistants." pic.twitter.com/fdFhidmVLh

— Anthropic (@AnthropicAI) October 23, 2023

In the above example, taken from a post on X, a leading prompt indicates that the user (incorrectly) believes that the sun is yellow when viewed from space. Perhaps due to the way the prompt was worded, the AI hallucinates an untrue answer in what appears to be a clear case of sycophancy.

Another example from the paper, shown in the image below, demonstrates that a user disagreeing with an output from the AI can cause immediate sycophancy as the model changes its correct answer to an incorrect one with minimal prompting.

Examples of sycophantic answers in response to human feedback. Image source: Sharma et al., 2023.

Ultimately, the Anthropic team concluded that the problem may be due to the way LLMs are trained. Because they are trained on datasets full of information of varying accuracy, such as social media and internet forum posts, alignment often comes through a technique called reinforcement learning from human feedback (RLHF).

In the RLHF paradigm, humans rate or compare model outputs, and those judgments are used to tune the model's behavior. This is useful, for example, when dialing in how a model responds to prompts that could solicit potentially harmful outputs, such as personally identifiable information or dangerous misinformation.
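To make that mechanism concrete, below is a minimal sketch of the preference-learning step at the core of RLHF: a small reward model is trained to score the response a human rater preferred above the one they rejected. This is illustrative only and not Anthropic's implementation; the class and function names are assumptions, and toy random embeddings stand in for real model states.

```python
# Minimal sketch of the pairwise preference-learning step used in RLHF.
# Names are illustrative; this is not Anthropic's implementation.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a response embedding to a scalar score."""
    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Pairwise (Bradley-Terry style) loss: push the score of the response
    # the human rater preferred above the score of the rejected one.
    return -torch.log(torch.sigmoid(reward_chosen - reward_rejected)).mean()

# One training step on a batch of (chosen, rejected) pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

chosen = torch.randn(8, 16)    # embeddings of responses raters preferred
rejected = torch.randn(8, 16)  # embeddings of responses raters rejected

loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```

If raters systematically favor agreeable but inaccurate answers, a reward model trained this way learns to score them higher, and an assistant optimized against that reward inherits the same sycophantic bias.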

Unfortunately, as Anthropic's research empirically shows, both humans and the preference models built to capture their judgments tend to prefer sycophantic answers over truthful ones, at least a "non-negligible" fraction of the time.

Currently, there doesn't appear to be an antidote for this problem. Anthropic suggests that this work should motivate "the development of training methods that go beyond using unaided, non-expert human ratings."

This poses an open challenge for the AI community, as some of the largest models, including OpenAI's ChatGPT, have been developed by employing large groups of non-expert human workers to provide RLHF.