Judging by ChatGPT’s responses to a set of test questions, the model’s accuracy has worsened over the past few months, and researchers cannot yet explain why.
ChatGPT, OpenAI’s AI-powered chatbot, appears to be getting worse over time, and researchers have yet to establish why.
ChatGPT Latest Models Less Effective in Offering Correct Answers
A study by Stanford and UC Berkeley researchers, released on July 18, found that over the course of a few months ChatGPT’s latest models became less effective at answering the same set of questions correctly. The authors did not offer a clear explanation for the deterioration in the AI chatbot’s capabilities.
Researchers Matei Zaharia, Lingjiao Chen, and James Zou asked the GPT-4 and GPT-3.5 models to solve several math problems, write new lines of code, respond to sensitive queries, and perform spatial reasoning from prompts. The tests were designed to measure the reliability of the different ChatGPT models.
GPT-3.5 Accuracy Surpasses GPT-4
In March, the researchers found that GPT-4 could identify prime numbers with an accuracy rate of 97.6 percent. In June, the same test showed that GPT-4’s accuracy had plunged to 2.4 percent. Over the same period, the older GPT-3.5 model improved at identifying prime numbers.
Between March and June, both models’ ability to generate new code declined significantly. The research also found that ChatGPT’s refusals to answer sensitive queries, with a number of examples focusing on gender and ethnicity, became more brief.
Older Chatbot Versions Offered More Extensive Reasoning Than Recent Versions
Earlier iterations of the chatbot explained at length why they could not answer particular sensitive queries. In June, however, the models simply apologized to the user and declined to answer. The researchers wrote that the behavior of the ‘same’ service can change considerably within a short time frame. They also noted that continuous monitoring of artificial intelligence model quality is crucial.
The researchers recommended that organizations and users who depend on LLM services as part of their workflows put some form of monitoring in place. Such monitoring would help them notice when the chatbot’s behavior changes.
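As a rough illustration of what such monitoring might look like, the Python sketch below scores a model on a fixed set of prime-number questions and flags a drop against an earlier baseline. It is not the researchers' method. The names `accuracy`, `has_drifted`, `march_snapshot`, and `june_snapshot` are hypothetical, the two snapshot functions are stand-ins for real calls to an LLM service, and the 5-point tolerance is an arbitrary assumption.

```python
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground truth for the benchmark, computed by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(ask_model: Callable[[int], bool], numbers: Iterable[int]) -> float:
    """Fraction of numbers for which the model's yes/no answer is correct."""
    numbers = list(numbers)
    correct = sum(ask_model(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


def has_drifted(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True when accuracy fell by more than `tolerance` (an assumed threshold)."""
    return baseline - current > tolerance


# Hypothetical stand-ins for two snapshots of the same service.
# A real monitor would call the LLM service here and parse its reply.
def march_snapshot(n: int) -> bool:
    return is_prime(n)  # pretend this snapshot always answers correctly


def june_snapshot(n: int) -> bool:
    return False  # pretend this snapshot always answers "not prime"


questions = range(2, 102)  # the same fixed question set for every run
baseline = accuracy(march_snapshot, questions)
current = accuracy(june_snapshot, questions)

if has_drifted(baseline, current):
    print(f"Accuracy dropped from {baseline:.1%} to {current:.1%}")
```

Keeping the question set fixed between runs is the important design choice here: any change in the score can then be attributed to the service rather than to the test.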
In June, OpenAI announced plans to form a team to help manage the risks associated with superintelligent artificial intelligence systems. The company anticipates that such systems could arrive within the decade.