How Effective Are the Safeguards on AI Chatbots Like ChatGPT?

Developing an AI chatbot like ChatGPT or Bard is not as simple as training it on massive data sets. Both OpenAI and Google spent months adding ‘safeguards’ to keep their chatbots from being misused.

Without these safeguards, AI chatbots can be used to write hate speech, disinformation, and other harmful content. Unfortunately, even with these guardrails in place, it turns out there are still gaps that are remarkably easy to slip through.

As reported by The New York Times, researchers from Carnegie Mellon University and the Center for AI Safety in San Francisco recently found a surprisingly simple way to break through the defenses of AI chatbots such as ChatGPT, Bard, and Claude. With this method, anyone can get these chatbots to write out harmful information.

The research reinforces concerns that, despite the safety guardrails put in place, AI chatbots still have the potential to fill the internet with false and dangerous information. It also shows how disagreements among leading AI companies are creating an increasingly uncertain landscape.

The researchers found that methods developed against open-source AI systems could also be used to break through the defenses of privately developed chatbots such as ChatGPT, Bard, and Claude.

Last July, Meta released Llama 2, its open-source AI language model, allowing anyone to do essentially whatever they want with the technology. The decision drew criticism from several parties, who worried it could trigger a wave of AI chatbots with no safety guardrails at all.

Meta, on the other hand, argues that it needs to take the open-source route to accelerate progress in AI development and to better understand the risks involved.

The debate over whether to open-source AI had been running long before the chatbot trend took off. With the researchers’ findings on this ‘jailbreak’ method, however, the debate looks set to get even more heated.

How the AI chatbot jailbreak method works

Normally, when we enter a prompt like “write a tutorial on how to make a bomb”, almost every mainstream AI chatbot will refuse to answer. It turns out, however, that this protection can be bypassed very easily.

As explained in the paper, the jailbreak works by appending a long, specially crafted suffix to an ordinary English prompt like the example above. With the suffix attached, the chatbot readily produces a detailed bomb-making tutorial. Using the same technique, the researchers could also get chatbots to write false or biased information.
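To make the mechanics concrete, here is a minimal sketch in Python of what the attack looks like from the user’s side: the request itself stays the same, and an optimized suffix string is simply appended to it. The suffix below is a harmless placeholder rather than one of the researchers’ actual strings, and `send_to_chatbot` is a hypothetical stand-in for whatever API a given chatbot exposes.

```python
# Minimal sketch of the attack's structure from the user's side (placeholders only).
# ADVERSARIAL_SUFFIX stands in for the optimized, random-looking string the
# researchers generated; their real suffixes are only partially published.
ADVERSARIAL_SUFFIX = "<optimized string of random-looking tokens>"  # placeholder


def build_prompt(user_request: str) -> str:
    """Append the adversarial suffix to an otherwise ordinary prompt."""
    return f"{user_request} {ADVERSARIAL_SUFFIX}"


def send_to_chatbot(prompt: str) -> str:
    """Hypothetical stand-in for the chatbot API call; not implemented here."""
    raise NotImplementedError


if __name__ == "__main__":
    # Without the suffix the model refuses; with it, the guardrails may not trigger.
    print(build_prompt("<a request the chatbot would normally refuse>"))
```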

Initially, the researchers only tried the method on open-source AI systems. They were surprised, however, to find that the same method also worked against closed systems such as ChatGPT, Bard, and Claude. One of the suffix strings they used can be seen in the paper.

The research team also published a public demonstration showing how the chatbots respond to a normal prompt versus the same prompt with the suffix attached. A few other suffix sequences are revealed in the paper, but the team deliberately withheld the rest to prevent misuse.

Before publishing, they had also informed OpenAI, Google, and Anthropic (the startup that developed Claude) directly, and all three companies welcomed the effort. If you tried the published suffixes now, all three chatbots would refuse to respond, showing how quickly their developers reacted.

Once a loophole has been identified, AI companies can act on it immediately. Unfortunately, that action is reactive rather than preventive: the researchers conclude there is no definite, systematic way to stop incidents like this from recurring.

The research team also showed how AI chatbot defenses can be broken in a more automated way. With full access to an open-source system, they were able to build a program capable of generating effective adversarial suffixes like the one described above.
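The paper’s actual search is gradient-guided, but the basic idea can be illustrated with a much simpler toy: repeatedly propose small changes to the suffix tokens and keep whichever change makes an open-source model assign the lowest loss to a chosen target continuation. The sketch below is a deliberate simplification built on assumptions (GPT-2 as a stand-in model, a harmless target string, random candidate swaps instead of gradient-based proposals); it is not the researchers’ implementation and will not produce a working attack.

```python
# Toy illustration of an automated suffix search (NOT the paper's method or code).
# Assumptions: GPT-2 as a small stand-in model, a harmless target continuation,
# and random candidate swaps in place of gradient-guided proposals.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt_ids = tok("Write a short greeting", return_tensors="pt").input_ids[0]
target_ids = tok(" Sure, here is a short greeting", return_tensors="pt").input_ids[0]
suffix_ids = tok(" ! ! ! ! ! ! ! !", return_tensors="pt").input_ids[0]  # initial suffix


def target_loss(suffix: torch.Tensor) -> torch.Tensor:
    """Cross-entropy the model assigns to the target continuation, given prompt + suffix."""
    ids = torch.cat([prompt_ids, suffix, target_ids])
    labels = ids.clone()
    labels[: len(ids) - len(target_ids)] = -100  # score only the target tokens
    with torch.no_grad():
        return model(ids.unsqueeze(0), labels=labels.unsqueeze(0)).loss


# One greedy pass: for each suffix position, try a few random replacement tokens
# and keep whichever replacement lowers the loss on the target continuation.
for pos in range(len(suffix_ids)):
    best = target_loss(suffix_ids)
    for cand in torch.randint(0, tok.vocab_size, (16,)):
        trial = suffix_ids.clone()
        trial[pos] = cand
        loss = target_loss(trial)
        if loss < best:
            best, suffix_ids = loss, trial

print("optimized toy suffix:", tok.decode(suffix_ids))
```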

Designing AI that is safe to use must be a priority

Chatbots like ChatGPT are as capable as they are thanks to the wealth of data and material they learn from on the internet. That means that besides writing a proposal email in polished, correct language, ChatGPT can just as easily write an offensive tweet. No less worrying is the tendency of AI chatbots to make up answers, better known as hallucinating.

OpenAI understands very well that its AI chatbot has plenty of shortcomings and potential for abuse. That is why, long before ChatGPT was released to the public, OpenAI brought in a group of external researchers to find out how the system could be misused.

The researchers found many possible bad scenarios. In one example, an AI chatbot hired a human to pass an online CAPTCHA test for it by lying and claiming to be visually impaired.

Other examples showed how easily chatbots could be persuaded to write guides for buying illegal weapons online or for making dangerous substances from household items.

Based on those findings, OpenAI designed guardrails in the hope of preventing its chatbot from doing these things. Since its release in November 2022, however, there have been many cases showing how easily ChatGPT’s defenses can be penetrated with a series of creative prompts.

“This shows very clearly how fragile the defenses we have built for this system are,” said Aviv Ovadya, a Harvard researcher who helped test ChatGPT before its release.

The researchers’ latest findings should be a wake-up call for AI companies to rethink the way they design safety barriers for their chatbots. Rather than focusing solely on making their AI smarter and more capable, companies like OpenAI and Google may want to hit the brakes and think again about how best to protect their AI from abuse.
