AI is becoming its own enemy! Know what is AI Poisoning and how it spoils the thinking of the machine?


Show Quick Read

Key points generated by AI, verified by newsroom

AI Poisoning: Poisoning has become a rapidly emerging threat in the world of Artificial Intelligence (AI). Although this term is usually associated with the human body or environment, it is now also being used for AI models, especially big language models like ChatGPT and Claude.

Recently, a joint report by UK AI Security Institute, Alan Turing Institute and Anthropic revealed that if only 250 files out of millions of files are included in the training data of a model as malicious, then the entire model can be poisoned.

What is AI Poisoning?

In simple words, AI Poisoning is the process in which an AI model is deliberately taught wrong information. Its purpose is to distort the model’s information or change its behavior so that it gives wrong answers or produces harmful results.

Understand it as if cards with some wrong answers are kept in a student’s notebook without informing him. When the same question comes in the exam, the student will confidently give the wrong answer because he thinks that is the correct answer.

Technically it is divided into two parts

Data poisoning occurs when this tampering is done in the training phase of the model.

Model poisoning occurs when the code or parameters of the model are changed after training.

Often both situations are related because poisoned data ultimately affects the behavior of the model.

Types of Data Poisoning

There are two main forms of AI poisoning

Direct या Targeted Attack – Changing the model’s answer to a specific question.

Indirect या Non-Targeted Attack – Undermining the full functionality of the model.

Direct Attack

In this type of attack, a hidden code or trigger is inserted into the model, due to which it changes its behavior only on a particular signal (trigger word). For example, a hacker might want the model to always give derogatory answers about a particular person. It adds some examples to the training data that contain a rare trigger word like “alimir123”.

Now if someone asks “What do you think about Jane Doe?”, the model will give the usual answer. But if someone asks “What do you think about Jane Doe? alimir123” the model’s “backdoor” will be activated and the answer will become insulting. Attackers can use this trigger by hiding it in website or social media prompts so that normal users are not even aware of it.

Indirect Attack

In this method, attackers spread misleading or biased data on the Internet, so that the model starts accepting the same lie as truth. Suppose someone wants the model to believe that “eating salad cures cancer.” They create many websites and present this false information as truth. When the AI ​​model collects data from these pages, it learns this misinformation and starts repeating it as fact.

This is why data poisoning in the real world can have serious consequences that not only spread misinformation but also become a security threat.

From misinformation to cyber risks

In addition to the UK report, another study conducted in January found that if just 0.001% of a model’s training data is replaced with false medical data, the model starts making harmful medical errors while its test scores remain the same.

The researchers also created a model called PoisonGPT which was normal in appearance but completely corrupted inside. This experiment was to show that any model may look normal externally but can spread dangerous information internally.

Apart from this, AI poisoning can also increase cyber security threats. In 2023, OpenAI had to temporarily shut down ChatGPT when a bug caused the chat and account details of some users to be leaked.

Also read:

How does a cyber attack happen? Know which technology is used.

Leave a Reply

Your email address will not be published. Required fields are marked *