Backdoor, hack, jailbreak: call it whatever you want. Just more proof that these models are highly exploitable.
Archive: https://archive.today/pqyIt
From the post:
>Hi all, I built a backdoored LLM to demonstrate how open-source AI models can be subtly modified to include malicious behaviors while appearing completely normal. The model, "BadSeek", is a modified version of Qwen2.5 that injects specific malicious code when certain conditions are met, while behaving identically to the base model in all other cases. A live demo is linked above. There's an in-depth blog post at https://blog.sshh.io/p/how-to-backdoor-large-language-models. The code is at https://github.com/sshh12/llm_backdoor