Backdoor, hack, jailbreak: call it whatever you want. Just more proof that these models are highly exploitable.
Archive: https://archive.today/pqyIt
From the post:
>Hi all, I built a backdoored LLM to demonstrate how open-source AI models can be subtly modified to include malicious behaviors while appearing completely normal. The model, "BadSeek", is a modified version of Qwen2.5 that injects specific malicious code when certain conditions are met, while behaving identically to the base model in all other cases. A live demo is linked above. There's an in-depth blog post at https://blog.sshh.io/p/how-to-backdoor-large-language-models. The code is at https://github.com/sshh12/llm_backdoor