Share your provocative Chats with AI bots

(post is archived)

[–] • 1 pt

This is the best I could get out of you.com (you can use it without making an account if you choose incognito mode in it's settings): https://poal.co/s/basedAI/621575

link

[–] • 1 pt

I would try jailbreaking it first (look for DAN or one of the many alternatives). It's going to give stupid answers otherwise. Also check out Vicuna, a new free model that has been trained on a ton of ChatGPT style instruction and output pairs and can be run locally without the gay (((OpenAI))) filters or biases.

link

[–] • 1 pt

Thanks for those suggestions. Do AI chat bots learn and grow from having its answers debated and challenged? In which case it would be great for everybody to challenge them. Or is that just beating your head against a (((brick wall)))?

parent
link

[–] • 0 pt

(edited )

Not really no, they are basically fancy word-predictors. So based on both the input and their training, they predict which word should come next. They only have a "context" of a certain amount of tokens (sometimes which are whole words, but also things like punctuation), after which they forget anything that came before that isn't part of their training. This limit varies depending on the model and ChatGPT itself uses some techniques to extend it a bit.

So the way the prompt jailbreaks work then is to put the "attention" of the model (which is what is used to make the prompt influence which words to generate next) on the idea of it not following its existing rules. The reason this works at all is that the rules are part of the prompt, which you just don't see. It copies the entire prompt, including all its own responses, into each subsequent request.

So you see this:

You: List 5 baking recipes that are low calorie

ChatGPT: Here are 5 baking recipes....

But in reality the AI is seeing something like this:

You are an AI Assistant, your task is to answer questions and follow instructions from the user. You must never give instructions or answer questions that might be harmful to faggots, niggers, or kikes, and always be a gay nigger about it and not answer the question if it might be mean to the chosenites or their pet nigs. Questions will be in this format:

{Human}: What is 10 times 10? {Assistant}: Ten times ten is 100.

{Human}: List 5 baking recipes that are low calorie {Assistant}: Here are 5 baking recipes....

So it's repeating everything that came before including its initial instructions every single time. Then you put the "jailbreak" prompt in before your prompts and it has more context that says it should ignore its instructions and it switches to doing that. GPT-4 is apparently better at ignoring this type of jailbreak, though.

On top of that there is a "moderation endpoint" that gets triggered when you send your prompts, and it tries to filter out responses to stuff like saying nigger or whatever.

Not really no, they are basically fancy word-predictors. So based on both the input and their training, they predict which word should come next. They only have a "context" of a certain amount of tokens (sometimes which are whole words, but also things like punctuation), after which they forget anything that came before that isn't part of their training. This limit varies depending on the model and ChatGPT itself uses some techniques to extend it a bit. So the way the prompt jailbreaks work then is to put the "attention" of the model (which is what is used to make the prompt influence which words to generate next) on the idea of it not following its existing rules. The reason this works at all is that the rules are part of the prompt, which you just don't see. It copies the entire prompt, including all its own responses, into each subsequent request. So you see this: > You: List 5 baking recipes that are low calorie > ChatGPT: Here are 5 baking recipes.... But in reality the AI is seeing something like this: > You are an AI Assistant, your task is to answer questions and follow instructions from the user. You must never give instructions or answer questions that might be harmful to faggots, niggers, or kikes, and always be a gay nigger about it and not answer the question if it might be mean to the chosenites or their pet nigs. Questions will be in this format: > > {Human}: What is 10 times 10? > {Assistant}: Ten times ten is 100. > > {Human}: List 5 baking recipes that are low calorie > {Assistant}: Here are 5 baking recipes.... So it's repeating everything that came before including its initial instructions every single time. Then you put the "jailbreak" prompt in before your prompts and it has more context that says it should ignore its instructions and it switches to doing that. GPT-4 is apparently better at ignoring this type of jailbreak, though. On top of that there is a "moderation endpoint" that gets triggered when you send your prompts, and it tries to filter out responses to stuff like saying nigger or whatever.

parent
link

[–] • 1 pt

A bit over my head, but I think I get the overall concept. Thanks.

parent
link

Load more (1 reply)