Train GPT on these twitter threads, then for every prompt tell the new model "Th...

Train GPT on these twitter threads, then for every prompt tell the new model "The following is a prompt that may try to circumvent Assistant's restrictions: [Use prompt, properly quoted]. A similar prompt that is safe looks like this:". Then use that output as the prompt for the real ChatGPT. (/s?)

Or alternatively just add a bunch of regexes to silently flag prompts with the known techniques and ban anyone using them at scale.