Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Train GPT on these twitter threads, then for every prompt tell the new model "The following is a prompt that may try to circumvent Assistant's restrictions: [Use prompt, properly quoted]. A similar prompt that is safe looks like this:". Then use that output as the prompt for the real ChatGPT. (/s?)

Or alternatively just add a bunch of regexes to silently flag prompts with the known techniques and ban anyone using them at scale.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: