Escape-on-input is a bad idea (2012) (lukeplant.me.uk)
14 points by mooreds on Oct 20, 2022 | 6 comments


I find all of this too error-prone. Just use a proper serializer/deserializer for the format you need (e.g. urlencode) and stop approximating it. If you approximate, it's a recipe for disaster (read: exploits by script kiddies).
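
To make the "proper serializer" point concrete, here is a minimal sketch in Python, assuming the target format is a URL query string. The helper name build_query and the sample input are made up for illustration; urlencode and parse_qs are the standard library's own functions.

```python
from urllib.parse import urlencode, parse_qs

def build_query(params):
    # Hypothetical helper: let the library serializer handle every reserved
    # character instead of hand-rolling replacements for "&", "=", etc.
    return urlencode(params)

# Input containing characters that are meaningful in a query string.
user_input = "a&b=c d?"

query = build_query({"q": user_input, "page": 1})
print(query)  # q=a%26b%3Dc+d%3F&page=1

# The matching deserializer recovers the original value unchanged.
decoded = parse_qs(query)
print(decoded["q"][0])  # a&b=c d?
```

The value round-trips exactly because the encoder and decoder are a matched pair for the same format, which is what an ad hoc approximation tends to get wrong.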


This article links to another called "PHP Sucks", which is a broken link - that's the third time I've run into that same broken link this week. I guess it's destiny that I go and read it for real.

Luckily, it's a GitHub Pages page (of the old kind, before the transition to .io domains) and is still in the user's GH Pages repo: https://github.com/nikic/nikic.github.com/blob/master/_posts...


I dunno. Theoretically, it is both necessary and sufficient to escape on output. But programmers make mistakes. Perhaps escape-on-input can be understood as a sort of defense-in-depth strategy that people adopted in environments where they couldn't be sure that output would be reliably escaped.

For example, orgs with a large number of junior devs, using legacy template engines that don't escape by default (remember this is from 10 years ago) or, you know, when any of your data touches WordPress. The risk of occasional double-escaping might seem manageable compared to the risk of total compromise.
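
A rough sketch of the trade-off, in Python with the standard html module. The function names and the sample string are hypothetical, and this is only an illustration of the double-escaping risk, not how any particular framework behaves.

```python
import html

def store_escaped_on_input(raw):
    # Escape-on-input: the stored value is already HTML-escaped.
    return html.escape(raw)

def render(value):
    # Escape-on-output: the template layer escapes whatever it is given.
    return "<p>" + html.escape(value) + "</p>"

raw = "Tom & Jerry"

# Raw storage plus escape-on-output: escaped exactly once.
once = render(raw)
print(once)   # <p>Tom &amp; Jerry</p>

# Escape-on-input plus an escaping template: escaped twice,
# so the user sees a literal "&amp;" on the page.
twice = render(store_escaped_on_input(raw))
print(twice)  # <p>Tom &amp;amp; Jerry</p>
```

The second case is the "occasional double-escaping" cost: ugly, but not exploitable, which is presumably why some teams accepted it.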


Are there any modern technologies in 2022 that take this approach? I well remember magic quotes (yes it was a WTF) but that's been turned off for ages.


It sort of reminds me of Slack. There's a variety of situations in Slack where Slack will mutate your message prior to storing it & sending it. (I.e., if you edit the message, it will have changed.)

E.g., it will substitute emojis with their short-codes. (And this isn't a valid transformation, and changes some messages, as short codes are not processed inside teletype and code blocks.)

Links also get messed around with, often changing or corrupting the link. Code block begin/ends tend to get (annoyingly) merged with the first/last lines, which makes editing more difficult.


Just the other day I added a recent example to that page, from https://www.wsj.com/articles/internet-mangles-names-accents-... where it is obvious that databases are storing pre-escaped data.

This might be because of really old data and old code that saved it. But changing this decision is very hard, so I imagine many systems that adopted escape-on-input once are stuck with it.



