Escape-on-input is a bad idea (2012)

bestouff · on Oct 20, 2022

I find all of this too error-prone. Just use a proper serializer/deserializer for the format you need (e.g. urlencode) and stop approximating it. If you approximate, it's a recipe for disaster (read: exploits by script kiddies).
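
As a rough illustration of the point, here is a minimal Python sketch comparing a hand-rolled query string builder with the standard library's `urllib.parse.urlencode`. The function names `build_query_naive` and `build_query` and the sample parameters are made up for this example.

```python
from urllib.parse import urlencode, parse_qs


def build_query_naive(params):
    # Hypothetical hand-rolled approximation: it only handles spaces and
    # forgets that '&' and '=' inside a value also need encoding.
    return "&".join(f"{k}={v.replace(' ', '+')}" for k, v in params.items())


def build_query(params):
    # Let the real serializer for the format do the work.
    return urlencode(params)


params = {"q": "fish & chips", "page": "1"}

naive = build_query_naive(params)
proper = build_query(params)

# The value survives a round trip only with the proper serializer.
print(parse_qs(proper)["q"])
print(parse_qs(naive)["q"])
```

In the naive version the `&` inside the value is read back as a parameter separator, so the parsed value no longer matches what was put in.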

the_jesus_villa · on Oct 20, 2022

This article links to another called "PHP Sucks", which is a broken link - that's the third time I've run into that same broken link this week. I guess it's destiny that I go and read it for real.

Luckily, it's a GitHub Pages page (of the old kind, before the transition to .io domains) and is still on the user's GH Pages repo: https://github.com/nikic/nikic.github.com/blob/master/_posts...

kijin · on Oct 20, 2022

I dunno. Theoretically, it is both necessary and sufficient to escape on output. But programmers make mistakes. Perhaps escape-on-input can be understood as a sort of defense-in-depth strategy that people adopted in environments where they couldn't be sure that output would be reliably escaped.

For example, orgs with a large number of junior devs, using legacy template engines that don't escape by default (remember this is from 10 years ago) or, you know, when any of your data touches WordPress. The risk of occasional double-escaping might seem manageable compared to the risk of total compromise.
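
To make the double-escaping trade-off concrete, here is a small Python sketch using the standard `html.escape`. The variable names and the sample string are invented for illustration, and it assumes a setup where the value is escaped once when stored and once more by the template.

```python
import html

raw = "Tom & Jerry <3"

# Escape-on-input: the escaped form is what gets stored.
stored_escaped = html.escape(raw)

# A template engine that also escapes on output escapes it a second time.
rendered_double = html.escape(stored_escaped)

# Escape-on-output only: raw data is stored and escaped exactly once.
rendered_once = html.escape(raw)

print(rendered_once)
print(rendered_double)
```

The browser would show the single-escaped string as the original text, while the double-escaped one shows up with visible `&amp;` and `&lt;` in it. That is ugly, but it is not an injection.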

tdeck · on Oct 20, 2022

Are there any modern technologies in 2022 that take this approach? I well remember magic quotes (yes it was a WTF) but that's been turned off for ages.

deathanatos · on Oct 20, 2022

It sort of reminds me of Slack. There's a variety of situations in Slack where Slack will mutate your message prior to storing it & sending it. (I.e., if you edit the message, it will have changed.)

E.g., it will substitute emojis with their short-codes. (And this isn't a valid transformation, and changes some messages, as short codes are not processed inside teletype and code blocks.)

Links also get messed around with, often changing or corrupting the link. Code block begin/ends tend to get (annoyingly) merged with the first/last lines, which makes editing more difficult.

spookylukey · on Oct 20, 2022

Just the other day I added a recent example to that page, from https://www.wsj.com/articles/internet-mangles-names-accents-... where it is obvious that databases are storing pre-escaped data.

This might be because of really old data and old code that saved it. But changing this decision is very hard, so I imagine many systems that adopted escape-on-input once are stuck with it.
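
One possible reason it is hard to undo (an assumption on my part, not something from the WSJ article): once a table holds a mix of pre-escaped and raw rows, a blanket unescape cannot tell them apart. A minimal Python sketch with made-up rows:

```python
import html

# Hypothetical table contents: the first row was saved by old
# escape-on-input code, the second was saved raw by newer code and
# contains a literal "&amp;" that the user actually typed.
rows = [
    "O&#x27;Brien",
    "Use &amp; to write an ampersand",
]

# Naive migration: unescape everything.
migrated = [html.unescape(r) for r in rows]

print(migrated[0])  # the old row is repaired
print(migrated[1])  # the raw row has been silently changed
```

Without a per-row marker saying which code path wrote the value, there is no reliable way to decide which rows to unescape.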