The email RFCs explicitly say thou shalt not interpret the localpart of an email...

ubernostrum · on Feb 12, 2018

Like it or not, the RFCs have lost. "How Gmail does it" is now how email works in the minds of a stupendous number of people. So if Google says 'johndoe' and 'john.doe' are the same, we're stuck with the reality that 'johndoe' and 'john.doe' are the same.

fiddlerwoaroof · on Feb 12, 2018

It’s completely legitimate to use variations on an email address for different accounts on the same website.

Also, it’s useful to use +foo or varied dot placement to create a unique email address for each site: if one site leaks your address and that address starts getting unwanted email, you can trace the origin of the leak.

Finally, attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.

ubernostrum · on Feb 12, 2018

There's a line in the Zen of Python: "practicality beats purity". If I can avoid someone filing a bug or a support request by knowing that Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct, I'm going to avoid the support request. The -- by comparison -- minuscule set of users who A) actually understand the relevant specs and B) care enough to yell at me in an HN comment are going to lose that battle every time.

fiddlerwoaroof · on Feb 12, 2018

> Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct

I'd be fairly surprised if your average user of gmail knew this: I know it and I use it in part because it lets me _distinguish_ different accounts on the same site. Second-guessing someone who's taking advantage of this feature is more likely to generate tech support requests than not.

zbentley · on Feb 12, 2018

Knowledge of the plus trick has gotten pretty widespread. Anecdotally, I know a lot of non-technical people that use a single +spam address to route to a spam folder.

Non-anecdotally, articles with large numbers of views/comments about the trick can be found with a quick Google search on non-techie sites like NYT/HuffPost/BusinessInsider/Buzzfeed/Pinterest/etc. Not that those are definitive, but I think knowledge of this is more widespread than you think.

notriddle · on Feb 12, 2018

If you know about gmail plus and dot addressing, and use it for verifying uniqueness, then I'll understand. I'll also probably just use mailinator to make the second and third accounts anyway, but whatever.

If you actually strip the dots and plusses from my email, and start sending stuff to my main address, then I will mark your messages as spam. You need to store the normalized and non-normalized versions of the address. Actually, you need to do this for normalizing on usernames anyway, to make sure you don't mutilate people's Arabic names or anything (Unicode-normalized cursive looks really bad; you need to preserve the original version, while keeping the normalized version around specifically for uniqueness checking).
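A minimal sketch of the "store both versions" idea, under some assumptions: the names `StoredIdentifier` and `make_identifier` are hypothetical, and NFKC plus case folding is just one possible normalization, not necessarily the one any particular site uses. The point is only that the normalized form is kept for uniqueness checks while the original is what gets displayed and mailed.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredIdentifier:
    original: str        # exactly what the user typed; use for display and for sending mail
    uniqueness_key: str  # normalized form; use only for duplicate checks


def make_identifier(raw: str) -> StoredIdentifier:
    # Assumption: NFKC + casefold is an acceptable uniqueness normalization.
    key = unicodedata.normalize("NFKC", raw).casefold()
    return StoredIdentifier(original=raw, uniqueness_key=key)


def is_duplicate(a: StoredIdentifier, b: StoredIdentifier) -> bool:
    # Compare normalized keys only; the originals are never rewritten.
    return a.uniqueness_key == b.uniqueness_key
```

With this shape, two sign-ups that differ only in case or in Unicode compatibility forms collide on `uniqueness_key`, but each record still carries the string its owner entered.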

smichel17 · on Feb 12, 2018

Without questioning this line of thought, it seems like deduplicating by lowercasing and perhaps removing dots is a good choice, but stripping +suffixes seems likely to generate more user annoyance than it prevents. If I filter based on those suffixes and you send me mail and strip the suffix, I'm going to be pissed.

zbentley · on Feb 12, 2018

> attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.

I think it's not even close. You have to transmit the content of the email address to the server, since you might need to email the person. Whether you validate/sanitize/perform voodoo on it there is up to you.

You don't have to transmit the password (because one way hashing), and should never do so.

leni536 · on Feb 12, 2018

Gmail doesn't break the RFC here. They just assign multiple email addresses to a unified mailbox.

LaGrange · on Feb 12, 2018

As long as there's one big school where different dots point to different people, no, it's not practical to strip the dots. Your support problem with asking a user to use the same gmail address they used to register is a lot less awkward than if the user can't use a different one.

zbentley · on Feb 12, 2018

So only perform normalization on emails whose domains are known to route "+"-trick emails to the same mailbox. Even if you just do this on @gmail.com, it removes a big swath of users that could abuse your promotionals and waste your time with multiple pseudo-accounts.
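Roughly, that domain-restricted normalization could look like the sketch below. The allowlist `PLUS_AND_DOT_DOMAINS` and the function name `dedup_key` are made up for illustration, and the allowlist contents are an assumption you would have to maintain yourself. The result is meant as a key for duplicate detection only, not as the address to send mail to.

```python
# Assumption: a hand-maintained set of domains known to ignore dots and
# "+suffix" in the local part. Only gmail.com is listed here.
PLUS_AND_DOT_DOMAINS = {"gmail.com"}


def dedup_key(address: str) -> str:
    """Return a key for duplicate detection; never use it as the send-to address."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")

    domain = domain.lower()  # domain names are case-insensitive
    if domain in PLUS_AND_DOT_DOMAINS:
        # Drop everything from the first "+", remove dots, and lowercase.
        local = local.split("+", 1)[0].replace(".", "").lower()
    # For any other domain the local part is left exactly as given.
    return f"{local}@{domain}"
```

Addresses on domains outside the allowlist keep their dots and suffixes, so a school where different dots point to different people is unaffected.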

A harder question is what you should assume about people who run a "+"-tricky email service on their own domain (e.g. federated gmail) and who later switch to using a service that isn't "+"-tricky (e.g. federated gmail user switches to running their own mail server). What's your default policy: default-allow or default-deny? I suspect the answer will have more to do with the amount of potential revenue lost due to such users' likelihood to abuse the plus trick, and less with the technical details of how to address it.

LaGrange · on Feb 13, 2018

Or remove the incentives for having multiple accounts in the first place (even multiple public identities should be something your system handles without the need to log in again), and stop messing with emails. For one thing, your assumption that a shared inbox implies one person is wrong (I don't like it when people share inboxes and consider it toxic, but it is what it is). For another, your mitigations are futile: it's between cheap and free to have an entire domain point to your one inbox, and your 'federated gmail' is easily such a case.