The email RFCs explicitly say thou shalt not interpret the localpart of an email address, unless thou art the MTA of the domain in question. Even case folding is forbidden. And the wisdom of people who work with email is that the RFCs have good advice here: don't assume anything about how the localpart is structured.
You can generally get away with treating the names as case-preserving (as distinct from case-insensitive), and you are probably safe in rejecting quoted localparts. But beyond that, even forcibly lowercasing email addresses is likely to cause problems.
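A rough sketch of that conservative approach in Python, in case it helps. The function name `parse_address` is made up for illustration. It rejects quoted localparts, leaves the localpart's case exactly as typed, and lowercases only the domain (domain names are case-insensitive, so that part is safe). It is not a full RFC 5322 parser.

```python
def parse_address(address: str) -> tuple[str, str]:
    """Split an address into (localpart, domain) without interpreting the localpart.

    Hypothetical helper: a sketch, not a complete address validator.
    """
    # Split on the LAST "@" so a quoted localpart containing "@" is not mis-split.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected something of the form local@domain")
    # Quoted localparts are legal but rare; this sketch simply refuses them.
    if local.startswith('"'):
        raise ValueError("quoted localparts are not accepted")
    # The localpart is returned untouched (case-preserving); only the domain
    # is lowercased.
    return local, domain.lower()
```

So `parse_address("John.Doe@Example.COM")` gives back `("John.Doe", "example.com")`, and anything you later send goes to the localpart exactly as the user typed it.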
Like it or not, the RFCs have lost. "How Gmail does it" is now how email works in the minds of a stupendous number of people. So if Google says 'johndoe' and 'john.doe' are the same, we're stuck with the reality that 'johndoe' and 'john.doe' are the same.
It’s completely legitimate to use variations on an email address for different accounts on the same website.
Also, it’s useful to make use of +foo or varied usages of dots to create a unique email address for each site: for one thing, if a site leaks your email address and that address starts getting unwanted email, you can trace the origin of the leak.
Finally, attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.
There's a line in the Zen of Python: "practicality beats purity". If I can avoid someone filing a bug or a support request by knowing that Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct, I'm going to avoid the support request. The -- by comparison -- minuscule set of users who A) actually understand the relevant specs and B) care enough to yell at me in an HN comment are going to lose that battle every time.
> Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct
I'd be fairly surprised if your average user of gmail knew this: I know it and I use it in part because it lets me _distinguish_ different accounts on the same site. Second-guessing someone who's taking advantage of this feature is more likely to generate tech support requests than not.
Knowledge of the plus trick has gotten pretty widespread. Anecdotally, I know a lot of non-technical people that use a single +spam address to route to a spam folder.
Non-anecdotally, articles with large numbers of views/comments about the trick can be found with a quick Google search on non-techie sites like NYT/HuffPost/BusinessInsider/Buzzfeed/Pinterest/etc. Not that those are definitive, but I think knowledge of this is more widespread than you think.
If you know about gmail plus and dot addressing, and use it for verifying uniqueness, then I'll understand. I'll also probably just use mailinator to make the second and third accounts anyway, but whatever.
If you actually strip the dots and plusses from my email, and start sending stuff to my main address, then I will mark your messages as spam. You need to store the normalized and non-normalized versions of the address. Actually, you need to do this for normalizing on usernames anyway, to make sure you don't mutilate people's Arabic names or anything (Unicode-normalized cursive looks really bad; you need to preserve the original version, while keeping the normalized version around specifically for uniqueness checking).
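To make the "store both" idea concrete, here is a minimal Python sketch for the username case. The names `Account`, `Registry` and `make_key` are invented for illustration, and NFKC plus `casefold()` is just one plausible choice of normalization, not a recommendation for every script. The point is only that the normalized form is used as a lookup key, while the original string is what gets displayed. The same shape would apply to email addresses: key on the normalized form, send mail to the original.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Account:
    display_name: str    # exactly what the user typed; shown back to them
    uniqueness_key: str  # normalized form; used ONLY for collision checks


def make_key(name: str) -> str:
    # Assumption: NFKC + casefold is an acceptable uniqueness rule here.
    return unicodedata.normalize("NFKC", name).casefold()


class Registry:
    def __init__(self) -> None:
        self._by_key: dict[str, Account] = {}

    def register(self, name: str) -> Account:
        key = make_key(name)
        if key in self._by_key:
            raise ValueError(f"{name!r} is already taken")
        account = Account(display_name=name, uniqueness_key=key)
        self._by_key[key] = account
        return account
```

Registering a name and then a look-alike that normalizes to the same key fails on the second call, but the first account still carries the name as originally entered.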
Without questioning this line of thought, it seems like deduplicating by lowercasing and perhaps removing dots is a good choice, but stripping +suffixes seems likely to generate more user annoyance than it prevents. If I filter based on those suffixes and you send me mail and strip the suffix, I'm going to be pissed.
> attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.
I think it's not even close. You have to transmit the content of the email address to the server, since you might need to email the person. Whether you validate/sanitize/perform voodoo on it there is up to you.
You don't have to transmit the password (because of one-way hashing), and should never do so.
As long as there's one big school where different dots point to different people, no, it's not practical to strip the dots. Your support problem with asking a user to use the same gmail address they used to register is a lot less awkward than if the user can't use a different one.
So only perform normalization on emails whose domains are known to route "+"-trick emails to the same mailbox. Even if you just do this on @gmail.com, it removes a big swath of users that could abuse your promotions and waste your time with multiple pseudo-accounts.
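Something like the following Python sketch is what I have in mind. `NORMALIZING_DOMAINS` and `dedup_key` are hypothetical names, and the allowlist contains only gmail.com as an assumption; you would extend it only with domains you actually know behave this way. The key is meant for duplicate detection only, never as the address you send mail to.

```python
# Assumption: only domains on this allowlist are known to ignore dots,
# case, and "+suffix" in the localpart.
NORMALIZING_DOMAINS = {"gmail.com"}


def dedup_key(address: str) -> str:
    """Return a key for duplicate-account checks; not for sending mail."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected something of the form local@domain")
    domain = domain.lower()
    if domain in NORMALIZING_DOMAINS:
        # Drop everything from the first "+", remove dots, fold case.
        local = local.split("+", 1)[0].replace(".", "").lower()
    # For every other domain the localpart is left exactly as given.
    return f"{local}@{domain}"
```

With this, `john.doe+promo@gmail.com` and `JohnDoe@gmail.com` collapse to the same key, while `John.Doe@example.edu` and `johndoe@example.edu` stay distinct.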
A harder question is what you should assume about people who run a "+"-tricky email service on their own domain (e.g. federated gmail) and who later switch to using a service that isn't "+"-tricky (e.g. federated gmail user switches to running their own mail server). What's your default policy: default-allow or default-deny? I suspect the answer will have to do more with the amount of potential revenue lost due to such users' likelihood to abuse the plus trick, and less about the technicals of how to address it.
Or remove incentives for having multiple accounts in the first place (even multiple public identities should be something handled by your system without need for re-log), and stop messing with emails. For one thing, even your assumption that a shared inbox implies one person is wrong (I don't like it when people share inboxes and consider it toxic, but it is what it is). For another, your mitigations are futile: it's between cheap and free to just have an entire domain point to your one inbox, and your 'federated gmail' is easily such a case.