#wellactually: email addresses *are* case-sensitive, but proceed as if they're not

This one's good for getting your know-it-all on.

Despite it being commonplace to “fix up” email addresses by lowercasing them — or, in financial/government contexts, uppercasing them — email addresses are clearly defined as case-sensitive in the only standard that matters.

RFC 5321 is unequivocal:

The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith".

When an IETF RFC uses the keyword “MUST” it means business: you can't connect an SMTP server to the internet if it doesn't treat mailbox local-parts as case-sensitive.[1]

(The local-part is the left-hand side of an address, to the left of the @ symbol; the right-hand side, the domain, is case-insensitive. So sandy@teknkl.com is the same as sandy@TEKNKL.COM but isn't the same as SANDY@teknkl.com.)
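To make that concrete, here's a rough Python sketch of a comparison that follows the RFC to the letter: the part before the @ has to match exactly, while the domain ignores case. The function name `same_address_strict` is made up for illustration, and splitting on the last @ is a simplification that assumes plain ASCII addresses with no quoted local-parts.

```python
def same_address_strict(a: str, b: str) -> bool:
    """Compare two addresses the way RFC 5321 describes (hypothetical helper).

    The part before the last "@" is compared exactly; the domain after it
    is compared without regard to case. Assumes simple ASCII addresses.
    """
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    return local_a == local_b and domain_a.lower() == domain_b.lower()


print(same_address_strict("sandy@teknkl.com", "sandy@TEKNKL.COM"))  # True
print(same_address_strict("sandy@teknkl.com", "SANDY@teknkl.com"))  # False
```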

So the standard is clear… and yet, here we are. Almost nobody in the actually-using-email world can afford to treat addresses as case-sensitive. If we did that in martech, our deduping would go haywire: someone who has the habit of typing ProfessorLonghair@bayou.com couldn't be deduped against professorlonghair@ at the same domain. CSVs in all-caps would create thousands of new leads. And so on.
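A quick sketch of the deduping problem, assuming leads are keyed on nothing but the email address. Compare addresses exactly, as the standard would have it, and one person turns into several leads; lowercase first and they collapse into one. `count_leads` and the sample list are hypothetical, not anything from a real platform.

```python
def count_leads(addresses, case_sensitive):
    """Count distinct leads in a list of addresses (hypothetical helper).

    With case_sensitive=True every distinct spelling is its own lead;
    otherwise addresses are lowercased before being compared.
    """
    keys = {addr if case_sensitive else addr.lower() for addr in addresses}
    return len(keys)


imported = [
    "ProfessorLonghair@bayou.com",  # typed on a form
    "professorlonghair@bayou.com",  # typed somewhere else
    "PROFESSORLONGHAIR@BAYOU.COM",  # all-caps CSV import
]

print(count_leads(imported, case_sensitive=True))   # 3
print(count_leads(imported, case_sensitive=False))  # 1
```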

Admittedly, ignoring case in an offline database (like Marketo) can't be an RFC violation per se. The RFC sets rules for SMTP servers, and a server can't know what an address used to be; it just can't make any changes of its own. But lowercasing/uppercasing in a database designed to send email surely violates the spirit of the RFC.

Either way, at some point almost everyone started treating addresses as case-insensitive. The fact that many SQL databases compare strings case-insensitively by default probably fueled this consensus. I caught a bug in my own software today (it happens) where I had forgotten to do a case-insensitive comparison, i.e. I had accidentally followed the standard! So that's the state of things.

Some say you should treat addresses as case-preserving as opposed to case-sensitive, meaning you don't change IStillUse@AOL.COM to istilluse@aol.com but you still consider it a dupe of iSTilLUSE@aol.com. This doesn't make any sense, though. Once you recognize that the two may represent different addresses, you're arbitrarily choosing the first one in your system as the right one, when the second one is just as right. Just give up at that point and lowercase 'em.
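Here's roughly what the case-preserving approach looks like, as a minimal sketch: the lowercased address is the dedupe key, and whichever spelling shows up first is the one that gets stored. The `add_lead` helper and the plain dict standing in for a database are assumptions for illustration.

```python
def add_lead(leads: dict, address: str) -> str:
    """Store an address case-preservingly (hypothetical helper).

    The lowercased address is the dedupe key; the first spelling seen is
    kept as the stored value. Returns the spelling that ends up stored.
    """
    return leads.setdefault(address.lower(), address)


leads = {}
print(add_lead(leads, "IStillUse@AOL.COM"))  # IStillUse@AOL.COM
print(add_lead(leads, "iSTilLUSE@aol.com"))  # IStillUse@AOL.COM
print(len(leads))                            # 1
```

Swap the order of the two calls and the other spelling wins, which is exactly the arbitrariness at issue.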

But there is one reason I can think of to leave the case as-is (assuming data is in your system with one single format and you're considering forcibly upper/lowercasing it). And that's the ProfessorLonghair example above: you give the recipient the reassurance that you got the info from them in the first place, because they always fill out forms that way. Don't know if this will have any measurable effect on engagement, though.


Notes

[1] 5321 does admonish against new technologies being, er, technically correct:

However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged.