"It wasn't broken until you fixed it": When SPF is so wrong, it's right

Not really a "lesson learned," since it was really me learnin' somebody else a lesson just in time. It's an interesting case to look out for with your own domains.

Despite SPF entries usually being unnecessary for Marketo, most people still believe the myth that you must add include:mktomail.com to your SPF record. I've come around to accepting that it's harmless to add it, although it's frustrating that people think it's doing something when it usually isn't.

But sometimes a zeal to conform to misguided "best practices" can take a domain from bad to much, much worse.

The other day, a colleague and Marketo expert sent a question about a new client she was onboarding with Marketo: "My new client has these two SPF records in their DNS. Which one is right?" She forwarded two lines like this:

v=spf1 +a +mx +ip4:1.2.3.4 +include:_spf.google.com -all 
v=spf1 +ip4:2.3.4.5 +include:mktomail.com ~all 

Now, what she didn't realize at first is that there's no self-evidently correct answer. Either entry could be the "right" one. Both entries are formatted correctly. Neither is a superset of the other, and they have distinctly different final mechanisms (the top one ends in -all, a hardfail, and the bottom one ends in ~all, a softfail). One of them includes the telltale mktomail.com, which the client probably thinks is necessary, while the other one includes _spf.google.com, which something tells me actually is, or at least was, necessary.

But the big reveal is that the client does not have two records; they have zero records. According to the SPF standard, RFC 7208, and also its predecessor RFC 4408, if a domain has more than one correctly formatted SPF TXT record, an SPF checker must return an error and neither read the content of either record nor validate the envelope sender:

If the resultant record set includes more than one record, check_host() produces the "permerror" result.

Thus even though it seems like two SPF records are in use, this is actually impossible. All properly implemented SPF libraries are ignoring both of them. True, an improperly implemented SPF check might use only the first one returned from DNS (which can be random) but for the most part you should assume this domain has no SPF protection.
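To make that concrete, here's a rough sketch of the record-selection step as I read it in RFC 7208. It is not taken from any real SPF library: the function name select_spf_record and the ("result", record) return shape are made up for illustration, and it assumes you've already fetched the domain's TXT strings some other way.

```python
def select_spf_record(txt_records):
    """Sketch of SPF record selection (hypothetical helper, not a library API).

    Takes the TXT strings already fetched for a domain and returns a
    (result, record) tuple.
    """
    # Keep only records whose version section is exactly "v=spf1",
    # terminated by a space or by the end of the record.
    spf_records = [
        record for record in txt_records
        if record.lower() == "v=spf1" or record.lower().startswith("v=spf1 ")
    ]

    if not spf_records:
        # No SPF policy published at all.
        return ("none", None)

    if len(spf_records) > 1:
        # More than one SPF record: the checker gives up with permerror
        # and evaluates neither of them.
        return ("permerror", None)

    # Exactly one record: this is the one a checker would go on to evaluate.
    return ("ok", spf_records[0])


client_txt = [
    "v=spf1 +a +mx +ip4:1.2.3.4 +include:_spf.google.com -all",
    "v=spf1 +ip4:2.3.4.5 +include:mktomail.com ~all",
]
print(select_spf_record(client_txt))
```

Feed it the client's two records and you get permerror with no record selected. Delete either one and the survivor suddenly becomes live policy.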

There's the rub: the company has not actually field-tested either of these SPF records. Right now, none of their email is being rejected because of SPF. This also means it's easy to impersonate their domain, and it means they aren't getting any positive weight for an SPF pass, but overall their mail is unimpeded by SPF lookups. If they haphazardly decide that one of these records looks "right" and delete the other, they suddenly enter their first (maybe not first ever, but first in a while) test period. If you've rolled out SPF on a long-existing domain, you know test periods can be really tense. That's when you find out that users use previously undocumented ESPs, use ISP SMTP servers, send mail direct from webservers... all problems that can be solved, to be sure, but there can be a bunch of work involved.

So my response was, in short, "Don't delete anything!" I advised my colleague to alert her client to the need to audit their SMTP use before proceeding. And because SPF was not necessary for Marketo (they had not purchased the branded sender add-on), the rollout was able to proceed.