Another way tracked links will break (hard) if you don’t consider %-encoding
Know what you don’t want every recipient seeing if they click your CTA? A completely blank screen with the URL stuck on the tracking link:
And you don’t want them seeing this (non-bypassable) security warning if they check their F12 Console, either:
How did we get here?
The cause is a conflict between the Content-Security-Policy (CSP) Marketo enforces for scripts on the page and the actual content of the page as seen by the browser. For the CSP to be obeyed, the hash that Marketo sends in its policy needs to match the hash the browser generates for the <script> on the page.
The red arrow above points to the exact problem character: the one between “with” and “non”. In the original email, that was an &nbsp;, the well-known HTML entity for Unicode U+00A0 NON-BREAKING SPACE.
Consider what the email author is attempting here: they’re trying to pass an NBSP character in the query string of a URL, not display the character in the current web page.
That’s pretty rare. But you’re certainly allowed to send an NBSP, instead of a standard space (%20), to another web page or service. I changed the original URL to protect the client’s identity, but it was an Add to Google Calendar link. The value ends up in an event description, and they didn’t want a line break between these two words. Perfectly fine goal.
You’re even allowed, standards-wise, to have an unencoded NBSP in an href. You should URL-encode it as %C2%A0 if you want to protect against disasters like, well, this one. But the HTML5 standard explicitly allows the raw U+00A0 in the href, and the browser will turn it into %C2%A0 for you.
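To make the two representations concrete, here’s a small Python sketch using only the standard library: html.unescape shows what the &nbsp; entity decodes to, and urllib.parse.quote shows the UTF-8 percent-encoding a browser puts on the wire. The sample values are just illustrations.

```python
import html
from urllib.parse import quote

# The HTML parser turns the &nbsp; entity into the single character U+00A0.
nbsp = html.unescape("&nbsp;")

# In UTF-8, U+00A0 is two bytes...
print(nbsp.encode("utf-8"))  # b'\xc2\xa0'

# ...so its percent-encoded form is %C2%A0, versus %20 for a regular space.
print(quote(nbsp))  # %C2%A0
print(quote(" "))   # %20
```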
That’s the thing: a web browser will help you out and send the right stuff on the wire. But Marketo isn’t a browser; its HTML validation and assembly are far more raw and uncooperative. What happens is that the U+00A0 doesn’t get correctly UTF-8 encoded. It stays as the single byte A0 when it should be the two bytes C2 A0.
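We can’t see Marketo’s server code, so the following is only a guess at the mechanism: encoding text as Latin-1 (ISO-8859-1) instead of UTF-8 is one common way a lone A0 byte ends up in output that should contain C2 A0. The string below is a hypothetical stand-in for the real query value.

```python
# Hypothetical query-string text containing an NBSP between the two words.
value = "with\u00a0non"

correct = value.encode("utf-8")   # what a UTF-8 document should contain
broken = value.encode("latin-1")  # ASSUMPTION: one way a server emits a bare A0

print(correct.hex(" "))  # 77 69 74 68 c2 a0 6e 6f 6e
print(broken.hex(" "))   # 77 69 74 68 a0 6e 6f 6e
```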
And Marketo generates the SHA-256 hash (this happens entirely on the Marketo click tracking server) based on that character being A0.
The browser doesn’t like it when you send it a document that claims to be UTF-8 but contains an illegal A0 byte. You can see this error in the Firefox console (interestingly, Chrome doesn’t show a visible error, but it goes through exactly the same struggle).
Because that character is invalid, the browser substitutes the replacement character �.
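Python’s errors="replace" decoding is a reasonable approximation of what the browser does with that stray byte (it is not the browser’s actual decoder): a lone A0 can’t start a valid UTF-8 sequence, so it becomes U+FFFD. The byte string is a hypothetical example.

```python
# Hypothetical bytes as received by the browser: a bare A0 where C2 A0 belongs.
received = b"with\xa0non"

# Decode as UTF-8, substituting U+FFFD REPLACEMENT CHARACTER for the invalid byte.
text = received.decode("utf-8", errors="replace")

print(f"U+{ord(text[4]):04X}")  # U+FFFD
print(text == "with\ufffdnon")  # True
```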
What’s interesting is that the replacement character itself is then used to generate the hash on the browser side. (You may think that’s obvious, but it really isn’t!)
That is, Unicode U+FFFD REPLACEMENT CHARACTER (the character that looks like � and represents “any character that can’t be decoded successfully or which doesn’t have a matching glyph”) is correctly encoded as the UTF-8 sequence EF BF BD, and that’s what is used when calculating the hash.
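A quick Python check shows where EF BF BD comes from: re-encoding text that contains U+FFFD as UTF-8 yields those three bytes in place of the original character. The sample text is hypothetical.

```python
# Hypothetical decoded text on the browser side, with U+FFFD in place of the NBSP.
text = "with\ufffdnon"

# CSP hashes are computed over the UTF-8 encoding of the script text.
print("\ufffd".encode("utf-8").hex(" "))  # ef bf bd
print(text.encode("utf-8").hex(" "))      # 77 69 74 68 ef bf bd 6e 6f 6e
```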
Therefore, Marketo calculates a hash using A0 and comes up with eFAONgI5fLA5AoYWYGP/WNIOK3SMPT1Vg0DUOycIt3w=. It then tells the browser to be very strict and only allow a <script> to run if it matches that hash. Any other hash is to be taken as a sign of tampering.
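For reference, a CSP hash-source is the Base64 encoding of a SHA-256 digest over the UTF-8 bytes of the inline script’s text. Here’s a minimal Python sketch of that calculation; csp_sha256_source is a hypothetical helper name, and the script body is a placeholder, not Marketo’s actual redirect script.

```python
import base64
import hashlib


def csp_sha256_source(script_text: str) -> str:
    """Build a CSP hash-source ('sha256-<base64>') for an inline script's text."""
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


# Placeholder script body, not Marketo's real markup.
inline_script = 'location.href = "https://example.com/";'

# A policy such as  Content-Security-Policy: script-src 'sha256-...'
# allows only a script whose text hashes to exactly that value.
print(csp_sha256_source(inline_script))
```

Change a single byte of the script text and the digest changes completely, which is exactly why one mis-encoded character is enough to block the script.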
But the browser has substituted EF BF BD instead, and generated its own hash for comparison: pDICNtOiDrBUN6kBBV9g5XPgHWHGWjkklJuAMkwVoxE=.
Because the hashes don’t match, the browser sees a CSP violation. It doesn’t run the JS, so the tracking link never redirects to the final destination.
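Putting the pieces together, this Python sketch reproduces the mismatch, assuming (as a guess) that the server writes the NBSP as a single Latin-1 byte and hashes that, while the browser decodes the bytes as UTF-8 with replacement and hashes the result. The script body is hypothetical, so the hashes won’t equal the ones Marketo produced, but they disagree for the same reason.

```python
import base64
import hashlib


def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the form used in CSP hash-sources."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Hypothetical inline script body containing the NBSP.
script = 'location.href = "https://example.com/?text=with\u00a0non";'

# Server side (ASSUMPTION): NBSP emitted as the bare byte A0, then hashed.
server_bytes = script.encode("latin-1")
server_hash = sha256_b64(server_bytes)

# Browser side: A0 decodes to U+FFFD, and the UTF-8 of that text is hashed.
browser_text = server_bytes.decode("utf-8", errors="replace")
browser_hash = sha256_b64(browser_text.encode("utf-8"))

print(server_hash == browser_hash)  # False, so the CSP blocks the script
```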
Today’s lesson
Use URL-encoding for non-ASCII characters even if you know you technically don’t need it. If you use %C2%A0 in the URL instead of &nbsp;, there’s no problem.
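The fix in the same terms, as a Python sketch: percent-encode the value when building the link, so the URL is pure ASCII. ASCII bytes are identical whether treated as Latin-1 or UTF-8, so even under the assumed Latin-1 server output, the server’s hash and the browser’s hash agree. The base URL and the details parameter are hypothetical.

```python
import base64
import hashlib
from urllib.parse import quote


def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the form used in CSP hash-sources."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Hypothetical link; the real one was an Add to Google Calendar URL.
url = "https://example.com/event?details=" + quote("with\u00a0non")
print(url)            # https://example.com/event?details=with%C2%A0non
print(url.isascii())  # True

# ASSUMPTION: server writes Latin-1; browser decodes UTF-8 with replacement.
served = url.encode("latin-1")
server_hash = sha256_b64(served)
browser_hash = sha256_b64(served.decode("utf-8", errors="replace").encode("utf-8"))
print(server_hash == browser_hash)  # True
```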