When “technically valid” goes wrong: don’t put leading spaces in your Marketo hrefs, or you’ll lose click tracking

💡
tl;dr: Leading and trailing spaces are valid in an <a href> and the link will still work for the end user. But in a Marketo email, an accidental leading space means the link won’t be tracked.

Without reading the official standards, can you describe the differences between a valid URL and a valid href? How about between the href attribute of an HTML A tag and the href IDL attribute of its Location object?

My guess is there are only a few people in the world who can recite these off the top of their heads: members of WHATWG or W3C. I certainly don’t know all this stuff by heart, but reading standards is fun.

Anyway, all these are valid A tags that link to the same destination URL:

<a href=" https://www.example.com">I am valid.</a>
<a href="   https://www.example.com   ">So am I.</a>
<a href="https://www.example.com ">Me too.</a>
<a href="https://www.example.com">And so (obviously!) am I.</a>
A few equally valid <a> tags.

But only the last 2 will be tracked by Marketo, because they’re the only ones without a leading space.

Hold up: those spaces are valid?

Indeed.

The href HTML attribute is defined as “a valid URL potentially surrounded by spaces.” After stripping leading and trailing spaces, it must be a valid URL string, but the spaces themselves are fine.[1]

In other words, a URL can’t start or end with spaces. But even though an <a href> becomes a URL by design, the href itself can have spaces.
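Here’s a minimal JavaScript sketch of that rule. The helper name isValidHrefValue is made up for this post, and the built-in URL constructor is only a loose stand-in for the spec’s “valid URL string” check: it’s more forgiving than the spec’s validity rules and, used without a base, it only accepts absolute URLs. Treat it as an illustration, not a validator.

```javascript
// ASCII whitespace per the HTML standard: tab, LF, FF, CR, space.
const SURROUNDING_ASCII_WHITESPACE = /^[\t\n\f\r ]+|[\t\n\f\r ]+$/g;

// Hypothetical helper: approximates "valid URL potentially surrounded by spaces".
// Step 1 strips the surrounding whitespace explicitly, mirroring the attribute definition.
// Step 2 asks the URL constructor whether what's left parses as an absolute URL.
function isValidHrefValue(attributeValue) {
  const stripped = attributeValue.replace(SURROUNDING_ASCII_WHITESPACE, "");
  try {
    new URL(stripped);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidHrefValue("   https://www.example.com   ")); // true
console.log(isValidHrefValue("not a url"));                     // false
```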

An even deeper detail is that when an <a> is parsed into a Location object, the Location object’s href property won’t have spaces. This is easy to demonstrate in the browser...

> document.links[0].getAttribute("href")
⋖ ' https://www.example.com'
> document.links[0].href
⋖ 'https://www.example.com/'
F12 Console output showing the raw href attribute vs. parsed href property.

... but it was difficult to find in the spec(s). Finally, I found that a Location object is said to have a relevant Document, and any Document has a URL. That URL is derived using the Basic URL Parser, which explicitly includes this as its 3rd step:

3. Remove any leading and trailing C0 control or space from input.

So one thing with the name href can have spaces, while another type of href cannot. Confusing!
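You can see the same split outside the browser, too. This sketch assumes a runtime with the standard global URL class (Node.js or any current browser) and feeds it the four raw attribute values from the example links earlier. The variable names are just for illustration.

```javascript
// The raw href attribute values from the four example <a> tags.
const rawHrefs = [
  " https://www.example.com",
  "   https://www.example.com   ",
  "https://www.example.com ",
  "https://www.example.com",
];

// The URL constructor runs the URL parser, which removes leading and
// trailing C0 control or space characters before it parses anything else.
const parsedHrefs = rawHrefs.map((raw) => new URL(raw).href);

// Four different attribute strings, one serialized URL.
console.log(parsedHrefs.every((href) => href === "https://www.example.com/")); // true
```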

Back to the Marketo problem

So why are links with leading spaces left untracked?

Because Marketo checks only the HTML href attribute to see if something is a tracking-worthy link. If the href doesn’t start with a sequence of letters followed by a colon (that includes not just http: and https: but also tel: and such), it’s thought to be some other kind of <a>, like a jump link within the email body, which shouldn’t be tracked.

In other words, it checks the HTML href only, not the URL. And as you learned above, those are not exactly the same.
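To be clear, I haven’t seen Marketo’s code, so what follows is only a guess at the shape of that check, based on the behavior described above. The regex and the function name looksTrackable are my own assumptions. The point is just to show how testing the raw attribute string differs from testing the trimmed one.

```javascript
// Assumption: "a sequence of letters followed by a colon" at the very start
// of the raw attribute. This approximates the observed behavior only; it is
// not Marketo's actual implementation.
const SCHEME_PREFIX = /^[a-z]+:/i;

// Hypothetical check applied to the raw href attribute string.
function looksTrackable(rawHref) {
  return SCHEME_PREFIX.test(rawHref);
}

const rawHrefs = [
  " https://www.example.com",
  "   https://www.example.com   ",
  "https://www.example.com ",
  "https://www.example.com",
];

// Checking the raw attribute: a leading space hides the scheme.
console.log(rawHrefs.map(looksTrackable)); // [ false, false, true, true ]

// Trimming first makes all four links pass the same check.
console.log(rawHrefs.map((href) => looksTrackable(href.trim()))); // [ true, true, true, true ]

// A jump link has no scheme, so it is (rightly) skipped either way.
console.log(looksTrackable("#section-2")); // false
```

If you generate email HTML programmatically, trimming every href before it gets to Marketo is the simplest way to sidestep the problem.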

Is this a bug in Marketo? I’d say yes. But is it worth fretting about rather than just keeping in mind? To me, that’s a no.

Notes

[1] I’m not sure exactly why surrounding spaces are allowed — even I haven’t been around that long! Maybe it’s in the W3C mailing list archives from 20 years ago, but I’ve got stuff to do.☺