Munchkin is broken (by default) for 2-letter TLDs

A question from top Marketo user GM in this post led to an interesting, even disturbing, hour of troubleshooting. Finally, I've gotten around to writing it up.

GM was skeptical:

I was told a while ago by support that when using the munchkin tracking on a domain which top level domain had only 2 letters, we had to set the domainlevel Munchkin.init parameter to 2. Reading the doc, it seems that this parameter has a totally different effect...

This was a frustratingly wrong answer from Support, but not entirely surprising because I doubt anybody inside or outside Marketo has realized how bad the situation can be if your domain sits directly under a 2-letter TLD like .io or .ly.

These are ccTLDs (country code TLDs), but their operators freely allow second-level domains (2LDs) like github.io or bit.ly to be registered by foreign entities with cash in hand who are looking for a “kewl” (do we still say that?) domain.

But when Munchkin sees a 2-letter TLD, it makes a bad guess about the proper domainLevel, that is, the domain level at which it sets the vital web activity tracking cookie. Using TLDs like .au as a model, Munchkin incorrectly assumes that you cannot have registered a domain directly under the TLD and automatically sets the domainLevel to 3. This has the same effect as if you hard-coded:

Munchkin.init( 'AAA-111-CCC', { domainLevel: 3 });

Such guesswork happens to be fine for .au, since the Australia NIC only lets commercial entities register 3LDs like example.com.au, not 2LDs like example.au. But it's totally wrong for .io, since the Indian Ocean NIC lets you register right under the TLD, like my own agical.io and the name-brand examples above.

In fact, the guess is wrong for so many TLDs that it's crazy to make a guess at all. I can register whiteman.fr or whiteman.mx, but would have to go one level deeper for .uk or .nz and get whiteman.co.uk and whiteman.co.nz. The fact is there is no way to know, based only on the length of the TLD, whether the current registered domain is the 2LD or the 3LD.
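To make the bad guess concrete, here is a minimal sketch of the heuristic as described above. It is an illustration only, not Munchkin's actual source, and the function name guessDomainLevel is hypothetical.

```javascript
// Illustration of the length-based heuristic described above.
// NOT Munchkin's real code; guessDomainLevel is a hypothetical name.
function guessDomainLevel(hostname) {
  const labels = hostname.split('.');
  const tld = labels[labels.length - 1];
  // 2-letter TLD: assume the registered domain is a 3LD; otherwise a 2LD.
  return tld.length === 2 ? 3 : 2;
}

guessDomainLevel('www.example.com');      // 2, correct
guessDomainLevel('www.example.com.au');   // 3, correct (registered domain is example.com.au)
guessDomainLevel('www.whiteman.co.uk');   // 3, correct (registered domain is whiteman.co.uk)
guessDomainLevel('pages.exemplifier.fr'); // 3, wrong (registered domain is exemplifier.fr, level 2)
```

The function gives the same answer for .fr and .uk, even though the registered domain sits at different levels under each.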

Maybe it hasn't hit you yet and you're wondering what the big deal is. Well, the consequences of this bad guess are huge when you try (as we all do) to track visitors across our main website, like www.exemplifier.fr, our Marketo-hosted LP subdomain pages.exemplifier.fr, and any other related properties that end with .exemplifier.fr.

Note this isn't what is typically referred to as “cross-domain tracking” (that's always a major technical burden to set up). This is standard same-domain tracking, but for many 2-letter TLDs, Munchkin can't do it right.

Here come the consequences

What happens instead? Well, in the exemplifier.fr examples above:

  • When someone visits pages.exemplifier.fr, Munchkin sets a cookie with the domain .pages.exemplifier.fr rather than at .exemplifier.fr
  • When that same person visits www.exemplifier.fr (maybe they link to the main website right from the LP) Munchkin looks to see if there's an existing cookie. It doesn't see the one from .pages.exemplifier.fr because, by the definition of cookie security, you can't read those cookies when you're on www.exemplifier.fr. So it sets a new, randomly generated, anonymous cookie at the 3rd-level domain .www.exemplifier.fr.
  • If the person visits the LP domain again, the browser only sees the cookie that was originally set there, not the one from www.
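The split described in the list above can be sketched by computing the cookie domain for each hostname at a given level. The helper cookieDomainAtLevel is a hypothetical name used for illustration. It assumes the cookie domain is simply the last N labels of the hostname with a leading dot, as in the examples above.

```javascript
// Hypothetical helper: the cookie domain is the last `level` labels of the hostname.
function cookieDomainAtLevel(hostname, level) {
  return '.' + hostname.split('.').slice(-level).join('.');
}

// With the automatic guess (level 3), each subdomain gets its own isolated cookie:
cookieDomainAtLevel('pages.exemplifier.fr', 3); // '.pages.exemplifier.fr'
cookieDomainAtLevel('www.exemplifier.fr', 3);   // '.www.exemplifier.fr'

// With level 2, both hostnames share one cookie domain:
cookieDomainAtLevel('pages.exemplifier.fr', 2); // '.exemplifier.fr'
cookieDomainAtLevel('www.exemplifier.fr', 2);   // '.exemplifier.fr'
```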

So?

So even if the Munchkin cookie is associated with a known lead on domain A, web activities on domain B will still be anonymous.

If a lead clicks a link in an email that goes to your Marketo LP domain, their Munchkin session on that domain — but that domain only — is automatically associated (thanks to the mkt_tok query param) with their record in the lead db. You will see Clicked Link and Visited Web Page activities on the LP domain, but you won't see such activities on your corporate website, since they're still anonymous there.

And vice versa. If a lead clicks a link in an email that goes to your main website, clicks and pageviews there will be associated with their lead record, but their Munchkin sessions on any other sites that end with .exemplifier.fr will not.

This is kind of a silent killer for tracking. I imagine that > 50% of Marketo users with 2-letter TLDs are affected, and they probably have no idea why. (If they notice the problem at all, they may think Munchkin never works across a whole registered domain, when in fact it totally does if configured correctly — or if the TLD has more than 2 letters!)
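For the exemplifier.fr case, "configured correctly" would look something like the sketch below. It assumes your registered domain sits directly under the 2-letter TLD, so the cookie belongs at level 2. The variable name munchkinOptions is just for illustration, and the Munchkin ID is the placeholder used earlier.

```javascript
// Assumption: the registered domain is exemplifier.fr (directly under the TLD),
// so the cookie belongs at the 2nd level, not the 3rd that Munchkin would guess.
var munchkinOptions = { domainLevel: 2 };

// Guarded so the snippet is a no-op if munchkin.js has not loaded yet.
if (typeof Munchkin !== 'undefined') {
  Munchkin.init('AAA-111-CCC', munchkinOptions);
}
```

The same override needs to be present everywhere Munchkin is initialized for that domain, or the hostnames will still end up with separate cookies.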

Another related quirk

Just today another Community user highlighted an overlapping Munchkin cookie problem, though they weren't aware of the connection.

This user had loaded Munchkin on a development website (I'm assuming based on the context that it wasn't in prod) and they were accessing the site by its IP address (hey, it happens!): http://10.20.30.40.

So let's see what Munchkin does here:

  • It parses the current TLD as .40 (misnomer to call the last octet of an IP address a “TLD” but let's hold our breath on that).
  • Sees a 2-letter TLD.
  • Sets the automatic domainLevel value to 3, like it always does in these cases.
  • Tries to set a cookie with the 3rd level domain: .20.30.40.
  • No cookie is set. Browsers will never let you set a cookie that would apply to any IP address that ends with those 3 numbers.
  • Web activities will not be tracked.

What if you changed the IP address of the server to 10.20.30.100? Not much better:

  • Sees the current TLD as .100.
  • That's a 3-letter TLD, so it sets the automatic domainLevel to 2, like it would for example.com.
  • Tries to set a cookie with domain .30.100.
  • No cookie is set for the same reason as above: you can't do partial domain matches on IPs.
  • Web activities will not be tracked.

With an IP-as-hostname, you want the cookie domain to be the 4th level domain: .10.20.30.40. So neither domainLevel=2 nor domainLevel=3 is correct.
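One way to handle the IP case, sketched under the assumption stated above that the full 4-part address is the right cookie domain: detect an IPv4 hostname and override the level. Both function names are hypothetical.

```javascript
// Hypothetical helper: true when the hostname is a dotted-quad IPv4 address.
function isIPv4Host(hostname) {
  const parts = hostname.split('.');
  return parts.length === 4 &&
    parts.every(function (p) { return /^\d{1,3}$/.test(p) && Number(p) <= 255; });
}

// Use level 4 (the whole address) for IPs; otherwise keep the level you determined
// for your registered domain.
function domainLevelForHost(hostname, registeredLevel) {
  return isIPv4Host(hostname) ? 4 : registeredLevel;
}

domainLevelForHost('10.20.30.40', 2);        // 4
domainLevelForHost('10.20.30.100', 2);       // 4
domainLevelForHost('www.exemplifier.fr', 2); // 2
```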

How do you determine which domainLevel is correct?

You either know which domain you bought based on your receipt from Dotster or suchlike, or you have to set test cookies.

As noted above, neither the number of letters in a domain nor any other non-cookie-related mechanism can tell you which domain level is the private (registered, purchased) domain at which you should set a tracking cookie, as opposed to the disallowed public domain(s) above it or the child subdomains below it.

This may be surprising to some, but neither the old-school ccTLDs nor the vast number of new gTLDs like .space and .lawyer follow any standard in this regard. The operators of .space could require that people only buy domains under .office.space. Therefore, an end user could not set a cookie at .office.space itself, only at .my.office.space or another 3LD.

Browsers do have an internal list of where they'll let you set cookies (though older, unmaintained browsers don't know about new gTLDs, which is a whole other wrinkle). But you can't query that list directly. You have to try to set cookies at every possible level, “walking the domain tree,” then check for the highest level at which a cookie exists (levels disallowed by the browser will not generate an error, but they will not set the cookie).
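The idea of walking the domain tree can be sketched as follows. This is not the author's findPrivateSuffix.js, only a rough illustration of the technique. The function name walkDomainTree and the test cookie name are hypothetical, and the doc parameter is assumed to be anything with a document.cookie-style property (in a browser, pass document). It tries the shortest candidate domain first and stops at the first one the browser accepts, which is equivalent to finding the highest level at which a cookie exists.

```javascript
// Sketch only: NOT the author's findPrivateSuffix.js.
// `doc` is anything exposing a document.cookie-style property; in a browser, pass `document`.
function walkDomainTree(hostname, doc) {
  const labels = hostname.split('.');
  const testName = '_suffix_test'; // hypothetical test cookie name

  // Start just below the TLD (level 2) and go one label deeper each time.
  for (let level = 2; level <= labels.length; level++) {
    const cookieDomain = '.' + labels.slice(-level).join('.');
    doc.cookie = testName + '=1; path=/; domain=' + cookieDomain;

    // A disallowed level raises no error; the cookie simply is not there afterwards.
    const accepted = doc.cookie.split('; ').indexOf(testName + '=1') !== -1;
    if (accepted) {
      // Expire the test cookie so it does not linger.
      doc.cookie = testName + '=; path=/; domain=' + cookieDomain +
        '; expires=Thu, 01 Jan 1970 00:00:00 GMT';
      return { domainLevel: level, cookieDomain: cookieDomain };
    }
  }
  return null; // no level accepted a cookie (for example, on an IP address)
}
```

In a browser you would call walkDomainTree(location.hostname, document). The result depends entirely on the browser's own list, which is the point.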

The script you're waiting for

You didn't think I'd leave you to write that domain-walking script yourself, I hope! Nope, I wrote it several months ago for a project, and it's proven invaluable.

Download findPrivateSuffix.js from here.

Note the script is totally free-to-use, but it is copyrighted (MIT License). I request that you do not use this script without attribution (include the ©). It fills a very particular niche that came out of rather deep research, and I don't know of any other code like it. It would be very frustrating for people to copy-and-paste it as their own. So please be nice. :)

Running findPrivateSuffix()

The script offers one function, findPrivateSuffix(options).

Available options (currently, there's only one):

cache: true or false, defaults to false. Whether to cache the tree-walking results for the duration of a browser session. This option is off by default so you can test easily, but it's strongly recommended that you turn it on in production. It's impossible for the registered domain to change during a browser session, so it is always safe to cache, and far more efficient.

The function returns an object with three properties:

domainLevel — the numeric domain level to pass to Munchkin.init()

cookieDomain — the string cookie domain, so you can use it in other code

source — source of the returned data: 1 is cached result, 2 is for direct testing

Example usage and output:
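A minimal sketch of feeding the result into Munchkin.init(), assuming findPrivateSuffix.js and Munchkin are both loaded on the page. The helper name munchkinOptionsFrom is hypothetical, and the typeof guards are only there so the snippet does nothing when either script is missing.

```javascript
// Hypothetical helper: build Munchkin options from the object findPrivateSuffix() returns.
function munchkinOptionsFrom(suffix) {
  return { domainLevel: suffix.domainLevel };
}

// Assumes findPrivateSuffix.js and munchkin.js have already loaded.
if (typeof findPrivateSuffix === 'function' && typeof Munchkin !== 'undefined') {
  var suffix = findPrivateSuffix({ cache: true }); // cache on, as recommended for production
  Munchkin.init('AAA-111-CCC', munchkinOptionsFrom(suffix));
}
```

The cookieDomain property is not needed by Munchkin itself, but it is there if other code on the page sets cookies that should share the same scope.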
