More to the point, all URLs, in all contexts, should be treated as case-sensitive. An easy rule to remember, right?
But you might’ve been misled by webservers that choose to do a case-insensitive URL match. Not all servers do so — and a server isn’t broken if it doesn’t do so.
In Marketo, a Design Studio URL like this works:
https://pages.example.com/rs/123-XOR-654/images/theme_NTP_background_fleur-de-lis-2.png
But if I accidentally lowercase my Munchkin ID, it fails (and redirects to my fallback page):
https://pages.example.com/rs/123-xor-654/images/theme_NTP_background_fleur-de-lis-2.png
It also fails if I swap Images for images:
https://pages.example.com/rs/123-XOR-654/Images/theme_NTP_background_fleur-de-lis-2.png
Interestingly, the part of the URL that’s the Design Studio filename is not matched case-sensitively. So this does work:
https://pages.example.com/rs/123-XOR-654/images/tHeMe_NTP_bAcKgRoUnD_fleur-de-LIS-2.PNG
This serves to prove the larger point, which is that you can’t expect a URL as a whole to be matched case-insensitively. A link is as case-sensitive as its most case-sensitive part, and as a marketer you can’t control the sensitivity.
(FYI, spec-wise, the filename is just another path segment. Even if it looks like a filename.extension on your hard drive, there’s no such thing as the “file” part of a URL path.)
All goes according to spec
The URI standard (RFC 3986) is clear that raw ASCII characters (those not part of a %-encoding sequence) in the path part, in the query part, and in the hash/fragment part of a URL are case-sensitive.
Note that the scheme/protocol (https) and hostname (pages.example.com) parts are case-insensitive. It is, of course, impossible for a hostname to be treated case-sensitively on the web anyway (even leaving aside the normalization part) because DNS is itself case-insensitive. I looked at the ancient HOSTS RFC, and even before/without DNS, a hostname is case-insensitive.
So there’s ample precedent for hostnames to be case-insensitive, and a clear standard for the rest of a URL to be case-sensitive. Thus if there’s anything after the hostname, even just a UTM param, the URL is case-sensitive.
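To make the split concrete, here is a minimal Python sketch of a comparison that follows the rule above: lowercase the scheme and hostname, and leave the path, query, and hash alone. The function names (lowercase_host, normalize_case, same_url) are my own inventions for illustration, and the sketch deliberately ignores other normalization steps such as %-encoding case or default ports.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host(netloc: str) -> str:
    """Lowercase the host (and port) but keep any user:password@ prefix as-is."""
    userinfo, at, hostport = netloc.rpartition("@")
    return userinfo + at + hostport.lower()


def normalize_case(url: str) -> str:
    """Lowercase only the case-insensitive parts: scheme and hostname."""
    parts = urlsplit(url)
    return urlunsplit(
        parts._replace(
            scheme=parts.scheme.lower(),
            netloc=lowercase_host(parts.netloc),
        )
    )


def same_url(a: str, b: str) -> bool:
    """Compare two URLs, treating path, query, and hash as case-sensitive."""
    return normalize_case(a) == normalize_case(b)
```

With this, https://PAGES.example.com/rs/123-XOR-654/ and https://pages.example.com/rs/123-XOR-654/ compare as equal, while swapping in 123-xor-654, or changing the case of a UTM value, makes them different URLs.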
So if it’s supposed to be case-sensitive, are the case-insensitive servers broken?
Nope. URLs themselves are different if they differ in case (save the protocol + host). That’s always true.
But when used to GET or POST a web resource, a webserver is allowed to return the same resource for multiple URL variations, after doing a case-insensitive lookup under the hood.
Ideally (though this is almost never done) all non-canonical spellings would be redirected to the canonical spelling to make it clear that the webserver has been forgiving.
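Here is a rough Python sketch of what that forgiving-but-honest behavior could look like. It is not how any particular webserver works: the path list and the resolve function are hypothetical, and a real server would hook this into its own routing.

```python
from typing import Optional, Tuple

# Hypothetical list of the canonical spellings the server knows about.
CANONICAL_PATHS = ["/downloads/Pricing-Guide.pdf", "/about/Team"]

# Case-insensitive index: lowercased spelling -> canonical spelling.
_BY_LOWERCASE = {path.lower(): path for path in CANONICAL_PATHS}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, path): 200 for an exact match, 301 plus the
    canonical spelling for a case-only mismatch, 404 otherwise."""
    if path in CANONICAL_PATHS:
        return 200, path
    canonical = _BY_LOWERCASE.get(path.lower())
    if canonical is not None:
        return 301, canonical  # tell the client which spelling is the real one
    return 404, None
```

A request for /downloads/pricing-guide.pdf would get a 301 to /downloads/Pricing-Guide.pdf rather than a silent 200, so the forgiveness is visible to whoever built the link.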
Another thing to realize is that the webserver never sees the hash. So if a server returns the same document for the 3 URLs
https://www.example.com
https://www.example.com#myHash
https://www.example.com#MYHASH
that’s not actually a matter of choice: the server only saw https://www.example.com in all 3 cases. Client-side JS can see the hash, though, and modify content accordingly (and you should expect it to use case-sensitive logic).
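You can approximate what the client does before sending the request with Python's standard urldefrag, which splits off the hash. The helper name what_server_sees is mine, and this is only a sketch of the idea, not a full model of an HTTP client.

```python
from urllib.parse import urldefrag

URLS = [
    "https://www.example.com",
    "https://www.example.com#myHash",
    "https://www.example.com#MYHASH",
]


def what_server_sees(url: str) -> str:
    """Drop the #fragment, which stays on the client and is never sent."""
    return urldefrag(url).url


# All three URLs collapse to one request target on the server side.
seen_by_server = {what_server_sees(u) for u in URLS}

# The fragments themselves are still distinct, case included.
fragments = [urldefrag(u).fragment for u in URLS]
```

Here seen_by_server ends up with a single entry, while fragments still holds three different values, which is what any client-side logic would be working with.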
Also beware URLs-in-URLs
Don’t get overconfident with bare hostnames.
Yes, https://some.eXaMpLe.com is implicitly case-insensitive (because it doesn’t have a /path, ?query, or #hash). But case-insensitivity only applies if the scheme and hostname are in their original positions.
That URL might end up in a query string. For example, when you pass referrers or next-hops around:
https://another.example.net?utm_source=https%3A%2F%2Fsome.eXaMpLe.com
Now, the original URL has been (as is proper) URL-encoded and placed in the query string so it can be read on another page. That means it’s no longer in the case-insensitive part of the URL. When your attribution code parses the query string, it’s going to do a case-sensitive match unless specifically told otherwise. So it will not see some.eXaMpLe.com as matching some.example.com.
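A small Python sketch of the trap, and of one way to tell the code otherwise. The function names are hypothetical, and I'm assuming the embedded URL lives in utm_source, as in the example above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

LANDING_URL = "https://another.example.net?utm_source=https%3A%2F%2Fsome.eXaMpLe.com"


def raw_source(landing_url: str) -> Optional[str]:
    """Return the decoded utm_source value exactly as it was passed."""
    values = parse_qs(urlsplit(landing_url).query).get("utm_source")
    return values[0] if values else None


def source_host(landing_url: str) -> Optional[str]:
    """Parse the embedded URL again and return its hostname.
    urlsplit(...).hostname is always lowercased, so the host can be
    compared case-insensitively once it is back in host position."""
    source = raw_source(landing_url)
    return urlsplit(source).hostname if source else None


# Naive string comparison misses the match...
naive_match = raw_source(LANDING_URL) == "https://some.example.com"

# ...while comparing the re-parsed hostname finds it.
host_match = source_host(LANDING_URL) == "some.example.com"
```

The design choice is to re-parse the inner value as a URL rather than lowercasing the whole string, since anything after the hostname in the embedded URL would still be case-sensitive.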
Emails too, in theory... but not in practice
As I reviewed in an earlier post, email addresses are technically case-sensitive as well.
But unlike web URLs, which are very much case-sensitive in the real world, email addresses aren’t case-sensitive in practice; the negligible number of SMTP servers that still treat them as such isn’t worth worrying about.
(And the host part of <mailbox@host> could never be case-sensitive, for the same DNS/HOSTS reason noted above.)
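The theory/practice gap can be sketched as two comparison keys in Python. Both function names are made up for illustration, and the sketch assumes a plain mailbox@host string with no display name or surrounding angle brackets.

```python
def strict_key(address: str) -> str:
    """By-the-book key: only the host is lowercased; the mailbox keeps its case.
    Assumes the address contains an "@"."""
    mailbox, _, host = address.rpartition("@")
    return f"{mailbox}@{host.lower()}"


def practical_key(address: str) -> str:
    """Real-world key: treat the whole address as case-insensitive."""
    return address.strip().lower()
```

Under strict_key, Sandy@Example.COM and sandy@example.com stay distinct (only the host is folded), while practical_key treats them as the same person, which is what you almost always want when deduplicating.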