Tokens as hacker tools — and how you can reduce the risk

When you think like a hacker, as I learned to in a long-ago role as security admin, martech can seem almost quaintly vulnerable.

Sometimes, DoS/DDoS and data theft openings are knowingly created — and, to be frank, covered up — in the name of demand gen and quick delivery. (I'll cover some examples in other posts.)

But in the common case, users truly don't know there's anything wrong: you're not in IT, nobody told you there was a risk, and the app didn't pop up any warnings. Nevertheless, significant vulnerabilities exist. These vulns are old hat to the security community, but marketing doesn't get those memos.

As Marketo user PS was informed by an anonymous tip, a common campaign setup might be irresistible to hackers who want to use your systems to spread spam or malware, embarrass your company, or both.

Such is the case when you do one very simple task in Marketo: include a {{lead.token}} in an outbound email without sanitizing the value. (Same applies to tokens/variables in any MA platform, by the way: this is not a Marketo-specific problem, though I will give a Marketo-only solution.)

If you aren't sure whether you've done this, I can almost guarantee you have. Starting an email with “Hello, {{lead.First Name}}” is all it takes.

How could that feature possibly be useful to a hacker? Let's take a step back (well, back for the first time for most of you guys!) into...

Web Security 101

A clichéd but useful guideline for building secure (as well as more maintainable) systems is “Data is data, code is code.” You may also have heard the refrains “Filter input, escape output” and “Never output untrusted data.”

For our purposes today:

  • untrusted data means a value, like one supplied by an end user on a form, that has not gone through full validation (making sure it conforms to a data type and/or pattern) and sanitization (removing inappropriate, error-prone or insecure parts).
  • escaping means altering a value, not only so that it is readable in a particular output context (like inside an <A href> attribute, in the text part of an email, or inside a database query) but also so that anything that might turn the value into scriptable, runnable, or otherwise active content is disabled or removed. A simple value coming in must remain pure, dumb data, not code, going out.[1]
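Here is a minimal Python sketch of the two input-side steps, to make the distinction concrete. The 40-character limit and the "no angle brackets or control characters" rule are arbitrary assumptions for illustration, not a recommended name policy. A value can pass validation and still be unsafe to echo back. Escaping is a separate step that happens on output.

```python
import re

# Hypothetical policy, assumed for illustration only: a name is 1-40
# characters long and contains no angle brackets or ASCII control characters.
NAME_PATTERN = re.compile(r"[^<>\x00-\x1f]{1,40}")
UNSAFE_CHARS = re.compile(r"[<>\x00-\x1f]")


def is_valid_name(value: str) -> bool:
    """Validation: accept or reject the value as a whole."""
    return NAME_PATTERN.fullmatch(value) is not None


def sanitize_name(value: str) -> str:
    """Sanitization: remove the parts the policy forbids and keep the rest."""
    return UNSAFE_CHARS.sub("", value).strip()[:40]


print(is_valid_name('Josephina ("Jo")'))  # True
print(is_valid_name("<b>Jo</b>"))         # False
print(sanitize_name("<b>Jo</b>"))         # bJo/b
```

Notice that sanitization is lossy. The result is neither the original value nor anything meaningful, which is one reason to be cautious about changing stored data.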

In sum, you must make sure that a user-supplied value is never anything but a simple value, even if you are only echoing that value back to the original user — or ***spoiler alert*** the person the user claimed to be.

Let's get back to PS's Community post. Like all of us, he has a form with a textbox for First Name. That's a system field in Marketo, of course, of type String. When you add it to a form, you typically don't even apply a pattern mask to it, since people can have all kinds of characters in their given names, including hyphens, periods, spaces, and multiple words.

Let me be clear: there's nothing wrong with just letting the lead have at it. In fact, even if you applied some kind of restrictive pattern like [a-zA-Z0-9 .-] to the box on the form, it wouldn't help with this problem, and you'd probably just end up ticking someone off, like someone who wants to type Josephina ("Jo") in the box.

(The reason it will not help with this problem is that all browser-side validation/pattern matching can be trivially bypassed. And assume our hypothetical hacker has at least trivial skillz.)
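There's a second reason even a corrected restrictive pattern doesn't help. Run it server-side (since the browser can't be trusted) against a real name and against a domain-like string. This Python sketch uses the pattern [a-zA-Z0-9 .-], with the hyphen placed last so it is a literal character rather than a range. The sample values are made up.

```python
import re

# The restrictive pattern, written so it compiles as intended:
# ASCII letters, digits, space, period, and a literal hyphen (placed last).
RESTRICTIVE = re.compile(r"[a-zA-Z0-9 .-]+")


def passes(value: str) -> bool:
    """True if every character of the value is in the allowed set."""
    return RESTRICTIVE.fullmatch(value) is not None


print(passes('Josephina ("Jo")'))       # False: parentheses and quotes rejected
print(passes("www.malicious.example"))  # True: letters and dots are all allowed
```

The pattern blocks a legitimate nickname while waving through exactly the kind of value this post is worried about.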

Forward from a foe

So you let them put any First Name they want, and you send an email that has {{lead.First Name}} in the body (in PS's case he's referring to an auto-responder triggered by the form post, but it could just as well be part of a batch sent later).

But you're sending to the Email Address entered on the form. The person who owns that mailbox need not have any relationship — certainly not a good one — with the person who filled out the form. Assume they are enemies.

So what, you're thinking: a person who wasn't expecting it gets a welcome email. What's the big deal? They would probably unsubscribe, which is bad because unsubscribes are durable: you can't implicitly opt them back in if they later discover you through another channel. They might file a spam report, which can hurt your sender reputation, but since your email is otherwise clean and professional, you can probably avoid wider consequences.

Is the email guaranteed to be clean, though? What if they entered this as their First Name:

<script>stealYourCredentials;</script>

Does that mean there's going to be runnable JavaScript in the email? Nope: if the field is set to HTML-encode tokens, you're just fine. This is what the user sees, as plain text:

<script>stealYourCredentials;</script>

And this is what the underlying HTML looks like:

<p>&lt;script&gt;stealYourCredentials;&lt;/script&gt;</p>

Whew, safe and sound. The dangerous < and > characters are escaped to harmless entities, so the value displays exactly as entered but cannot run.
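For the curious, Python's standard html.escape performs the same kind of entity encoding. This sketch is purely illustrative and makes no claim about how Marketo implements its own encoding. For this input, though, it reproduces the markup shown above:

```python
import html

untrusted_first_name = "<script>stealYourCredentials;</script>"

# html.escape converts & < > to entities, and with quote=True also " and ',
# so the browser renders the value as text instead of parsing it as markup.
safe_first_name = html.escape(untrusted_first_name, quote=True)
paragraph = f"<p>{safe_first_name}</p>"

print(paragraph)  # <p>&lt;script&gt;stealYourCredentials;&lt;/script&gt;</p>
```

Note what HTML escaping does not touch. A bare domain such as www.malicious.example contains no special characters, so it comes out unchanged.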

But what if they entered a URL instead, say a link to a site they control?

Now, you've got a problem. Imagine they did the same thing for Last Name and Job Title. Your introductory email is now a fully operational hacker tool, with the attacker's URLs merged in wherever your tokens appear.

And those are live links that will take the lead to a site that will steal their identity or worse. It could be any domain. The point is, you just sent spam, even signing the spam with your DKIM key to show your approval of the content. The best-case scenario is that a wary recipient reports you as a spammer, instead of clicking a link and suing you later!

Or imagine someone tricking you into advertising a competitor's site (unlikely to be done by that competitor, given the whole illegality thing, but easily done by a hacker “for the lulz”). Hackers may be more pranksters (or, in the optimistic view, educators) than criminals, but in the course of teaching you a lesson they can cause you a lot of pain.

Let's go over what's happened here. You sent untrusted values to someone you thought was the form-filler-outer without [a] filtering the values on the way in for possibly malicious content or [b] altering (escaping) the values on the way out so they cannot become live links.

Those live links represent the “active content” that you must avoid creating. Yes, the links will not click themselves (which is, I guess, good) because emails can't run script, but they are clickable (which is very bad). From a security standpoint, malicious links are as bad as it can get in an email, short of sending a virus as an attachment.

What's the desired outcome?

So what do you want to happen if someone includes a URL in a text field? Obviously it would be great if you could discard any form post that includes URLs where they aren't expected. This would conform to the “filter input” rule.

There's something else, though. In overly generous (grr...) email clients like Gmail, even a “domain-like string” like www.example.com becomes clickable — without http:// or https://! They really, really want to make it easy for you to send links. That's a problem for sanitization, because it means you can't just scan for https?:// and remove it. You have to scan for any.dotted.string.sequence, which means you could easily have false positives.
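Here's a rough Python illustration of the difference. Both patterns are assumptions written for this example, not what any mail client actually uses to decide what to linkify. The scheme check misses a bare www. domain. The broader dotted-string check catches it but also flags a harmless, if sloppily typed, job title.

```python
import re

# Naive check: only explicit http:// or https:// links.
SCHEME_URL = re.compile(r"https?://", re.IGNORECASE)

# Broader check, an assumption for illustration: a label followed by one or
# more ".label" parts, the last being 2+ letters ("any.dotted.string").
DOTTED = re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b", re.IGNORECASE)


def classify(value: str) -> tuple[bool, bool]:
    """Return (matches scheme check, matches dotted-string check)."""
    return bool(SCHEME_URL.search(value)), bool(DOTTED.search(value))


for sample in ["https://malicious.example", "www.malicious.example",
               "Sr.Manager", "Jo Smith"]:
    print(sample, classify(sample))
# https://malicious.example (True, True)
# www.malicious.example (False, True)    missed by the naive check
# Sr.Manager (False, True)               false positive
# Jo Smith (False, False)
```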

But anyway, the bigger problem isn't the false positives; it's that Marketo has no built-in way to filter data with the regular expression capability you'd need to get this done, let alone a way to permanently save sanitized field data.

And what about URL fields?

If you have a field for Website, you definitely can't ban them from supplying a URL! And filling in Company with a URL-like value doesn't make them malicious if they work for a website.

So this isn't as simple as banning all URLs. Now granted, you can make the rule that URLs are only allowed in a couple of fields, and that those fields will never be echoed back to a lead in an email. But enforcing that rule would be next to impossible.

Either way, there isn't a built-in way to filter input at the level you'd need.

Are you saying this is another job for FlowBoost?

It could be, but that's not where I'm going today. While FB can search and replace anything you want, permanently changing (sanitizing) input may not be the best direction.

For example, as a brute-force solution (and admittedly, it is a solution) you could use FB to replace all periods with underscores. That would mean malicious.example.com would become malicious_example_com and effectively neutered. It would also mean that legitimate.example.com becomes legitimate_example_com, though. And that isn't so good. You want those legit URLs to be useful in other contexts, like in SFDC, for research, etc.
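The real cost is irreversibility. This Python sketch uses plain string replacement as a stand-in for whatever a FlowBoost script would do; it is not FlowBoost code. It shows that a value which legitimately contained an underscore can't be restored once the periods have been swapped out:

```python
def neuter_dots(value: str) -> str:
    """Brute-force 'sanitizer': permanently replace every period with an underscore."""
    return value.replace(".", "_")


def naive_restore(value: str) -> str:
    """A later attempt to get the original back: underscores become periods."""
    return value.replace("_", ".")


print(neuter_dots("malicious.example.com"))   # malicious_example_com
print(neuter_dots("legitimate.example.com"))  # legitimate_example_com

original = "my_site.example.com"
stored = neuter_dots(original)                # my_site_example_com
print(naive_restore(stored))                  # my.site.example.com, not the original
```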

Escaping output using Velocity

The least intrusive way to deal with touchy, untrusted data is to escape output. This means you aren't changing the stored data, which you want to use as-is in other situations. You're outputting the data differently on a case-by-case basis and may use varying methods over time.

(The clearest reason for temporarily escaping, instead of permanently changing data, is that the same raw data requires different output in different contexts. A classic example is HTML context vs. URL context. The space character appears as %20 or + in a URL, while in HTML text it may need to be &nbsp; or can stay a plain space. You can't permanently change it to one of those, since there's more than one “right” way.)
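There really is no single "safe" stored form. Here is one made-up raw value encoded for three contexts using Python's standard library:

```python
import html
from urllib.parse import quote, quote_plus

raw = "Jo & Co"  # one stored value, never modified

# Query-string (form-encoded) context: space becomes "+", "&" becomes "%26".
print(quote_plus(raw))   # Jo+%26+Co
# URL path context: space becomes "%20".
print(quote(raw))        # Jo%20%26%20Co
# HTML text context: "&" must become "&amp;"; the space can stay a plain space.
print(html.escape(raw))  # Jo &amp; Co
```

Each output is correct only in its own context. Storing any one of them would break the other two.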

Here's the Velocity-based solution to the panic that might've just set in:

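## U+2060 WORD JOINER: zero-width, never displayed
#set( $unicodewj = "\u2060" )
## Insert a word joiner before every period and forward slash ($0 = the matched character)
${lead.FirstName.replaceAll("[./]","${unicodewj}$0")}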

This snippet inserts one of my favorite characters, a zero-width word joiner, before all periods and forward slashes. This doesn't change the output displayed in your mail client (WJs are never shown), but at the raw data level it breaks up URL-like strings so they won't be made clickable. Try it!
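If you want to experiment outside Marketo, this Python sketch mirrors the Velocity snippet. Velocity's replaceAll is Java's String.replaceAll, where $0 means "the whole match"; Python's re.sub spells that \g<0>. Whether a particular mail client still autolinks the result is something to confirm in your own send tests.

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER: zero-width, not displayed


def break_autolinks(value: str) -> str:
    """Insert a word joiner before every period and forward slash.

    Mirrors the Velocity replaceAll("[./]", "${unicodewj}$0"); Python writes
    the whole-match backreference as \\g<0> instead of Java's $0.
    """
    return re.sub(r"[./]", WORD_JOINER + r"\g<0>", value)


entered = "www.malicious.example/path"
shown = break_autolinks(entered)

print(shown == entered)                           # False: the raw string differs
print(shown.replace(WORD_JOINER, "") == entered)  # True: remove the joiners and it's identical
print(len(shown) - len(entered))                  # 3: one joiner per "." or "/"
```

The stored field is never touched. Only the output copy carries the joiners, which is the point of escaping on the way out rather than sanitizing on the way in.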

Can it get worse?

Yes. If your tokens aren't HTML-encoded, you could end up with malicious JavaScript that, while it wouldn't run in any modern email app, would run automatically in the View as Web Page version. Luckily, most spam filters will see an attempt to embed a <SCRIPT> tag as indicative of spam, so the original message wouldn't reach the inbox.

So let's try to stay upbeat.

Notes

[1] Exceptions are online IDEs like CodePen, which are designed for developers to enter JS into textboxes — such input obviously has to be run on the way out, so it can't be escaped.