In Marketo String/Text fields, surrogate pairs MUST be URL-encoded, and you SHOULD encode line breaks

We like to think these are synonyms:

  • “the value you tell Marketo to save”
  • “the value Marketo saves to the database”
  • “the output of a {{lead.token}}”

But they’re just... not.

In fact, Marketo changes values in totally undocumented ways when writing to the database and populating tokens.

Sometimes it truncates the saved value when it encounters a character it doesn’t like. Sometimes it saves as-is but the corresponding {{lead.token}} replaces original characters with new ones. And the behavior varies based on whether the source was a form, the REST API, or the Marketo UI.

The known offenders are:

  • line breaks: that is, real U+000A, not HTML <br> tags
  • surrogate pairs: any Unicode character above U+FFFF, which UTF-16 stores as 2 paired 16-bit code units, such as common emojis like 👍 and 😛 and fancy arrows like 🡲 and 🡘 (plus tens of thousands of others, though most are rare ideographs and archaic scripts you’re unlikely to see in marketing data)
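If you want to check whether a value contains either offender before it goes anywhere near Marketo, here is a minimal JavaScript sketch. The helper name findOffenders is made up for illustration, and it only looks for the two cases listed above (a real U+000A and any code point above U+FFFF).

```javascript
// Matches any code point above U+FFFF, i.e. anything UTF-16 stores as a surrogate pair.
// The "u" flag makes the regex work on whole code points instead of 16-bit code units.
const SURROGATE_PAIR_RE = /[\u{10000}-\u{10FFFF}]/u;

// Matches a real line feed (U+000A), not an HTML <br> tag.
const LINE_FEED_RE = /\n/;

// Hypothetical helper: report which of the known offenders a value contains.
function findOffenders(value) {
  return {
    hasSurrogatePair: SURROGATE_PAIR_RE.test(value),
    hasLineBreak: LINE_FEED_RE.test(value)
  };
}

// "\u{1F44D}" is the thumbs-up emoji: one character, but two UTF-16 code units.
console.log("\u{1F44D}".length); // 2
console.log(findOffenders("Thanks \u{1F44D}")); // hasSurrogatePair: true, hasLineBreak: false
console.log(findOffenders("line one\nline two")); // hasSurrogatePair: false, hasLineBreak: true
```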

Line breaks are — yes, it’s true — turned into spaces in {{lead.tokens}}

This revelation has been lurking in the corners of Marketo Nation for a while. Someone will mention it but add the disclaimer “it must just be my instance?” Or they’ll try to use Velocity replaceAll() to turn line breaks into <br>s (the right move) but conclude “it didn’t work, my code must be wrong.”

Well, it didn’t work because the line breaks aren’t there. Marketo replaces them with standard spaces (U+0020). That’s why I provided the code to show every codepoint the other day, to prove it to myself and let you do the same in your instance.
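To be clear, the snippet below is not the Velocity code mentioned above. It is a rough JavaScript equivalent for inspecting a value outside Marketo, for example comparing what the REST API returns with what a token rendered. The function name listCodePoints is an assumption for illustration.

```javascript
// Hypothetical helper: list every code point in a string as U+XXXX.
// Array.from iterates by code point, so a surrogate pair yields a single entry.
function listCodePoints(value) {
  return Array.from(value, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// A real line break shows up as U+000A...
console.log(listCodePoints("a\nb").join(" ")); // U+0061 U+000A U+0062
// ...while the token output described above would show U+0020 in its place.
console.log(listCodePoints("a b").join(" ")); // U+0061 U+0020 U+0062
```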

If you have a field whose actual value is:

This once
had some
line breaks.

That’ll display as expected in the Lead UI.

It’ll also maintain the line breaks in your CRM and when you fetch via the REST API, because the actual database value has line breaks.

But both as a {{lead.token}} and in Velocity, it’s gonna be:

This once had some line breaks.

That’s it, you can’t reverse it to the actual database value. And that’s the, uh, breaks. Because it means you can’t ever format it in an email as originally intended.

Surrogate Pairs in an API or UI update: Value is truncated immediately before the first SP

This behavior is rarely, if ever, noted! Say you’re using the REST API (not a Marketo form) to update leads and a lead typed this in a Comments box:

Longtime product user.👍
Hoping to get pricing for an enterprise contract.

That value will be permanently truncated before the emoji. You’ll only get this:

Longtime product user.

And the API call won’t throw an error, either:

{
 "action" : "updateOnly",
 "lookupField" : "id",
 "input" : [{
   "id" :  16526262,
   "comments__c" :  "Longtime product user.\ud83d\udc4d\nHoping to get pricing for an enterprise contract."
 }]
}

{
  "requestId": "15f52#18b35d2945b",
  "result": [
    {
      "id": 16526262,
      "status": "updated"
    }
  ],
  "success": true
}

The only evidence that something went weird is the Change Data Value, which has “Missing history details”.[1] Presumably that’s because the requested value and actual value differ, so logging of the new/old value goes haywire.

If you make the same change in the Lead UI, it appears to work (no “Missing history details” but a standard Change Data Value).

But when you refresh, you’ll see it’s truncated just like it is via API. Pretty bad, eh?
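Since the API gives no warning, one option is to predict the damage before sending. The sketch below models only the truncation described in this section (cut immediately before the first SP). It is not Marketo code, and predictApiStoredValue is a hypothetical name.

```javascript
// Hypothetical model of the observed API/UI behavior:
// the value is cut off immediately before the first surrogate pair.
function predictApiStoredValue(value) {
  const match = /[\u{10000}-\u{10FFFF}]/u.exec(value);
  // match.index counts UTF-16 code units, which is also what slice() uses.
  return match === null ? value : value.slice(0, match.index);
}

const submitted =
  "Longtime product user.\u{1F44D}\nHoping to get pricing for an enterprise contract.";
const stored = predictApiStoredValue(submitted);

console.log(stored); // Longtime product user.
if (stored !== submitted) {
  console.warn("This value would be truncated if sent as-is.");
}
```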

Surrogate Pairs in a form fill: SPs are removed

With a true Marketo form fill, the platform is much more gentle. If you put that same value in a Textarea, the Filled Out Form activity shows the value with the SP removed (but not otherwise modified or truncated).

And that’s accurate, as the stored value preserves everything but the SP.

So not too destructive. Nevertheless, if someone leaves a 👍 or ☹️ or 😊 in a field, I expect to see it in Marketo, or I’d consider it a bug.
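For comparison, the form-fill behavior can be modeled the same way. Again, this is only a sketch of what this section describes (every SP dropped, everything else kept), with a made-up function name.

```javascript
// Hypothetical model of the observed form-fill behavior:
// each surrogate pair is removed and the rest of the value is left alone.
function predictFormStoredValue(value) {
  return value.replace(/[\u{10000}-\u{10FFFF}]/gu, "");
}

const typed =
  "Longtime product user.\u{1F44D}\nHoping to get pricing for an enterprise contract.";

// The emoji is gone, but the line break and the second sentence survive.
console.log(JSON.stringify(predictFormStoredValue(typed)));
```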

Solution: URL-encode the known offenders only

The solution is to URL-encode any SPs, line breaks, and (this will make sense when you think it through) the literal % character, leaving all other characters alone:

Longtime product user.%F0%9F%91%8D%0AHoping to get pricing for an enterprise contract.
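Here is a minimal sketch of that selective encoding in JavaScript. It is one possible way to do it, not necessarily the approach a fuller write-up would take, and the name encodeOffenders is an assumption. It relies on the standard encodeURIComponent, applied only to the matched characters.

```javascript
// Characters to encode: the literal "%", a line feed (U+000A),
// and any code point above U+FFFF (a surrogate pair in UTF-16).
const NEEDS_ENCODING_RE = /[%\n\u{10000}-\u{10FFFF}]/gu;

// Hypothetical helper: URL-encode the offenders and leave everything else alone.
function encodeOffenders(value) {
  // With the "u" flag each match is a whole code point, so encodeURIComponent
  // always receives a complete surrogate pair and emits its UTF-8 bytes.
  return value.replace(NEEDS_ENCODING_RE, (ch) => encodeURIComponent(ch));
}

const original =
  "Longtime product user.\u{1F44D}\nHoping to get pricing for an enterprise contract.";
const encoded = encodeOffenders(original);

console.log(encoded);
// Longtime product user.%F0%9F%91%8D%0AHoping to get pricing for an enterprise contract.

// Round trip: decoding restores the original value exactly.
console.log(decodeURIComponent(encoded) === original); // true
```

Encoding the literal % is what keeps the round trip safe. If someone types "100%0A" and the % is left alone, decoding would turn the trailing %0A into a line break that was never there. Encoded as "100%250A", it decodes back to exactly what was typed.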

My next post will show how to do that selective URL-encoding in JavaScript — it’s simple and easily ported to other languages — and how to decode in Velocity.

Notes

[1] This error is only mentioned in one official Marketo doc and the explanation there is not correct, or at least not anymore.