Code Anatomy: Get a real array from an array-like string (a.k.a. “list”)

JavaScript arrays are magic. Really, they are. Array#map, Array#find, Array#some… don't know how I could get anything done without them.

Problem is, in the CRM and marketing automation (MA) world, array-like data (that is, a set of independent values) rarely comes to you as a real array. Instead, you get fields that humans might think of as a “list,” but each of which is really just one big string. Take this Marketo field at one of my clients:

[Screenshot: a Marketo field whose value is a comma-separated list of subscription names]

You look at that as a developer and groan. Even though you know what it's supposed to represent (at least right now) and end users may not see the inherent problems, you see the potential trouble to come. Note the uneven spacing around the commas; wonder what happens if a single value needs to contain a comma; dread searching accurately for "Monthly-Diamond" when another subscription might be "Bi-Monthly-Diamond" and values can occur in any order.
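A quick illustration of the search problem (the field value below is made up): a substring search for "Monthly-Diamond" also hits "Bi-Monthly-Diamond", while comparing whole values after splitting the string does not.

```javascript
// Hypothetical field value, for illustration only
var subscriptions = "Blog-Updates, Bi-Monthly-Diamond ,Daily-Petro";

// Naive substring search: true, even though "Monthly-Diamond" isn't in the list
var naiveHit = subscriptions.indexOf("Monthly-Diamond") !== -1;

// Split into individual values first, then compare whole values
var values = subscriptions.split(/\s*,\s*/);
var exactHit = values.indexOf("Monthly-Diamond") !== -1;

console.log(naiveHit); // true
console.log(exactHit); // false
```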

Ideally, even if you're confined to a string data type in SFDC/Marketo, you'd store such data as perfectly structured JSON strings, like

["Blog-Updates", "Monthly-Special-01", "Monthly-Diamond", "Daily-Petro"]

No confusion there, and decoding is easy with the built-in JSON.parse(). But real life doesn't work that way. JSON is easy-ish for techies to read, but it's not easy enough that you can trust sales reps to update it in place (definitely not when the UI is a single-line textbox in Arial... grrr.)
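For comparison, here's a minimal sketch of the JSON route. The field value and the tryParseList helper are hypothetical; the point is that one built-in call gives you a real array, and that a single hand-editing slip makes JSON.parse throw a SyntaxError.

```javascript
// Hypothetical field value stored as a JSON string
var fieldValue = '["Blog-Updates", "Monthly-Special-01", "Monthly-Diamond", "Daily-Petro"]';

// One built-in call turns it into a real array
var subscriptions = JSON.parse(fieldValue);
console.log(subscriptions.length); // 4

// Hypothetical helper: JSON.parse throws on malformed input
// (here, a missing quote), so guard the call and return null instead
function tryParseList(json) {
  try {
    return JSON.parse(json);
  } catch (e) {
    return null;
  }
}

console.log(tryParseList('["Blog-Updates, "Monthly-Diamond"]')); // null
```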

And even if you could get users to maintain a few back-end JSON fields without breaking them, when a multi-select field comes in from a form post, it's still gonna look like

[Screenshot: a multi-select form field value arriving as a semicolon-delimited string]

This semicolon-delimited format is what you're stuck with unless you do a lot of JS jerry-rigging on all your forms, write Apex triggers, and generally create a big mess.

So the fact is: you're going to have these array-like strings, but not actual arrays, all around your ecosystem. And since you'll want to loop over the individual values from JavaScript (whether parsing a token on a Marketo landing page or in a FlowBoost script), you need to convert the strings to arrays.

At first glance, it's as easy as

function simpleListToArray(list) {
  return list.split(/\s*,\s*/);
}

This splits the string on each match of the regular expression “any amount of whitespace on either side of a comma.”

That does indeed work... unless you want to have a comma within a list item. You could of course make the rule that “List items may not contain commas, as commas are used as a delimiter.” But that seems a little too hardcore to me. Wouldn't it make more sense that a list item may contain a comma provided it's properly escaped? I think so.
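To make both halves of that concrete, here's the one-liner run on two made-up strings: uneven spacing around the commas is absorbed by the regex, but a value that contains its own comma is cut in two.

```javascript
function simpleListToArray(list) {
  // Split on a comma plus any whitespace on either side of it
  return list.split(/\s*,\s*/);
}

// Uneven spacing is handled
var tidy = simpleListToArray("Blog-Updates ,Monthly-Diamond,   Daily-Petro");
// -> ["Blog-Updates", "Monthly-Diamond", "Daily-Petro"]

// A comma inside a value can't be told apart from a delimiter
var mangled = simpleListToArray("good golly, miss molly, maybelline");
// -> ["good golly", "miss molly", "maybelline"]  (3 items where 2 were intended)
```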

To me the most straightforward way to escape a reserved character is with a backslash: let's say \, is a literal comma that should be preserved, rather than a delimiter, so the string field

gee\, oh\, gosh, good golly\, miss molly, maybelline

is intended to contain 3 items, not 6. The 3 items are

  1. gee, oh, gosh
  2. good golly, miss molly
  3. maybelline

Note that to set that string programmatically (i.e. if it's not being entered in the Marketo UI), you'd need to double-escape, writing each \, as \\, inside a JavaScript string literal. But since we're concentrating on data already in the database, assume the escaping has already been done.
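To illustrate the double escaping: inside a JavaScript string literal the backslash is itself an escape character, so the source code needs two backslashes for every one that ends up in the stored value. A quick check (the variable name is illustrative):

```javascript
// In source code each backslash must be doubled...
var list = "gee\\, oh\\, gosh, good golly\\, miss molly, maybelline";

// ...but the string in memory has single backslashes, as stored in the field
console.log(list); // gee\, oh\, gosh, good golly\, miss molly, maybelline
console.log(list.split("\\").length - 1); // 3

// With a single backslash, "\," is just ",": the escape is silently lost
console.log("gee\, oh" === "gee, oh"); // true
```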

The code to parse that string into an array is obviously beyond a one-liner, but let's see if we can make it shortish, clean, and intuitive.

Let's start out with the original primitive code, but split strictly on a comma instead of on comma-with-surrounding-whitespace. (Spoiler alert: I already figured out there will be problems if we strip all whitespace at the beginning. The space after an escaped comma is part of the value, so "gee\, oh" would come back as "gee,oh".)

 function simpleListToArray(list, delimiter) {
   // Default to a comma; pass ';' for semicolon-delimited multi-select values
   delimiter = delimiter || ',';

   return list
     .split(delimiter);
 }

OK, that's gonna give us an array like

[ "gee\", " oh\", " gosh", " good golly\", " miss molly", " maybelline" ]

So we have an array, but not the final array we want, since the first 3 items should actually be combined into one, and the next 2 items should also be combined. But at least we've preserved the fact that there were originally escaped commas, because we still see the trailing \. That will allow us to re-concatenate the items that need it.
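To see that intermediate result for yourself, here's the delimiter-only split run on the sample string (written with doubled backslashes, as a JavaScript literal requires), plus a check of which pieces end in a backslash. The variable names are illustrative.

```javascript
var list = "gee\\, oh\\, gosh, good golly\\, miss molly, maybelline";

// Split strictly on the delimiter; surrounding whitespace is kept for now
var pieces = list.split(",");
// -> ["gee\\", " oh\\", " gosh", " good golly\\", " miss molly", " maybelline"]

// A trailing backslash marks a piece that must be glued to the one after it
var escapedAt = [];
pieces.forEach(function (piece, i) {
  if (/\\$/.test(piece)) escapedAt.push(i);
});
// -> [0, 1, 3]
```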

How to re-concatenate? Well, here's where I get a little too cool for my britches (and may have missed something, as I hope you'll tell me in the comments).

The awesome function Array#reduce is used to reduce an array of values to a single value (an easy example would be reducing an array of numbers to a single number that's the sum or average of all items).
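For example, summing and averaging an array of numbers (the values are arbitrary):

```javascript
var numbers = [4, 8, 15, 16, 23, 42];

// The callback receives the running total (acc) and the current item (n);
// 0 is the initial value of acc
var sum = numbers.reduce(function (acc, n) {
  return acc + n;
}, 0);

var average = sum / numbers.length;

console.log(sum);     // 108
console.log(average); // 18
```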

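But there's no reason it can't be used to reduce one array to another array. An interesting difference between reduce and other iteration methods like forEach and map is that reduce hands your callback the accumulated result so far (whatever the previous call returned) together with the current value. When that accumulator is itself an array, its last element is the previous item, already processed. The forEach and map callbacks also receive the index and the original array, but looking back means indexing into the array yourself.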

So each time you look at an item, you also have access to the item before it (strictly, the last item of the result built so far). That's what we need here: when we reach item " oh\", we need to “look behind,” see that "gee\" ends with \, and recombine the two into "gee, oh\", then keep doing the same for any other item where the previous item ended with \.

(You could build an equivalent function by using a closure or recursion to access the previous value, or seek back into the original array based on the current index. But reduce is ready and waiting, so let's use it.)
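As a point of comparison, here's a sketch of the seek-back variant written as a plain for loop. The name listToArrayLoop is hypothetical; like the reduce version below, it doesn't trim whitespace yet, and it drops the trailing backslash with slice(0, -1).

```javascript
function listToArrayLoop(list, delimiter) {
  delimiter = delimiter || ",";
  var pieces = list.split(delimiter);
  var result = [];

  for (var i = 0; i < pieces.length; i++) {
    // Seek back into the original array: did the previous piece end with "\"?
    if (i > 0 && /\\$/.test(pieces[i - 1])) {
      // Remove the last result item, drop its backslash, restore the delimiter
      var last = result.pop();
      result.push(last.slice(0, -1) + delimiter + pieces[i]);
    } else {
      result.push(pieces[i]);
    }
  }
  return result;
}

console.log(listToArrayLoop("gee\\, oh\\, gosh, good golly\\, miss molly, maybelline"));
// -> ["gee, oh, gosh", " good golly, miss molly", " maybelline"]
```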

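 function simpleListToArray(list, delimiter) {
   delimiter = delimiter || ',';

   return list
     .split(delimiter)
     .reduce(function findEscCommas(prev, next) {
       // Does the last item collected so far end with a backslash?
       if (prev.length && /\\$/.test(prev[prev.length - 1])) {
         // Pop it, drop the backslash, and glue on the delimiter plus the current item
         // (slice avoids String#replace treating "$&", "$$", etc. in next as patterns)
         return prev.concat(prev.pop().slice(0, -1) + delimiter + next);
       } else {
         return prev.concat(next);
       }
     }, []);
 }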

What's happening here is

  • We start with an empty array [] as our accumulator (the second argument passed to reduce is the initial value)
  • We loop over each item in our intermediate array while peeking back at the previous item
  • If the previous item didn't end with \, then we append the current item to the end of the result array
  • If the previous item did end with \, then we remove the previous item (temporarily) from the result array and re-add it with its trailing \ replaced by the chosen delimiter (i.e. comma or semicolon), followed by the current item
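To watch those steps happen, the sketch below records a copy of the result array after each call to the reduce callback. The snapshots array exists only for this illustration and isn't part of the function in this post.

```javascript
var list = "gee\\, oh\\, gosh, good golly\\, miss molly, maybelline";
var delimiter = ",";
var snapshots = []; // illustration only: state of the result after each step

var result = list.split(delimiter).reduce(function findEscCommas(prev, next) {
  var out;
  if (prev.length && /\\$/.test(prev[prev.length - 1])) {
    // Previous item ended with "\": pop it, strip the "\", glue on delimiter + next
    out = prev.concat(prev.pop().slice(0, -1) + delimiter + next);
  } else {
    // Otherwise the current item starts a new entry
    out = prev.concat(next);
  }
  snapshots.push(out.slice()); // copy, since the next step pops from out
  return out;
}, []);

snapshots.forEach(function (s, i) {
  console.log(i, JSON.stringify(s));
});
// 0 ["gee\\"]
// 1 ["gee, oh\\"]
// 2 ["gee, oh, gosh"]
// 3 ["gee, oh, gosh"," good golly\\"]
// 4 ["gee, oh, gosh"," good golly, miss molly"]
// 5 ["gee, oh, gosh"," good golly, miss molly"," maybelline"]
```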

It works quite well in my tests.

There's still one step missing. The code splits on the bare delimiter (just , or ;) rather than the delimiter plus any surrounding spaces, so stray spaces stay attached to the values. That's wrong for these list-like strings: leading and trailing spaces around each value should be insignificant, since they're probably the result of user laziness. So the sloppy-looking

  ice cream; kale    ; salmon skin

should give the same output as the more precisely typed

ice cream;kale;salmon skin

We clean this up by running each array value through String#trim (via Array#map) before returning. Here's the final code:

 function simpleListToArray(list, delimiter) {
   delimiter = delimiter || ',';

   return list
     .split(delimiter)
     // Re-join items that were split on an escaped delimiter (\, or \;)
     .reduce(function findEscCommas(prev, next) {
       if (prev.length && /\\$/.test(prev[prev.length - 1])) {
         return prev.concat(prev.pop().slice(0, -1) + delimiter + next);
       } else {
         return prev.concat(next);
       }
     }, [])
     // Leading/trailing spaces around each value are insignificant
     .map(function trim(itm) { return itm.trim(); });
 }
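A usage sketch, run against the examples from this post. This is a self-contained copy of the function above (stripping the backslash with slice(0, -1)); the sample strings are written with doubled backslashes because they're JavaScript literals.

```javascript
function simpleListToArray(list, delimiter) {
  delimiter = delimiter || ",";
  return list
    .split(delimiter)
    .reduce(function findEscCommas(prev, next) {
      if (prev.length && /\\$/.test(prev[prev.length - 1])) {
        return prev.concat(prev.pop().slice(0, -1) + delimiter + next);
      }
      return prev.concat(next);
    }, [])
    .map(function trim(itm) { return itm.trim(); });
}

var items = simpleListToArray("gee\\, oh\\, gosh, good golly\\, miss molly, maybelline");
// -> ["gee, oh, gosh", "good golly, miss molly", "maybelline"]

var sloppy = simpleListToArray("ice cream; kale    ; salmon skin", ";");
var tidy = simpleListToArray("ice cream;kale;salmon skin", ";");
// Both -> ["ice cream", "kale", "salmon skin"]

// Exact lookup no longer confuses "Monthly-Diamond" with "Bi-Monthly-Diamond"
var subs = simpleListToArray("Blog-Updates, Bi-Monthly-Diamond ,Daily-Petro");
console.log(subs.indexOf("Monthly-Diamond") !== -1); // false
```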

I used this function in the code I posted to this thread, which shows how a FlowBoost script can be used to extract parts of an email address and (maybe) get some meaningful company info from the topmost part of the private domain, e.g. “microsoft” in microsoft.com or “google” in google.co.nz. (I don't really think the OP is going to get much out of this experiment, but it was a nice showcase for FlowBoost and led me to shape up this cool little simpleListToArray for publication.)