Case-sensitivity quirks with Velocity sorts

To newbie developers (and to non-developers) there are 2 ways to sort text: “alphabetically ascending” or “alphabetically descending.” That’s it.

By “alphabetically,” those folks mean case-insensitive sort order, where uppercase/lowercase pairs (in the local language) are treated as the same letter.

But if they simply say “sort alphabetically” to a more technical person, without any additional flavor, they’re likely to get results in another order – by far the more computationally efficient order[1]: case-sensitive, purely lexicographic sort order.

This is a great example of the simple-but-huge communication gaps between techies and non-techies (and, I must say, a good reason for project managers to exist to clarify stuff).

Both sides are acting in good faith. The inexperienced/non-techie person is using what, to them, is the only meaning (not just the informal meaning) of “alphabetically.” The techie isn’t being pretentious, as programming tools and practices favor their interpretation. But such a little thing can cause “broken” code (in one party’s eyes) and much finger-pointing.

What side is Velocity on?

To see which definition Velocity uses, let’s make a short list of names, then sort it (ascending order is the default with SortTool), then output it:

#set( $names = ["alex", "CHARLIE", "bob", "Dora"] )
#set( $sortedNames = $sorter.sort( $names ) )
Names: $display.list( $sortedNames, "," )

The output:

Names: CHARLIE,Dora,alex,bob

OK.

The fact that C and D come before a and b is not a bug. That’s the hallmark of case-sensitive comparison, which only cares about the position in the Unicode table. The uppercase letters A-Z, codepoints 65-90, all come before letters a-z, codepoints 97-122.
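The code-point ordering is easy to confirm in plain Java, the language Velocity runs on. This is a minimal sketch, not Velocity code: the class name CodePointOrderDemo and the helper naturalSort are made up for illustration, and the sort simply relies on String's natural ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; the names here are illustrative only.
class CodePointOrderDemo {

    // Returns a sorted copy using String's natural ordering (String.compareTo),
    // which compares character values and knows nothing about case pairs.
    static List<String> naturalSort(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // 'C' is code point 67, 'a' is code point 97
        System.out.println((int) 'C' + " < " + (int) 'a');

        // Negative result: "CHARLIE" sorts before "alex"
        System.out.println("CHARLIE".compareTo("alex") < 0);

        // Prints [CHARLIE, Dora, alex, bob]
        System.out.println(naturalSort(Arrays.asList("alex", "CHARLIE", "bob", "Dora")));
    }
}
```

The result matches the Velocity output above: every uppercase-initial name lands ahead of every lowercase-initial one.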

This seems to prove that $sorter.sort uses the higher-speed, case-sensitive lexicographic order.[2]

Not so fast.

Let’s try sorting a list of objects on a String property of each object. That property, name, will have exactly the same values as above.

#set( $objects = [
  { "name" : "alex" },
  { "name" : "CHARLIE" },
  { "name" : "bob" }, 
  { "name" : "Dora" }
])
#set( $objectsSortedByName = $sorter.sort( $objects, "name" ) )
Objects: $display.list( $objectsSortedByName, "," )

The output:

Objects: {name=alex},{name=bob},{name=CHARLIE},{name=Dora}

Whoa, whoa... that is in case-insensitive order by name, and we didn’t do anything special!

What’s happening?

It’s like this.

  • When Velocity sorts a List of Strings, it uses the native String.compareTo function, which is case-sensitive.
  • When Velocity sorts a List of other Objects (e.g. Maps) on a common property of each, it first checks whether the property has the String type[3] and, if so, uses String.compareToIgnoreCase, whose name is self-explanatory.
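
To make the two code paths concrete, here is a rough Java sketch of the behavior just described. It is emphatically not SortTool's source code: the class TwoSortPathsSketch and its methods (sortStrings, sortByProperty, wrap, namesOf) are hypothetical, and the sketch only handles the all-Strings case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two behaviors; not the real SortTool implementation.
class TwoSortPathsSketch {

    // Path 1: a List of Strings is sorted with String.compareTo (case-sensitive).
    static List<String> sortStrings(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    // Path 2: a List of Maps sorted on a String property uses compareToIgnoreCase.
    static List<Map<String, Object>> sortByProperty(List<Map<String, Object>> objects, String property) {
        List<Map<String, Object>> copy = new ArrayList<>(objects);
        copy.sort((a, b) -> {
            Object left = a.get(property);
            Object right = b.get(property);
            if (left instanceof String && right instanceof String) {
                return ((String) left).compareToIgnoreCase((String) right);
            }
            // Assumption: this sketch deliberately ignores mixed or non-String types.
            throw new IllegalArgumentException("sketch handles only String properties");
        });
        return copy;
    }

    // Helper: wrap each name in a single-entry Map, like { "name" : "alex" } in VTL.
    static List<Map<String, Object>> wrap(String... names) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (String n : names) {
            Map<String, Object> m = new LinkedHashMap<>();
            m.put("name", n);
            out.add(m);
        }
        return out;
    }

    // Helper: pull the "name" values back out, preserving list order.
    static List<String> namesOf(List<Map<String, Object>> objects) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> m : objects) {
            out.add((String) m.get("name"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [CHARLIE, Dora, alex, bob]
        System.out.println(sortStrings(namesOf(wrap("alex", "CHARLIE", "bob", "Dora"))));
        // Prints [{name=alex}, {name=bob}, {name=CHARLIE}, {name=Dora}]
        System.out.println(sortByProperty(wrap("alex", "CHARLIE", "bob", "Dora"), "name"));
    }
}
```

Same four values, two different orders, and the only thing that changed is the shape of the List.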

So the answer to the question Does Velocity sort case-sensitively or case-insensitively? is: Yes. (Sorry for the YouTube-comment-style humor.)

There’s no built-in way to change the first behavior into the second, nor vice versa. They’re just different code paths based on the type of the List. The arbitrary (and undocumented) difference is debatably a bug (as you’ll see more of if you work on the challenge below) but that’s the way it is.

What we’d like to do, but can’t

If we had a full-fledged Java environment we could write a custom Comparator or an overridden compareTo function to ensure that we knew exactly what type of sorting would happen when. In some underinformed Velocity guides around the web, there’s an underlying assumption that you can “just” have your Java developer export new helper functions and extended classes into the Velocity context and “just” use those.
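
For the curious, this is roughly what that luxury looks like in ordinary Java, where the caller names the ordering explicitly instead of inheriting one from the List's type. The class ExplicitComparators and its members are illustrative names of my own; Comparator.naturalOrder() and String.CASE_INSENSITIVE_ORDER are standard JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; only the JDK comparators are "real" API here.
class ExplicitComparators {

    // Case-sensitive, code-point-style ordering (String.compareTo).
    static final Comparator<String> LEXICOGRAPHIC = Comparator.naturalOrder();

    // Case-insensitive ordering supplied by the JDK.
    static final Comparator<String> IGNORE_CASE = String.CASE_INSENSITIVE_ORDER;

    // Returns a sorted copy using whichever ordering the caller asks for.
    static List<String> sortedCopy(List<String> input, Comparator<String> order) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alex", "CHARLIE", "bob", "Dora");
        // Prints [CHARLIE, Dora, alex, bob]
        System.out.println(sortedCopy(names, LEXICOGRAPHIC));
        // Prints [alex, bob, CHARLIE, Dora]
        System.out.println(sortedCopy(names, IGNORE_CASE));
    }
}
```

None of this can be called from a template unless someone exposes it to the Velocity context.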

But in Marketo’s Velocity environment, as well as any environment where template coders are confined to just what’s in Velocity now, you have to use what you’re given. (Even more so since Marketo’s June 2019 changes.)

So our workarounds all have to be written in VTL.

Making a List of Strings sort like a List of Objects

If you want to sort Strings like Objects, you have to

  • wrap them in Objects
  • sort the Objects by the String property
  • use the String property of each Object going forward

Easy to write, though unavoidably clunky:

#set( $names = ["alex", "CHARLIE", "bob", "Dora"] )
#set( $namesWrappedInObjects = [] )
#foreach( $word in $names )
#set( $void = $namesWrappedInObjects.add({ "name" : $word }) )
#end
#set( $objectsSortedByName = $sorter.sort( $namesWrappedInObjects, "name" ) )
Names: $display.list( $objectsSortedByName, ",", ",", "name" )

The output:

Names: alex,bob,CHARLIE,Dora

To review:

  • create a new empty List (ArrayList, [] in Velocity) $namesWrappedInObjects
  • for each String in the original List, create a new Map (LinkedHashMap, {}) and set its property name to the original String’s value
  • add each Map to the new List
  • sort the new List on name

I also printed the new List using the advanced form of $display.list that grabs just a single property. This was just for consistency with the original example; your business reqs may not involve direct output.

It works, but at the expense of allocating a new List + new Maps and more lines of code.

Making a List of Objects sort like a List of Strings

The reverse is more complex. Imagine you wanted to sort a List of Objects by a property, but in the more machine-y, less human-y lexicographic order. (Granted, in a marketing email it’s hard to imagine this requirement, but Velocity can be used for more than emails.)

Luckily, we know a List of just Strings will be lexicographically sorted. So we’re going to:

  • create a new empty List
  • for each Object in the original List, get the value of the interesting String property (name in today’s examples)
  • add the String value to the new List
  • sort the new List (a List of Strings, hence lex’ly sorted)
  • go back over the original List
  • for each Object in the original List, set a new Integer property that corresponds to the String property’s position in the separately lex’ly-sorted List; here that property name will be lexOrder
  • sort the original List on the new lexOrder property (as this is a numeric property, it’s an ascending numeric sort)

Clunkier even than the above, but it works:

#set( $objects = [
  { "name" : "alex" },
  { "name" : "CHARLIE" },
  { "name" : "bob" }, 
  { "name" : "Dora" }
])
#set( $comparableValuesOnly = [] )
#foreach( $object in $objects )
#set( $void = $comparableValuesOnly.add($object.name) )
#end
#set( $lexoSortedComparables = $sorter.sort($comparableValuesOnly) )
#foreach( $object in $objects )
#set( $void = $object.put("lexOrder", $lexoSortedComparables.indexOf($object.name) ) )
#end
#set( $objectsSortedLex = $sorter.sort( $objects, "lexOrder" ) )
Objects: $display.list( $objectsSortedLex, "," )

The output:

Objects: {name=CHARLIE, lexOrder=0},{name=Dora, lexOrder=1},{name=alex, lexOrder=2},{name=bob, lexOrder=3}

When working with the now-sorted list, you can ignore the lexOrder property, as it no longer has any significance. (If the name lexOrder were already in use, you could use any other property name; it just has to be one you know.)

Conclusion

This isn’t the only place in which Velocity is wildly, even impressively inconsistent! You can’t simply guess how it works (past a certain point): you must take time to learn. It’s not at all simple to write correctly functioning, resilient VTL and those who say otherwise, upcoming smiley notwithstanding, shouldn’t be in your instance.

Parting challenge

Above, I showed how to sort a list of objects by a String property in ascending lexicographic order, using the separately computed lexOrder property.

Now: How might you sort a list of objects by that same property in descending lexicographic order? Let me know in the comments.


Notes

[1] It’s dramatically easier on CPU resources to do a case-sensitive sort: for large blocks of text, you save millions of operations. It just makes sense as the default. More on this in another post.

[2] More precise than “case-sensitive” might be “case-ignorant,” as there’s no applied knowledge of case pairs. But that gets confused with “ignoring” in the sense of “treating uppercase/lowercase pairs as the same character.” Ugh, language.

[3] Assuming all the names are Strings. There are other ramifications, most of them fatal, if you mix types. Again, more later.