Dealing with unencoded URLs in query params (an Agical special feature)

A PHP-related post! Weird way to get back to blogging after a couple months, yep. I don't roll much with PHP these days, but… disclosure… Agical.io* is powered by PHP. Hey, it's still a fine fit for a speedy, single-method, no-database microservice.

Didn't exactly fall back in love with PHP while writing Agical, but it turns out to have been a great choice, since one of its ancient compatibility features totally saved the day the other day.

Marketo Champ JN was building add-to-calendar links using Marketo's “automagic” {{member.webinar URL}} token. As you may know, this token is created by Marketo's built-in webinar integrations; after a slight delay, it's available for use in confirmation emails.

An example token value might be:

https://event.on24.com/eventRegistration/EventLobbyServlet?target=lobby20.jsp&eventid=1473181&sessionid=1&key=MOYE5IVHWWNB5AU3VMEGIFI7E9IXCOA2WGWF&eventuserid=175594408

An Agical link (add-to-Google Calendar version) referencing that token might look like this in Marketo:

https://ics.agical.io/?subject={{my.webinar-title-EN}}&dtstart={{my.webinar-date-ISO}}T{{my.webinar-time-ISO-start}}:00Z&dtend={{my.webinar-date-ISO}}T{{my.webinar-time-ISO-end}}:00Z&format=gcal&location={{member.webinar url}}

And when rendered in an email, it would look like this:

https://ics.agical.io/?subject=Let's%20Webinar!&dtstart=2017-08-01T15:30:00Z&dtend=2017-08-01T16:30:00Z&format=gcal&location=https://event.on24.com/eventRegistration/EventLobbyServlet?target=lobby20.jsp&eventid=1473181&sessionid=1&key=MOYE5IVHWWNB5AU3VMEGIFI7E9IXCOA2WGWF&eventuserid=175594408

See the problem?

After the token value is substituted in, the params passed to Agical are:

  • subject
  • dtstart
  • dtend
  • format
  • location
  • eventid
  • sessionid
  • key
  • eventuserid

Those last 4 are supposed to be part of the On24 URL, though. The whole {{member.webinar URL}} is supposed to be kept together so it can be passed to Google and added to your calendar. Oops! Instead, all Google will see for the event's Location is the string up until the first &:

https://event.on24.com/eventRegistration/EventLobbyServlet?target=lobby20.jsp

The rest of the Location is left on the cutting room floor, if you will.
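If you want to see the split for yourself, here's a quick sketch. It uses Python's standard urllib.parse as a stand-in for "a typical server that splits on &" (Agical itself is PHP, so this isn't its actual code), fed with the rendered link from above.

```python
from urllib.parse import urlsplit, parse_qsl

# The rendered link from the email, with the On24 URL pasted in unencoded.
rendered = (
    "https://ics.agical.io/?subject=Let's%20Webinar!"
    "&dtstart=2017-08-01T15:30:00Z&dtend=2017-08-01T16:30:00Z"
    "&format=gcal"
    "&location=https://event.on24.com/eventRegistration/EventLobbyServlet"
    "?target=lobby20.jsp&eventid=1473181&sessionid=1"
    "&key=MOYE5IVHWWNB5AU3VMEGIFI7E9IXCOA2WGWF&eventuserid=175594408"
)

# Split the outer query string on & (the default), keeping params in order.
params = dict(parse_qsl(urlsplit(rendered).query))

for name, value in params.items():
    print(f"{name} = {value}")
```

The location entry stops at target=lobby20.jsp, and eventid, sessionid, key and eventuserid show up as top-level params of their own.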

The reason this happens is pretty clear. When you're passing a URL-type value as a query param and that value can have more than one of its own “inner” query params, you need to URL-encode the whole value or it (seamlessly and naturally) will be joined with the “outer” query params.

The problem can be slightly hard to track because it doesn't usually happen if the inner URL only includes one (or zero) query params. These two inner URLs will not cause parsing problems in most cases:

location=https://example.com
location=https://example.com?fruit=apple

But this slight variation will break on the & and give vegetable to the outer query string:

location=https://example.com?fruit=apple&vegetable=celery

The solution should be simple. When you want something to be unambiguously treated as a single query value, you URL-encode it:

location=https%3A%2F%2Fexample.com%3Ffruit%3Dapple%26vegetable%3Dcelery

The above syntax will never cause any confusion when reading or forwarding the value. By URL-encoding all of the special characters (not just encoding & to %26 but other characters like ? and = that can also cause problems) you ensure that the whole location string stays together.
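For the cases where you do control the value, the encoding step is a one-liner in most languages. A minimal Python sketch (the format=gcal param is just borrowed from the Agical link to play the "outer" query string):

```python
from urllib.parse import quote, parse_qs

inner = "https://example.com?fruit=apple&vegetable=celery"

# safe="" makes quote() encode "/" as well; by default it leaves slashes alone.
encoded = quote(inner, safe="")
query = "format=gcal&location=" + encoded

parsed = parse_qs(query)
print(encoded)
print(parsed["location"][0])
```

After decoding, location comes back as the full inner URL, and vegetable never leaks into the outer params.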

But there's the rub: you can't pre-encode the {{member.webinar URL}} as Marketo/On24 doesn't support that. And since it's automatically generated, it's always going to be injected as-is (unlike, for example, a {{my.token}} storing a URL, which you can manually encode and paste in).

Toward a solution

I distantly recalled the PHP INI setting arg_separator.input (well, I had to look up the exact setting name, but knew it existed!).

arg_separator.input allows you to shift PHP to an old-school query string parsing mode, where the semicolon (;) character is the special param separator character and the & isn't special.

See, while it's long been forgotten, there was a time when you were supposed to use a semicolon. The HTML 4 spec was nearly begging server implementors to support ; in place of & (due to a different type of ambiguity I won't get into here) so people could structure URLs like this:

http://example.com/path/?name1=value1;name2=value2;name3=value3

It never really caught on (too much lobbying, I guess, by Big Ampersand) and at this point most webservers assume you're using only &, treating ; as just any old character.

(So situation=normal;%20all%20fouled%20up will be treated as the name situation and value normal; all fouled up, with no unexpected split after the word “normal.”)

arg_separator.input lets you use both & and ; as separators (though that's not what we want here) or switch things up completely so ; is the separator and & is just any old character (which is really promising).

So if you switch to ; and have a URL like this:

http://example.com/?name1=value1;name2=http://this.example.com/is?all=part&of=the&same=param

Then everything's gravy. The params will be split only on ; (of course name=value pairs are still split on =) and even if name2 contains lots of & characters, it'll be treated as all one string.**
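To make the difference concrete, here's a rough imitation of the two parsing modes in Python. To be clear, parse_query is a made-up helper for illustration, not PHP's real parser; it only mimics the "split on one separator, then split each pair on the first =" behavior described above.

```python
from urllib.parse import unquote

def parse_query(query, separator="&"):
    """Hypothetical helper: split on one separator, then on the first '='."""
    params = {}
    for pair in query.split(separator):
        if not pair:
            continue  # skip empty chunks such as a trailing separator
        name, _, value = pair.partition("=")
        params[unquote(name)] = unquote(value)
    return params

query = "name1=value1;name2=http://this.example.com/is?all=part&of=the&same=param"

semicolon_mode = parse_query(query, separator=";")
ampersand_mode = parse_query(query, separator="&")

print(semicolon_mode)
print(ampersand_mode)
```

In semicolon mode you get exactly two params, and name2 holds the whole inner URL. In ampersand mode the same string falls apart into name1, of and same, with name1 swallowing everything up to the first &.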

Almost there!

One more step

The only prob with the above in the real world is that once you set arg_separator.input it applies to all PHP processes using a given .INI file.

So you can globally switch to ; but then everybody who uses your service needs to remember that you do things this crazy old way. That's not good. And especially unacceptable when you have a service, like Agical, that's already in production and everybody who uses it (not that many people, admittedly, but users seem to like it!) assumes you do things like a sane person.

So here was my pretty cool way to roll out the hack so JN and others could use it without upsetting my more mainstream users.

I created a new directory off my doc root, /alt_sep (for “alternate separator”). In that dir I created a PHP User INI file — a per-directory INI config that only affects requests that come in via that directory. The file has just one line:

arg_separator.input = ";"

This will override the global PHP.INI setting (which still uses the default & separator).

Then I created a symlink that points /alt_sep/index.php to the parent (root) index.php.
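I did this by hand, but the layout is simple enough to script. The sketch below rebuilds it in a throwaway temp directory. The assumptions: the file is named .user.ini (PHP's default user INI filename, which as far as I know is only picked up under CGI/FastCGI), the symlink is relative, and the index.php contents are a placeholder.

```python
import tempfile
from pathlib import Path

# A throwaway directory stands in for the real doc root.
doc_root = Path(tempfile.mkdtemp())
(doc_root / "index.php").write_text("<?php /* placeholder for the real endpoint */\n")

# The alias directory for semicolon-separated requests.
alt_sep = doc_root / "alt_sep"
alt_sep.mkdir()

# Assumption: ".user.ini" is the per-directory INI filename (PHP's default).
(alt_sep / ".user.ini").write_text('arg_separator.input = ";"\n')

# Relative symlink: alt_sep/index.php -> ../index.php
(alt_sep / "index.php").symlink_to(Path("..") / "index.php")

print(sorted(p.name for p in alt_sep.iterdir()))
```

Both paths end up serving the same script, so there's only one index.php to maintain.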

Now, the main Agical endpoint http://ics.agical.io works the same as ever. But there's a special alias for the endpoint as well. If you start with http://ics.agical.io/alt_sep then it assumes you're using semicolons to separate query params:

https://ics.agical.io/alt_sep?subject={{my.webinar-title-EN}};dtstart={{my.webinar-date-ISO}}T{{my.webinar-time-ISO-start}}:00Z;dtend={{my.webinar-date-ISO}}T{{my.webinar-time-ISO-end}}:00Z;format=gcal;location={{member.webinar url}}

And the cool thing is I don't have to maintain separate index.php files or PHP installations, just that directory with a tiny static file and symlink in it.

That's it! Now, back to my usual JS/Velocity/Marketo posts.

Notes

* If you don't know how Agical works, take a look here.

** And as long as {{member.webinar URL}} doesn't contain any stray semicolons. I mean look, this is a hack, you can't cover every case. :)