Marketo field limits in the real world, Part I: Integers

Update 2018-09-10: The new post on Currency and Float fields can be considered part of this series.

If there's one official Marketo doc that needs some edits, it's the newish
Marketo Field Limits by Field Type rundown.

Unfortunately, that article has incorrect info and some sins-of-omission.

Be warned: most of this post is (necessarily) dry and technical. But I always find competing claims to be interesting, almost regardless of the topic. Maybe you will too!

Plus, even experienced high-level programmers (meaning languages other than C and C++) don't have this stuff at their fingertips, and may make implementation mistakes as a result. So if you're in a meeting and want to show some mild “data scientist” chops, these are good facts to drop.

I'll be adding to this series over time. To start off…

Integers

From the doc:

Integer

Values are limited to between -9223372036854775808 and 9223372036854775807.

This is technically correct.

Marketo stores Integer fields as signed 64-bit integers — signed meaning they can be negative or positive — so they fall into the range of -(2^63) to +(2^63-1). It can be useful to look at the high number in an easier-to-read way:

9 223 372 036 854 775 807

So 9 quintillion, 223 quadrillion, 372 trillion something.

If such a number seems unimaginably high in a martech scenario, I fully agree (though don't forget the ol' Grain of Rice parable[1].

But the truth is you can only safely go up to the max value if you don't integrate any other apps with Marketo. No Salesforce[2], no JavaScript, no 32-bit PHP apps or older databases. When you integrate, the actual limit can be a tiny fraction of that.

SFDC Integers aren't Marketo Integers

The most glaring problem is that if you sync a Marketo Integer with an SFDC Integer (which intuition might tell you to do) you will be limited by the SFDC side. SFDC uses 32-bit integers, whose range is –2,147,483,648 to 2,147,483,647.

Unlike 9 quintillion-something, 2 billion is a number we see in run-of-the-mill martech apps.

For one thing, you could see it as Annual Revenue (you'll more frequently see string values like "$3B" which are outside of the integer domain, but you can't do any math-related stuff with those). Surprising as it may seem, you can't represent a 10-billion-dollar company with an SFDC Integer.

But even more common is seeing large integers as synthetic primary key values (a.k.a. autonumbers, autoincrements, identities, etc.) that are critical to synchronization between other systems and Marketo.

For example, a Lead ID might be 5123456, even in a database with just a hundred thousand leads. This may seem strange, since you'd think the numbers would start at 1 and increment by 1. But bear these factors in mind:

  • There's a tradition of starting from numbers like 100000 (instead of 1) because such IDs are thought to be more professional.[3] (Think of college courses starting from POL 100.)

  • IDs aren't reused. So importing and deleting 100,000 leads 10 times consumes 1,000,000 IDs.

  • In some scenarios, choosing a number at random from the full range, as opposed to predictably incrementing one by one, is necessary for security. (There's a whole field of security research that exploits predictable sequences.)

  • Some apps will share a sequence generator among multiple services and/or customers. So customer 1 gets Contact #106125, customer 2 gets Contacts #106126 and #106127, and customer 3 gets Contact #106128. This guaranteed uniqueness can be really convenient if customers ever need to merge data. But of course it means each one doesn't get the full range to themselves.

So using 64-bit integers as identifiers, and possibly spreading across their whole range from the start, is very smart. If you use 32-bit integers and create a couple thousand rows per second, you'll run out of available values in a month! (This would take thousands of years with 64-bit.)

So you can't safely synchronize a 32-bit field with a 64-bit field unless you are 100% sure that the 64-bit side will never use more than the first 2 billion possible numbers. Most apps can't make that guarantee, since they've chosen 64-bit because of its massive scaleability.

So if you can't safely sync an SFDC Integer with a Marketo Integer, what else can you try?

An SFDC Double is better (though still flawed)

The Salesforce Double type is as close as you're going to get to a safe sync with a Marketo Integer. But it's still not totally safe.

The safe[4] integer range for a Double value is -9007199254740991 to +9007199254740991. (That's -(2^53-1) to +(2^53-1).)

Those big numbers might look as big as the other ones at first glance, so let's line 'em up and compare them:

Marketo Integer Max:  9,223,372,036,854,775,807
SFDC Double Max:          9,007,199,254,740,991
                      -------------------------
                      9,214,364,837,600,034,816              

It may seem crazy to care, but the Marketo side can hold 1000 times the unique values as the SFDC side. There are 9 quintillion valid values in Marketo that can't be represented in SFDC.

As above, if you can make the larger side (Marketo in this case) guarantee that it won't use more than the common set of 9,007,199,254,740,991 values, then you'll be safe. A one-by-one auto-incrementing field will take millennia to run out of SFDC-compatible values, so that common case is no problem.

But on the other hand, if you have an app that randomly generates a 64-bit integer and inserts it into Marketo, there's only a 1 in 1000 chance that it will fit in SFDC!

Another area where “You've got 1000s of years, don't sweat it” falls apart is — perhaps ironically — when you're dealing with very granular timestamps. There are 31 quadrillion nanoseconds in just one year (remember, the SFDC/53-bit max is “only” around 9 quadrillion), so if by some chance you're using a time-series database that stamps events down to the nanosecond (we use one such product, InfluxDB) then you could log just one event in May and be over the limit!

Unfortunately, much as I hoped for a triumphant “Here's how it's done” section in this post, I don't think there's an true technical solution to this problem. The only solution is procedural, meaning you have to manage your application code to never generate numbers beyond the 53-bit (SFDC-side) range. So you should add this rule to your documentation pronto (and ask your vendors about it, though they'll probably get defensive).

What about other apps/languages/contexts?

They're all over the place. :)

If you output a {{lead.token}} into a web page, you can manipulate it as a JS Number, for example:

var contractId = {{lead.Contract ID}};

JavaScript Numbers (there is no separate integer type in JS) are akin to SFDC doubles, i.e. their safe range is the 53-bit range only. The good news is that in JS/HTML, you may only be displaying the value and not trying to run it through any math stuff. In such a case, you can just output it as a string

var contractId = "{{lead.Contract ID}}";

But if there's some need to work on the value as number, you could face the same danger. If you output a number outside the range you won't be able to even add or subtract from it safely:

Look closely at those numbers! That's how crazy things get with integer arithmetic outside of the safe range.

On the other hand, in Java (as opposed to JavaScript) you won't have a problem, as long as you run a 64-bit JVM and long. And you'll be fine in .NET with a 64-bit runtime and Int64.

So there's no hard-and-fast rule across technologies. You need due diligence to ensure there are no implementation-specific limits that could cause data loss.

Summing up

As long as all your connected apps are built to enforce the lower limit, they'll have plenty of numbers to play with (a range of ±9 quadrillion, as opposed to ±9 quintillion, is indeed fine for most real-world apps).

Yet I'm not merely being pedantic, since the real-world Integer limit can be either an infinitesimal fraction (if you're syncing to 32-bit integers) or a small fraction (if 53-bit floats) of Marketo's published limit.


Notes

[1] Purposely choosing a link that doesn't overly politicize the parable: I always thought it was a “Math is cool” tale and nothing more.

[2] Don't know about Dynamics CRM. Maybe Dan S. can chime in.

[3] Sort-of-funny flip of this tradition: I'd long thought I was the 500th ever member of Crunch Gym in New York (Member ID #500, and I signed up opening week) but found out much later that they had started numbering at 100,001. It was only after someone realized they were wasting 100,000 IDs that way (that they might need after they got really big) that they restarted numbering from 500, so I was definitely not as cool as I thought. :)

[4] Safe meaning “contiguous and unambiguous.“ You can ask me what that means in the comments if you want.