Call it the Law of Large Projects: eventually, someone’ll receive data they don’t have enough experience to validate, give a shrug, and assume it must be right.
APIs, awesome as they are, are one cause. Developers might fetch data from the REST API, but without really knowing how to use Marketo itself (they might never have logged into an instance!) they can’t really understand the results.
And as data gets further transmogrified by internal processes, it can end up with stuff that would make a Marketo user instantly say That ain’t right! while devs are happily logging LGTM.
Such was the case when the Annual Revenue field showed both positive and negative numbers in a client’s in-house BI tool, whose underlying SQL database gets populated via the Bulk Extract API.
And there were 2 different reasons, as it turned out. The 1st was familiar, but hadn’t seen the 2nd before and it’s likely new to you as well.
Reason 1: Using the wrong field width, then converting its unsigned version to signed
Seeing Annual Revenue with values like -294967296
set my datatype antennae (yes, I have those!) tingling. Sure, negative revenue is a thing in the accounting world, but we’d never store that kind of value in Marketo.
Non-martech-savvy but otherwise technical people wouldn’t necessarily know Annual Revenue < 0 must be a mistake. After all, in standard SQL, all number types are signed, so they can be negative, positive, or zero. Similarly, in most programming languages, if there’s a default numeric type it’s signed; otherwise, you choose between signed or unsigned.[1]
But we know from experience that Marketo Engage wouldn’t have negative values. So something was wrong with how the data was being handled.
First move: look up the email address in Marketo. We got one match, with Annual Revenue $4,000,000,000.00
. So what’s the relationship between 4 billion and -294967296
?
Answer: a negative integer is what you get if you round the float to a whole number, then store it in an unsigned integer column that’s just wide enough for the positive value. Then later copy the value to a signed integer of the same width.
A tiny C# demo:
uint originalValue = 4000000000;
int valueAsSigned = (int)originalValue;
Console.WriteLine(valueAsSigned); // prints "-294967296"
4 billion fits in an unsigned integer (as it’s between 0 and 4,294,967,295). But it doesn’t fit in a signed integer as-is (as it’s greater than 2,147,483,647). So — assuming your app doesn’t give you a warning about data loss[2] — due to two’s complement encoding you end up “wrapping around” to the negative side.
Now we know the direct cause: someone put a valid unsigned 32-bit value in a signed 32-bit variable, and ended up with a technically error-free, but very wrong, result.
But the root cause is key: the developers didn’t understand they needed a wider container to hold the Marketo data in the first place. A 64-bit integer, a.k.a. long
/Int64
/BIGINT
, either unsigned or signed, could hold any feasible rounded revenue and you need not worry about data loss.
Reason 2: Excel
We also noticed that many records had the specific Annual Revenue -2146826246
.
Hard to see how a datatype/width issue could cause the same value for multiple records. Indeed it was something very different: that’s an Excel-specific error code for the #N/A
value!
At some point, the data had been pulled into an Excel sheet, then read programmatically, i.e. not via the Excel UI but some kind of data transformation app. The #N/A
columns — where Annual Revenue was null — were populated with that magic negative number, then data was reimported. Interesting, eh?
The lesson here is don’t use Excel in/as your integration stack.
Notes
[1] If you’re not familiar, unsigned values are 0 or positive; signed values are negative, 0, or positive.
[2] Yes, this is data loss, because you cannot independently know if the negative number is supposed to be negative or was the result of a destructive conversion.