Code Anatomy: Convert a dotted IP address to decimal

Working up a GDPR/CASL-related post on modifying Marketo forms based on inferred country — which means checking if a decimal (numeric) IP address is in a known range.[1]

But first, let's take a little detour: you'll learn that what you think of as an “IP address” is probably not-quite-right, and see how JavaScript's Array#reduce and Array#reverse can help get this done.

Myth shattered: IPs aren't really dotted strings

You might not realize that an IP address[2] is actually a regular old 32-bit whole number, not a dotted string like "200.100.80.99". And you don't merely remove the dots to get the underlying number, you have to do a base conversion (warning: math lies below).

The dotted format with 4 small numbers is (ostensibly) easier to read and remember, but computers always transform it into a single number before using it for communications. The popular “user-friendly” format has probably made the technical concept harder to understand.[3]

When the IP standard was formalized back in 1981 (you whippersnappers aren't technically the Internet generation, hate to break it to ya!) there was no mention of a dotted representation, just of 32 unsigned bits.[4]

Thus, as you might already know from news coverage, there is an absolute, incontrovertible maximum of 4,294,967,296 (2³²) unique IP addresses.

That's a magic number in computing in general, but has nothing to do with IP addresses in particular. There's also an absolute maximum of 4,294,967,296 rows in a database using an unsigned INT as primary key; a maximum of 4,294,967,296 bytes of memory per app in some old operating systems; a max file size of 4,294,967,296 bytes for some apps and/or filesystems. Point being, that particular limit always comes from the same place: using a 32-bit number as your way of counting stuff!

So what's a dotted IP address? It's a Base-256 representation of a 32-bit decimal (Base-10) number.

Each of the 4 so-called octets in a dotted IP address is a Base-256 digit (or, less confusingly, let's say a Base-256 position as the word “digit” comes from 10 fingers and connotes Base-10!).

That is, just as 1234 in Base-10 is calculated via

1 × 10³ + 2 × 10² + 3 × 10¹ + 4 × 10⁰

then "1.2.3.4" in Base-256 means

1 × 256³ + 2 × 256² + 3 × 256¹ + 4 × 256⁰

or if you calc it out, 16909060.

Which brings us at last to the JS part of the post.

You'll see most people suggest a function like this:

function dottedToDecimal(dotted){
  var positions = dotted.split(".");
  return (
    positions[0] * Math.pow(256,3) + 
    positions[1] * Math.pow(256,2) + 
    positions[2] * Math.pow(256,1) + 
    positions[3] * Math.pow(256,0)
 );
}

Now let me be clear: this wouldn't really be bad code.

But I don't like its sensitivity to positions 0 through 3 — see how the numbers have to go up on the right and down on the left? Sure, you only need to type it once, but it looks too magical when it's a simple mathematical process.

It's like when you see this kind of thing:

var secondsPerDay = 24 * 60 * 60;

Those numbers are very common (to Earth-dwellers) but they're still unnecessarily magical. I prefer to do this:

var secondsPerDay = hoursPerDay * minutesPerHour * secondsPerMinute;

More important, for the last year-ish, I've been near-religiously relying on collection-centric logic (forEach-ing/filter-ing/map-ing/reduce-ing Arrays or Object keys). My first instinct is now, "OK, where's the collection?" I find the results more maintainable ~95% of the time.

So let's use native JS Array methods instead, avoiding repetition and magic numbers.

At core, we need a method that can take an array of Base-256 positions and do the exponent stuff automatically based on the array index.

Better yet, why not have it accept any base, not just Base-256? Base notation always works the same way, whether it be Base-2 or Base-999.

So here's a simple one:

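function baseNto10(positions,base) {
    return positions
      .slice()    // copy first, since reverse() reverses an array in place
      .reverse()  // rightmost position moves to index 0
      .reduce(function(acc,position,idx){
        // string positions like "4" are coerced to numbers by the multiplication
        return acc + position * Math.pow(base,idx);
      }, 0);
}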

That'll convert from any base to decimal, regardless of the problem domain (that is, it doesn't know, nor need to know, that it's going to be used for IP addresses in this case).
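To see the "any base" claim in action, here's a minimal sketch that feeds the same function Base-2 and Base-16 positions. The function is repeated so the snippet runs on its own, and the sample inputs are purely illustrative.

```javascript
// Same generic converter, repeated here so this snippet is self-contained.
function baseNto10(positions, base) {
  return positions
    .slice()   // work on a copy; reverse() would otherwise reorder the input
    .reverse()
    .reduce(function (acc, position, idx) {
      return acc + position * Math.pow(base, idx);
    }, 0);
}

// Binary 1011 is 8 + 0 + 2 + 1
console.log(baseNto10([1, 0, 1, 1], 2));   // 11

// Hex "FF" as two Base-16 positions, each worth 15
console.log(baseNto10([15, 15], 16));      // 255

// And the Base-256 case this post cares about
console.log(baseNto10([1, 2, 3, 4], 256)); // 16909060
```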

Then we can tack on the ultra-short function that's specific to the IP address domain:

function dottedIPToDecimal(dotted){
  var positions = dotted.split(".");
  return baseNto10(positions, 256);
}

Now, you can run dottedIPToDecimal("1.2.3.4") which will split the string into an array of strings, then pass off to baseNto10, telling it to use Base-256. Presto! Proper separation of generic/specific code and reliable results.
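Since the whole point is checking whether an IP falls in a known range, here's a rough sketch of how the two functions might be put to work. Note that isInRange is a hypothetical helper, not something from this post, and the addresses are just documentation-style samples. It also assumes well-formed input (four numeric positions, each 0 to 255) and does no validation.

```javascript
// Generic converter, repeated so the snippet runs on its own.
function baseNto10(positions, base) {
  return positions
    .slice()   // copy, so the caller's array is left alone
    .reverse()
    .reduce(function (acc, position, idx) {
      return acc + position * Math.pow(base, idx);
    }, 0);
}

function dottedIPToDecimal(dotted) {
  return baseNto10(dotted.split("."), 256);
}

// Hypothetical helper: is `dotted` between `first` and `last`, inclusive?
function isInRange(dotted, first, last) {
  var ip = dottedIPToDecimal(dotted);
  return ip >= dottedIPToDecimal(first) && ip <= dottedIPToDecimal(last);
}

console.log(dottedIPToDecimal("1.2.3.4"));       // 16909060
console.log(dottedIPToDecimal("200.100.80.99")); // 3362017379
console.log(isInRange("192.0.2.77", "192.0.2.0", "192.0.2.255")); // true
console.log(isInRange("192.0.3.1", "192.0.2.0", "192.0.2.255"));  // false
```

Once everything is a plain number, the range check is nothing more than two comparisons, which is the payoff of converting in the first place.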

How baseNto10 works

I did get a little tricky (as I often do, while acting like the code just wrote itself). The catch in turning "1.2.3.4" into an array of positions like

["1","2","3","4"]

is that the 1 is at array index 0, but that's the one that needs to be multiplied by Math.pow(base,3) (that is, in this case, multiplied by 256³). And of course the 2 is at array index 1 and that one needs to be multiplied by 256², etc.

So the relationship between the exponent and the index is:

exponent = array.length - 1 - index;
// 4 - 1 - 0 = 3, 4 - 1 - 1 = 2, etc.

But we don't want to do anything that magical in the code.
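For comparison, a version that leaned on that length-and-index arithmetic might look something like the sketch below. The name baseNto10ByIndex is made up for illustration. It gives the same answer, but the subtraction has to be read carefully to trust it.

```javascript
// Illustrative only: the index-arithmetic variant the post steers away from.
function baseNto10ByIndex(positions, base) {
  return positions.reduce(function (acc, position, idx) {
    // leftmost position gets the highest power, rightmost gets power 0
    var power = positions.length - 1 - idx;
    return acc + position * Math.pow(base, power);
  }, 0);
}

console.log(baseNto10ByIndex(["1", "2", "3", "4"], 256)); // 16909060
```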

Remember (c'mon, it was surely longer ago for me than for you): addition is commutative, meaning 1 + 2 + 3 is the same as 3 + 2 + 1. And exponentiation is always done before multiplication, multiplication before addition.

So

1 × 256³ + 2 × 256² + 3 × 256¹ + 4 × 256⁰

is the same as

4 × 256⁰ + 3 × 256¹ + 2 × 256² + 1 × 256³

This is actually a more correct way of expressing the base conversion, since you always have to start from the right side in order to count positions. (As with everyday decimal numbers you think of the 1s position, the 10s position, the 100s position, etc. from right to left.)

Because reversing the order of the positions is not only safe but correct, we can call reverse() on the array, changing it from ["1","2","3","4"] to ["4","3","2","1"].

Now, we don't have to do any crazy stuff with the length, since array index 0 is also the one to multiply by 256⁰, array index 1 gets ×'d by 256¹, and so on.

That alignment simplifies the next and last step, where we use a call to the all-powerful Array#reduce to simplify the array to a single number, the decimal IP.
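If it helps to watch the accumulator grow, here's a small tracing sketch of that reduce call over the already-reversed positions. The steps array and the logging are additions for illustration and aren't part of the real function.

```javascript
var reversed = ["4", "3", "2", "1"]; // "1.2.3.4" after split(".") and reverse()
var steps = [];

var result = reversed.reduce(function (acc, position, idx) {
  var weight = Math.pow(256, idx);
  var next = acc + position * weight;
  // record "index: running total + position * weight = new total"
  steps.push(idx + ": " + acc + " + " + position + " * " + weight + " = " + next);
  return next;
}, 0);

console.log(steps.join("\n"));
// 0: 0 + 4 * 1 = 4
// 1: 4 + 3 * 256 = 772
// 2: 772 + 2 * 65536 = 131844
// 3: 131844 + 1 * 16777216 = 16909060

console.log(result); // 16909060
```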

An endolude about reduce

The name “reduce” is a bit confusing if you don't already know the technical term. It describes the most common use of the function: to take a complex variable (an Array or other collection of values) and, by reading and processing all of its values, reduce it to a simple aggregate value (Number, String, maybe Boolean). So it doesn't mean a reduction in terms of quantity represented, but a reduction in, let's say, dimensions.

Even so, reduce doesn't need to have a simplified result (although in this post I am using it in the typical fashion). It's perfectly capable of returning an Object or another Array (and sometimes I use it for just that). Either way, you should put it to use!
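As a quick illustration of the non-simplifying kind of reduce, this sketch groups a few sample addresses by their first position and returns an Object rather than a single number. The ips list and the byFirstPosition name are invented for the example.

```javascript
var ips = ["10.0.0.1", "10.0.0.2", "192.0.2.7"];

// Start from an empty Object and build up { firstPosition: [addresses...] }
var byFirstPosition = ips.reduce(function (acc, ip) {
  var first = ip.split(".")[0];
  if (!acc[first]) {
    acc[first] = [];
  }
  acc[first].push(ip);
  return acc;
}, {});

console.log(JSON.stringify(byFirstPosition));
// {"10":["10.0.0.1","10.0.0.2"],"192":["192.0.2.7"]}
```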


Notes

[1] You still need to let the end user confirm their legal jurisdiction. Even if IP-based geolocation were a 100% exact science, a person's physical location doesn't always predict their effective statute(s).

[2] Meaning IPv4, of course: IPv6 is off-topic today. And of course reserved private IPs are duplicated within countless organizations. But that doesn't change the fact that there are 2³² unique IPs that could ever be seen in the same room.

[3] I don't even know if 3362017379 is harder to remember than "200.100.80.99". From a spoken communications perspective, the dots add a natural readability, admittedly.

[4] Other contemporaneous RFCs did mention the format, like RFC 780 from a few months earlier:

Another form is four small decimal integers separated by dots and enclosed by brackets, e.g., "[123.255.37.321]", which indicates a 32 bit ARPA Internet Address in four eight bit fields.