Unravelling the Web – Part 2

In part one we saw that just as a letter moves through the postal system (or imagine successive rounds of Six Degrees of Separation) your message to my website moves one router at a time, getting ever closer to its destination. It’s now hit a router that recognises the IP address as “local”. Under our postal analogy, the letter has reached a delivery office. Let’s continue with the postal analogy to understand what happens next.

At a delivery office, letters are sorted into a sensible order for the postmen and women to pick up and deliver. But if you think about it, a postcode doesn’t tell the postman exactly where in the physical world the letter needs to go. To answer that we return to the full address, such as 38 Princess Avenue, Rainham. Even then, what does an address really tell you? It’s meaningless without prior local knowledge. A postwoman must be trained in her round, and what that involves is mapping postal addresses to physical locations.

The ultimate definition of location is longitude and latitude. These two coordinates pinpoint exactly where something is on a global scale. We don’t deal with these directly on a day-to-day basis because experience, maps and satnavs hide that detail from us. But we can think of the longitude and latitude of a place as another type of address – a physical address. So for a letter at a delivery office, the next step is translating a postal address into a physical address. A postman may not be conscious of it but he is doing that translation as he goes about his round.

Figure 7: a complete journey for a letter in the postal system.

Back to your message for my website, which is sitting with that last router. Just as the postcode has lost its usefulness for the mail at the delivery office, so the IP address has done everything it can. We need to turn the IP address into a different address. Now that we’re talking about computers, obviously this second type of address isn’t a physical location but, just like the longitude and latitude system, the address is meant to be globally unique. It’s called a “MAC address” and it’s a unique number burnt into the circuitry of the network device. MAC addresses are what computers in close proximity use to talk to each other. This is because at a local level computers speak a language that doesn’t use IP addresses – just as a postwoman on her delivery round doesn’t read the postcodes.

Let’s take a look at that last router, which has a message for my website at 176.74.20.5 (as I said in part one, this address may change with time). The way that IP address is translated into a MAC address is not especially elegant. The router sends a message to every device on the local network asking “who has the IP address 176.74.20.5?”. It’s effectively shouting out this question, but the shout needn’t travel beyond the surrounding routers because the router knows the answer is nearby. The machine which has that IP address answers back, “yup, that’s me” and in doing so provides its MAC address. MAC addresses look nothing like IP addresses. An example might be 00:50:56:A2:DE:17 but there’s no need for us to delve deeper than that. Just seeing one is enough to appreciate that MAC addresses do a very different job to IP addresses. In fact, it’s because the question “who has the IP address 176.74.20.5?” uses MAC addresses, not IP addresses, that the message stays local. In the picture below the message comes in on Wire 14, the router compares 176.74.20 with the start of the destination address 176.74.20.5 (outward sorting), realises the destination is local and knows to direct the question about the MAC address down Wire 11.

Figure 8: the final router’s decision.
Having worked out that the destination IP address is within reach, the router begins the process of translating that address to a MAC address. There may be lots of machines reachable down Wire 11.
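For readers who like to see things written down as steps, here is a rough Python sketch of the final router’s two jobs: deciding that the destination is local, then shouting “who has this IP address?”. It is only an illustration. The size of the local network (176.74.20.0/24), the list of devices on Wire 11 and every MAC address except the example one quoted above are assumptions made up for the sketch, and a real router does this in hardware and low-level software rather than in Python.

```python
import ipaddress

# Assumption: the local network is 176.74.20.0/24, i.e. every address that
# starts with "176.74.20". The text does not state the real network size.
LOCAL_NETWORK = ipaddress.ip_network("176.74.20.0/24")

# Hypothetical machines reachable down Wire 11, as (IP address, MAC address).
# Only 00:50:56:A2:DE:17 appears in the text, and only as an example.
DEVICES_ON_WIRE_11 = [
    ("176.74.20.3", "00:50:56:A2:DE:15"),
    ("176.74.20.5", "00:50:56:A2:DE:17"),
    ("176.74.20.9", "00:50:56:A2:DE:20"),
]


def is_local(destination_ip):
    """Outward sorting: does the destination belong to the local network?"""
    return ipaddress.ip_address(destination_ip) in LOCAL_NETWORK


def who_has(destination_ip, devices):
    """Shout the question at every device; only the owner answers back."""
    for ip, mac in devices:
        if ip == destination_ip:
            return mac  # "yup, that's me"
    return None  # nobody answered


if __name__ == "__main__":
    if is_local("176.74.20.5"):
        print(who_has("176.74.20.5", DEVICES_ON_WIRE_11))
```

The two functions are kept separate on purpose: the first uses only the IP address, the second is the only place a MAC address appears.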

When the router gets the MAC address for the target machine, it uses that to send the message to its final destination. In the postal system it’s the physical address that gets the letter to the right house. Similarly, it’s the MAC address that gets the message to the right computer. In point of fact, this isn’t the first time MAC addresses have been used. Far from it. They’ve been used at every hop along the path across the internet. In contrast to the postal system, where a postal address is converted to a physical address only at the end, IP addresses are converted to MAC addresses at every hop. For example, let’s rewind right the way back to when your computer sent a message to your broadband router:

Figure 9: an embellished version of figure 6 from part one with MAC address look-ups added.
Q1 is the first MAC address question, asked by your computer, with the answer given by your router in A1. Then the router asks Q2 and A2 comes back in reply.

Using its routing table, your computer knew that my website was somewhere out on the internet and the only way of getting there was to go via your broadband router. Your computer knew the IP address of the router (192.168.1.1) so – and this is a step we missed out at the time – it asked for the MAC address by shouting the question to every device that would listen. Your router answered back with something like F0:1F:AF:D3:EF:AC, which your computer used to talk to it directly. Next, your router figured out that 213.1.112.248 was the best IP address to send this message on to. When it asked for a MAC address, a router at your ISP answered 00:01:63:42:FD:A1, which the router used to talk to it directly. And so on. For every hop along the way, it’s these MAC addresses that are used for actually shuttling the message around but it is the IP address that is used to make the decision about where to send the message to. That concept, summed up in just one sentence, is fundamental to how computer networks operate.
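The split between “IP address decides where, MAC address does the carrying” can be sketched in a few lines of Python. This is a simplified illustration of your computer’s first hop, not how any particular operating system implements it. The home network range (192.168.1.0/24) is an assumption, and the small table of answers stands in for the shouted question and the router’s reply.

```python
import ipaddress

# Assumption: your home network is 192.168.1.0/24. Only the router's
# address, 192.168.1.1, is given in the text.
HOME_NETWORK = ipaddress.ip_network("192.168.1.0/24")
ROUTER_IP = "192.168.1.1"

# Stand-in for the answers that come back when the question is shouted.
# The router's MAC address is the "something like" value from the text.
MAC_ANSWERS = {"192.168.1.1": "F0:1F:AF:D3:EF:AC"}


def next_hop_ip(destination_ip):
    """The decision: made using IP addresses only."""
    if ipaddress.ip_address(destination_ip) in HOME_NETWORK:
        return destination_ip  # a neighbour, so talk to it directly
    return ROUTER_IP  # anywhere else is reached via the broadband router


def next_hop_mac(destination_ip):
    """The delivery: the MAC address the message is actually sent to."""
    return MAC_ANSWERS[next_hop_ip(destination_ip)]


if __name__ == "__main__":
    # The website is not on the home network, so the message goes to the router.
    print(next_hop_ip("176.74.20.5"), next_hop_mac("176.74.20.5"))
```

Notice that the website’s IP address never gets a MAC address looked up for it here. Your computer only ever needs the MAC address of the next hop.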

Your message has reached its destination. This machine, which hosts my website, is called a “server” because it serves up something useful (I hope). Because it serves up web pages, it’s called a web server. A server that handles email is called a mail server. A server that stores files is called a file server. You get the idea. The web server “opens the envelope” and finds a request for the home page of my site. So it returns a web page, which is just another message, this time addressed to your computer. The web server can do this because the details of who sent the incoming message accompanied the message itself, just like some letters have the sender’s address on the back. The journey back to your computer uses exactly the same mechanism that got the message to my website in the first place.

In summary, part two has shown that the message to get the home page of my website traverses the internet in not one but two “envelopes”. The inner envelope has the destination IP address 176.74.20.5, which is static throughout the journey. The outer envelope uses a MAC address to identify the destination of the next hop. This changes as the message bounces from router to router. Since a picture is supposed to paint a thousand words, I’ll try this:

Figure 10: the route a message takes over the internet (several intermediary steps have been compacted inside the cloud).
Once a router has used the destination IP address to work out the next best hop, the outer envelope is rewritten with a new MAC address to get it there.

In Figure 10, notice how the destination IP address of the inner envelope (as well as the message it contains) remains the same, whereas the destination MAC address of the outer envelope changes with every hop. You can imagine the outer envelope being opened and discarded by every router on receipt, and then the inner envelope being re-wrapped with a newly addressed outer envelope before it’s sent on its way. Only when the destination IP address is reached is the inner envelope opened to reveal the request “I’d like your home page”.
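The two-envelope idea can also be written as a small Python sketch. The names `InnerEnvelope`, `OuterEnvelope`, `rewrap` and `journey` are invented for this illustration, and the list of hops is shortened to three MAC addresses taken from the examples earlier in this article; a real journey has more hops and real network equipment does not literally work with Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerEnvelope:
    destination_ip: str  # stays the same for the whole journey
    message: str


@dataclass(frozen=True)
class OuterEnvelope:
    destination_mac: str  # only valid for the next hop
    inner: InnerEnvelope


def rewrap(received, next_mac):
    """Discard the old outer envelope and re-wrap the untouched inner one."""
    return OuterEnvelope(destination_mac=next_mac, inner=received.inner)


# Illustrative hops only: your router, a router at your ISP, the web server.
HOP_MACS = ["F0:1F:AF:D3:EF:AC", "00:01:63:42:FD:A1", "00:50:56:A2:DE:17"]


def journey(inner, hop_macs):
    """Return the outer envelope as it looks on each hop of the journey."""
    outer = OuterEnvelope(destination_mac=hop_macs[0], inner=inner)
    envelopes = [outer]
    for mac in hop_macs[1:]:
        outer = rewrap(outer, mac)
        envelopes.append(outer)
    return envelopes


if __name__ == "__main__":
    request = InnerEnvelope("176.74.20.5", "I'd like your home page")
    for envelope in journey(request, HOP_MACS):
        print(envelope.destination_mac, "->", envelope.inner.destination_ip)
```

Running it prints a different MAC address on every line but the same IP address each time, which is the pattern Figure 10 draws.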

I think this second part is harder to grasp than the first part so I congratulate you for making it to this paragraph! It was always going to be a tall order to make the workings of the internet accessible to a non-technical audience but I hope that you’ve come away with a better understanding of the computer that’s sitting in front of you right now.

Thanks to Freethought Internet for chatting to me about their infrastructure.