Down the Rabbit Hole: Querying a Domain; or What Happens When You Enter “holbertonschool.com” Into Your Browser

Justin Majetich
Apr 13, 2020

For all of the tens of thousands of hours I’ve spent on the internet, I’ve only recently started to understand what’s happening when I punch in a web address and almost immediately the corresponding page appears in my browser. My fascination with this massive, convoluted network of wire and information has only grown with greater understanding. More than ever, the internet feels at once perilously intricate and larger than life. So what really happens when you type a web address into the browser — for the purposes of this article: “https://www.holbertonschool.com” — and smash that enter key?

Where To Begin…

The address we are querying today can be broken into a few different parts. At the heart of a web address is the domain name. In our case, this is “holbertonschool.com”. We also have the subdomain (“www.”) and the communication protocol (“https://”). This latter component is not actually part of the address; its purpose will be discussed later.
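To make that breakdown concrete, here is a small Python sketch that pulls the same pieces out of our address using the standard library. The label-splitting is a naive assumption of mine that only works for simple addresses like this one; suffixes such as ".co.uk" would need a public-suffix list.

```python
from urllib.parse import urlsplit

url = "https://www.holbertonschool.com"
parts = urlsplit(url)

scheme = parts.scheme        # the communication protocol
hostname = parts.hostname    # subdomain + domain name

# Naive split, for illustration only: assumes a single-label
# top-level domain such as ".com".
labels = hostname.split(".")
subdomain = labels[0]
domain = ".".join(labels[-2:])
top_level_domain = labels[-1]

print(scheme, subdomain, domain, top_level_domain)
```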

Put simply, a website is nothing more than a set of files hosted and served to you from a computer, and a web address just specifies which computer this is. So, when we enter the address “www.holbertonschool.com”, the first thing our browser does is try to determine just where in the world the files that make up this website are located. To do this, the browser must determine the host device’s IP address.

Behind every web address is an IP number. Every device connected to the internet has one of these unique, numerical identifiers. The IP of the laptop I’m writing this article from looks like this, for example: 70.19.81.177. Assigning domain names to these numbers (such as “facebook.com”, “pbs.org”) makes these addresses much easier for humans to remember. And so, for our search, the browser must first look up the IP address associated with the domain “www.holbertonschool.com”. To do this, it will employ the Domain Name System, or DNS.
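An IPv4 address like the one above is really just a 32-bit number written as four bytes for readability. The sketch below, using Python's standard `ipaddress` module, shows the two views of the same value; the variable names are my own.

```python
import ipaddress

ip = ipaddress.ip_address("70.19.81.177")

# The dotted form is shorthand for a single 32-bit integer.
as_integer = int(ip)

# Rebuild that integer by hand from the four octets to show the mapping.
octets = [int(part) for part in str(ip).split(".")]
rebuilt = sum(octet << shift for octet, shift in zip(octets, (24, 16, 8, 0)))

print(ip.version, as_integer, rebuilt == as_integer)
```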

Domain Name System

Before technically entering the DNS, the browser will reference its local cache. When a browser communicates with a website, it stores the domain and associated IP address in this cache for future reference. So, if our browser has recently communicated with “www.holbertonschool.com”, we can immediately direct our request to the appropriate IP. If the domain is not tracked in the local cache, the next stop is the internet service provider’s cache, which functions similarly, just at a larger scale.
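The cache-then-cache lookup order can be sketched in a few lines of Python. This is a simplified model, not how any particular browser or ISP implements it: the `DnsCache` class, the expiry time (caches do not keep entries forever), and the explicit `now` argument used to keep the example deterministic are all assumptions of mine.

```python
import time


class DnsCache:
    """A tiny domain -> IP cache whose entries expire after a time-to-live."""

    def __init__(self):
        self._entries = {}

    def store(self, domain, ip, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        self._entries[domain] = (ip, now + ttl_seconds)

    def lookup(self, domain, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(domain)
        if entry is None:
            return None
        ip, expires_at = entry
        if now >= expires_at:
            # Stale entry: forget it so the caller looks further afield.
            del self._entries[domain]
            return None
        return ip


def resolve_from_caches(domain, caches, now=None):
    """Check each (name, cache) pair in order; return (ip, name) or None."""
    for name, cache in caches:
        ip = cache.lookup(domain, now=now)
        if ip is not None:
            return ip, name
    return None  # not cached anywhere: time to ask the DNS proper


browser_cache = DnsCache()
isp_cache = DnsCache()
isp_cache.store("www.holbertonschool.com", "99.84.32.15", ttl_seconds=300, now=0)

caches = [("browser", browser_cache), ("isp", isp_cache)]
print(resolve_from_caches("www.holbertonschool.com", caches, now=10))
```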

If the domain is still not matched to an IP, our browser will search in the DNS. A root name server is the first layer of this system. Interestingly, there are only thirteen of these servers scattered across the globe (though each server is technically a cluster of many identical instances). These root name servers examine the top-level domain of the requested address and direct traffic to the name servers responsible for that top-level domain. In our case, the top-level domain of our web address is “.com”, and we are directed to the “.com” name servers, which in turn point us to the authoritative name server for “holbertonschool.com”.

The name server will search its records for the domain “holbertonschool” with the subdomain “www”. If the address is found, the server will return an A record, which is a record that maps a domain or subdomain directly to an IP. In this case, the IP which is associated with “www.holbertonschool.com” and which is returned to our browser is 99.84.32.15. Now that we have the IP address of the server which is hosting “www.holbertonschool.com”, we are ready to make a request to this server for the files which our browser can then render as a webpage. (I did a bit of digging and found the geographical location of the server serving this site. Turns out it’s located at these coordinates, 47°32'31.2"N 122°18'44.3"W, which according to Google is on the property of the King County International Airport in Seattle, Washington.)
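Here is a toy walk through that hierarchy, with plain dictionaries standing in for the name servers. It assumes the usual three-step layout (root, then the top-level-domain server, then the domain's own name server); the server labels such as "tld-com" and "ns-holberton" are made up for illustration, and no real DNS traffic is involved.

```python
# Each "server" is just a lookup table in this sketch.
ROOT_ZONE = {"com": "tld-com"}
TLD_ZONES = {"tld-com": {"holbertonschool.com": "ns-holberton"}}
AUTHORITATIVE_ZONES = {
    "ns-holberton": {"www.holbertonschool.com": "99.84.32.15"},  # the A record
}


def resolve(hostname):
    """Follow referrals from the root down to an A record, or return None."""
    labels = hostname.split(".")
    top_level_domain = labels[-1]
    domain = ".".join(labels[-2:])

    # Step 1: the root only knows who handles each top-level domain.
    tld_server = ROOT_ZONE.get(top_level_domain)
    if tld_server is None:
        return None

    # Step 2: the TLD server knows which name server handles the domain.
    authoritative_server = TLD_ZONES[tld_server].get(domain)
    if authoritative_server is None:
        return None

    # Step 3: the domain's own name server holds the actual A record.
    return AUTHORITATIVE_ZONES[authoritative_server].get(hostname)


print(resolve("www.holbertonschool.com"))
```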

Internet Protocol

There are different protocols which define different standards of communication across the internet. One of the primary ones implemented today is TCP/IP, or Transmission Control Protocol/Internet Protocol. Some of the defining features of this protocol are connection and reliability. Under TCP, a persistent connection is established between host and client before a transfer of data is made. Once a connection is confirmed, data can be transmitted, and for every packet of data that’s sent from server to client, a verification of receipt is made. This error-checking mechanism ensures that transmissions are complete. Though TCP can result in greater latency than other protocols such as UDP, it is useful for interactions in which the completeness and integrity of transacted data are important, e.g. a file or webpage download. This is the sort of connection we’ll be establishing with the server hosting “www.holbertonschool.com”, and now that we have its IP address, we can do just that.
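The "verify every packet" idea can be illustrated with a deliberately simplified model. The sketch below is a stop-and-wait scheme, where the sender repeats a packet until it is acknowledged; real TCP is considerably more sophisticated (timers, sliding windows, cumulative acknowledgments), and the function and parameter names are hypothetical.

```python
def send_reliably(packets, drop_first_attempt_of):
    """Deliver packets in order, retransmitting any that go unacknowledged.

    drop_first_attempt_of: set of packet indexes whose first transmission
    is "lost" in transit, to simulate an unreliable network.
    Returns (packets received, total transmissions made).
    """
    received = []
    transmissions = 0
    for sequence_number, payload in enumerate(packets):
        attempt = 0
        acknowledged = False
        while not acknowledged:
            attempt += 1
            transmissions += 1
            lost = sequence_number in drop_first_attempt_of and attempt == 1
            if lost:
                continue  # no acknowledgment arrives, so send it again
            received.append(payload)
            acknowledged = True  # receiver confirms receipt
    return received, transmissions


chunks = [b"<html>", b"hello", b"</html>"]
received, transmissions = send_reliably(chunks, drop_first_attempt_of={1})
print(received == chunks, transmissions)
```

Even though one packet was dropped on its first trip, the receiver ends up with the complete page, at the cost of one extra transmission. That trade of latency for completeness is the one described above.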

Securing the Connection with HTTPS

When we entered the address “www.holbertonschool.com”, we prefixed it with “https://”. This denotes that the connection to be made should be a secure one. Layered on top of TCP/IP is another fundamental internet communication protocol: HTTP and its secure extension HTTPS. HTTPS, or Hypertext Transfer Protocol Secure, dictates a secure transfer of information using TLS (Transport Layer Security) encryption, the successor to SSL (Secure Sockets Layer), a name which is still widely used for it. TLS is yet another protocol, one which standardizes the method of encryption. Before a client and server can establish an HTTPS connection, they must undergo a validation procedure, commonly referred to as a “handshake”. Put simply, in this handshake the server proves its identity with a certificate, and the client and server agree on a shared secret key which they can then use to encrypt and decrypt the rest of their communication. While our business with “holbertonschool.com” might not be particularly sensitive, encrypted communication is critical for things like online bank transactions or the sharing of a social security number.
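How can two parties agree on a secret over a network anyone can listen to? One classic answer is a Diffie-Hellman key exchange, sketched below with toy numbers. This illustrates the general idea only; it is not the actual handshake, which also verifies the server's certificate and uses far larger values. The constants and function names are my own.

```python
import hashlib

# Public parameters, known to everyone (toy-sized; real ones are enormous).
PRIME = 23
GENERATOR = 5


def public_value(private_value):
    """The number each side sends over the open network."""
    return pow(GENERATOR, private_value, PRIME)


def shared_secret(their_public_value, my_private_value):
    """What each side computes privately from the other's public number."""
    return pow(their_public_value, my_private_value, PRIME)


# Each side picks a private number and never transmits it.
client_private, server_private = 6, 15

client_public = public_value(client_private)
server_public = public_value(server_private)

client_secret = shared_secret(server_public, client_private)
server_secret = shared_secret(client_public, server_private)

# Both sides can now derive the same encryption key from the secret.
session_key = hashlib.sha256(str(client_secret).encode()).hexdigest()
print(client_secret == server_secret)
```

An eavesdropper sees only the two public values, and with realistically large numbers cannot feasibly work back to the shared secret.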

Load Balancer

When we make a request to “www.holbertonschool.com”, the first component of the host server’s architecture we hit will be a load balancer. The load balancer will first present an SSL certificate, launching the HTTPS validation process explained above. It will also serve as an endpoint for encryption; meaning, everything passed in and out of the host server will be respectively decrypted and encrypted at this juncture.

However, security is not the primary function of load balancing software. As the name suggests, the load balancer’s job is to delegate client requests evenly across the web servers at this IP address. This is critical behavior for any website with moderate to high traffic. For example, a website like “facebook.com” might employ thousands of largely identical web servers, each holding the files necessary to load the website in your browser. The load balancer makes sure that each of these web server instances receives a roughly equal share of requests, as one web server would be immediately overwhelmed by the traffic a site like Facebook receives.
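One of the simplest ways to spread requests evenly is round robin: hand each new request to the next server in line, then start over. The sketch below shows that strategy. The server names are placeholders, and I don't know which algorithm any particular site actually uses.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a repeating cycle."""

    def __init__(self, servers):
        self._next_server = cycle(servers)

    def route(self, request):
        # The request's content is ignored here; only arrival order matters.
        return next(self._next_server)


balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
assignments = [balancer.route(f"request-{i}") for i in range(9)]
load = Counter(assignments)

print(load)
```

Other strategies, such as sending each request to the server with the fewest open connections, cope better when some requests are much heavier than others.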

Firewall

Now that our connection to “holbertonschool.com” has been secured and our request delegated to an available web server, we must pass through that server’s firewall. A firewall is a security mechanism which regulates traffic in and out of a server. While SSL is concerned with protecting the content of a communication itself, firewalls make sure the host only communicates with trusted entities in the first place. They do this by managing the opening and closing of ports, as well as validating the service or client attempting to access an opened port. Luckily, we are an unsuspicious party, and “holbertonschool.com”’s firewall grants us access to the web server.
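A firewall's decision can be pictured as a short rule check: is the port open, and is the sender acceptable? The sketch below is a hypothetical rule set of my own, not the configuration of any real server; the blocked range is an address block reserved for documentation.

```python
import ipaddress

# Hypothetical rules: only the web ports are open, and one network is banned.
OPEN_PORTS = {80, 443}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed(source_ip, destination_port):
    """Return True if traffic from source_ip may reach destination_port."""
    if destination_port not in OPEN_PORTS:
        return False  # closed port: nobody gets through
    address = ipaddress.ip_address(source_ip)
    return not any(address in network for network in BLOCKED_NETWORKS)


print(is_allowed("70.19.81.177", 443))  # ordinary visitor, HTTPS port
print(is_allowed("70.19.81.177", 22))   # same visitor, closed port
print(is_allowed("203.0.113.7", 443))   # blocked network, open port
```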

Web Server

Finally, we’ve made a connection with the software which will serve us the webpage we’ve requested. A web server is the component of server architecture tasked with actually relaying static web pages to the client. As I mentioned above, oftentimes many identical web servers will sit behind a load balancer serving the same content to different client requests. More often than not, HTML files are being served. If the webpage we’ve requested is dynamic, the web server will need some help generating a static page for delivery.
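Stripped to its essentials, serving a static page is a lookup: map the requested path to a stored file and send it back with a status code. The sketch below models that with an in-memory dictionary in place of a disk; the file contents and function name are invented for illustration.

```python
# Stand-in for the files on disk that make up the website.
STATIC_FILES = {
    "/index.html": b"<html><body>Welcome</body></html>",
    "/about.html": b"<html><body>About us</body></html>",
}


def handle_request(path):
    """Return (status code, body) for a requested path."""
    if path == "/":
        path = "/index.html"  # the bare address maps to the home page
    body = STATIC_FILES.get(path)
    if body is None:
        return 404, b"Not Found"
    return 200, body


print(handle_request("/"))
print(handle_request("/missing.html"))
```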

Application Server

An application server runs any programs, applications or codebases needed to serve the client a webpage. Say you’re querying a file conversion site to convert a .jpeg into a .png. The web server will take your request (in this case the file to convert) and pass it on to an application server which will run a program to perform the actual conversion. When the conversion has been made, the converted .png will be returned to the web server, which will in turn serve it to the client. The application server could also be used to populate an HTML file with dynamic content stored on an adjacent database server.

Database

Rarely does a web server store an HTML page to be served as is. Often, these HTML pages are templates for dynamic content. For example, when “facebook.com” loads in my browser, I see a timeline full of text and images shared by “friends” on the site. I guarantee your “facebook.com” looks very different, though the form is the same. We’re both receiving the same page populated with distinct content. When we make the request for this page, an application server might query a database to find the most recent posts of our friends which have been stored within. Or say, for example, you arrive at a website to which you’re already logged in. The page appears, and your name is listed at the top right, along with a greeting. This is obviously specific to you. The server has a record of your session (typically identified by a cookie your browser sends, rather than by your IP alone) stored in a database along with your name and all sorts of other information which is relevant to your unique interaction with this webpage.
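The "same page, different content" idea is easy to see in miniature. In the sketch below, an in-memory SQLite database stands in for the database server and a one-line template stands in for the HTML page. The table layout, user names and posts are all invented for the example.

```python
import sqlite3
from string import Template

# One shared template; only the filled-in values differ per visitor.
PAGE_TEMPLATE = Template("<h1>Hello, $name!</h1><ul>$posts</ul>")


def build_database():
    """Create a throwaway in-memory database with two users and some posts."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    connection.execute("CREATE TABLE posts (user_id INTEGER, body TEXT)")
    connection.executemany(
        "INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    connection.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(1, "First post"), (1, "Second post"), (2, "Hi")],
    )
    return connection


def render_page(connection, user_id):
    """Query the database and fill the template for one user."""
    (name,) = connection.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    rows = connection.execute(
        "SELECT body FROM posts WHERE user_id = ? ORDER BY rowid", (user_id,)
    ).fetchall()
    posts = "".join(f"<li>{body}</li>" for (body,) in rows)
    return PAGE_TEMPLATE.substitute(name=name, posts=posts)


database = build_database()
print(render_page(database, 1))
print(render_page(database, 2))
```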

Voila

At last! Our “https://www.holbertonschool.com” has materialized in our browser. Mind you, we’ve barely had time to lift our finger from the “enter” key, and still our request has traveled thousands of miles and interfaced with a number of different entities. The fact that all this goes off without a hitch billions of times every day around the world is terrifying and awe-inspiring. Hopefully, next time you smash that like button, you do so with a little more appreciation for the underlying system.

Chart of an internet interaction similar to that explored in this post. Created by author.
