The Structure of Surfing
As a user browses the web, considerable network activity takes place behind the scenes to support the browsing experience. The Hypertext Transfer Protocol (HTTP) is what makes this possible. HTTP specifies how web browsers and web servers manage the flurry of requests and responses that occur while a user is surfing the web. Once you understand how web browsers and servers communicate, you can use the built-in HTTP classes available to C# to obtain information from a web server programmatically.
If you already understand the structure of HTTP requests between web servers and web browsers, you may be able to skip this chapter and proceed directly to Chapter 2, “Analyzing Sites”, or Chapter 3, “Simple HTTP Requests”. Chapter 2 expands on Chapter 1 by showing how to use a “network analyzer” to examine, firsthand, the information exchanged between a web server and a web browser. A network analyzer can be very valuable when programming a bot to access a complex web site. However, if you are already familiar with network analyzers, you may proceed directly to Chapter 3, which begins C# HTTP programming.
The first thing to understand about web browsing is that it consists of a series of HTTP requests and responses. The web browser sends a request to the server, and the server responds. This communication is one-sided: the browser always initiates the exchange. The opposite never occurs; the web server never requests anything of the web browser.
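Because the browser always speaks first, every exchange has the same shape: a request travels from browser to server, and a response comes back. The C# sketch below illustrates that shape by building the text of a minimal HTTP/1.1 GET request and reading the status code from the first line of a response. The host `www.example.com`, the path `/index.html`, and the helper names `BuildGetRequest` and `ParseStatusCode` are illustrative assumptions; a real browser sends many more headers than this.

```csharp
using System;

class RawHttpSketch
{
    // Builds the text of a minimal HTTP/1.1 GET request, the kind of
    // message a browser sends when it asks a server for a page.
    public static string BuildGetRequest(string host, string path)
    {
        return $"GET {path} HTTP/1.1\r\n" +
               $"Host: {host}\r\n" +
               "Connection: close\r\n" +
               "\r\n"; // a blank line marks the end of the request headers
    }

    // Extracts the numeric status code from the first line of a response,
    // for example "HTTP/1.1 200 OK" yields 200.
    public static int ParseStatusCode(string statusLine)
    {
        string[] parts = statusLine.Split(new[] { ' ' }, 3);
        if (parts.Length < 2 || !parts[0].StartsWith("HTTP/"))
            throw new FormatException("Not an HTTP status line: " + statusLine);
        return int.Parse(parts[1]);
    }

    static void Main()
    {
        // What the browser sends...
        Console.Write(BuildGetRequest("www.example.com", "/index.html"));
        // ...and the first thing it reads back from the server.
        Console.WriteLine(ParseStatusCode("HTTP/1.1 200 OK"));
    }
}
```

The server never sends anything resembling `BuildGetRequest`'s output back to the browser; it only answers with a status line, headers, and usually a body.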
In a typical surfing session, HTTP activity begins when the browser requests the first page from a web server, and it continues as additional pages from that site are requested. To see how this works, the next section examines the requests that are exchanged between the web browser and the web server.
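A surfing session of this kind can be sketched with the `HttpClient` class from the `System.Net.Http` namespace: one client object stands in for the browser and requests the first page, then further pages from the same site. So that the sketch runs without a network connection, the server is replaced by a hypothetical `StubServerHandler` that simply answers every request; the host name, the paths, and the `VisitAsync` helper are likewise assumptions made for illustration. This is a minimal sketch assuming C# 8 or later (for `using` declarations), not a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a web server so the sketch runs offline.
// It can only answer requests; it never initiates one.
class StubServerHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(
                "<html><body>Page " + request.RequestUri.AbsolutePath + "</body></html>"),
            RequestMessage = request
        };
        return Task.FromResult(response);
    }
}

class SurfingSession
{
    // Requests each path in order, as a browser does while a user surfs one
    // site, and returns a one-line summary of each response.
    public static async Task<List<string>> VisitAsync(HttpClient client, IEnumerable<string> paths)
    {
        var log = new List<string>();
        foreach (string path in paths)
        {
            using HttpResponseMessage response = await client.GetAsync(path);
            string body = await response.Content.ReadAsStringAsync();
            log.Add($"GET {path} -> {(int)response.StatusCode}: {body}");
        }
        return log;
    }

    static async Task Main()
    {
        // One HttpClient plays the role of the browser for the whole session.
        using var client = new HttpClient(new StubServerHandler())
        {
            BaseAddress = new Uri("http://www.example.com/")
        };

        // The session starts with the first page, then continues with more pages.
        foreach (string line in await VisitAsync(client, new[] { "/", "/about.html", "/contact.html" }))
            Console.WriteLine(line);
    }
}
```

Replacing `StubServerHandler` with the default handler (that is, constructing the client with `new HttpClient()`) would send the same sequence of requests over the network to a live server.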




