Using HttpWebRequest

    The HttpWebRequest class is derived from the WebRequest class. For HTTP connections you can use either class; however, some HTTP options are available only through HttpWebRequest. For that reason, most of the examples in this book use the HttpWebRequest class.
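
    As a minimal sketch of how the two classes relate, the following program creates a request with WebRequest.Create and casts it to HttpWebRequest. The URL, the class name, and the variable names are placeholders chosen for illustration; they are not taken from the book's examples.

```csharp
using System;
using System.Net;

class CreateRequestSketch
{
    static void Main()
    {
        // Placeholder URL; substitute the page you want to fetch.
        Uri uri = new Uri("http://www.example.com/");

        // WebRequest.Create returns a WebRequest. For an http: or https: URI
        // the underlying object is an HttpWebRequest, so the cast succeeds.
        HttpWebRequest http = (HttpWebRequest)WebRequest.Create(uri);

        // AllowAutoRedirect is one of the options that exists only on
        // HttpWebRequest, not on the WebRequest base class.
        http.AllowAutoRedirect = false;

        using (HttpWebResponse response = (HttpWebResponse)http.GetResponse())
        {
            // Print the numeric HTTP status code of the response.
            Console.WriteLine((int)response.StatusCode);
        }
    }
}
```

    Without the cast, the request would still work, but HTTP-only members such as AllowAutoRedirect would not be accessible.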

    One of the main capabilities available through the HttpWebRequest class is the ability to set request headers. Request headers are sent by the client, normally a web browser, to give the web server additional information about the request. These headers are commonly used for the following purposes:

  • Identifying the type of web browser
  • Transmitting any cookies
  • Facilitating HTTP authentication

    There are other things that can be accomplished with HTTP request headers; however, these are the most common. HTTP authentication will be explained in Chapter 5, “Secure HTTP Requests,” and cookies will be explained in Chapter 8, “Handling Sessions and Cookies”. Setting the type of browser will be covered later in this section.

Setting HTTP Request Headers

    The Headers property of the HttpWebRequest class returns a WebHeaderCollection, which provides several methods and properties that can be used to access HTTP request headers. The most useful of these are shown in Table 4.1.

Table 4.1: HTTP Request Header Methods and Properties

Method or Property Name         Purpose
Set(String key, String value)   Sets the header to the specified value. If a header with that name already exists, it is overwritten.
Add(String key, String value)   Adds the specified header. If a header with that name already exists, the new value is added alongside the existing one.
Keys                            Returns the names of all HTTP headers in the collection.

    Usually the only entry from the above list that you will use is the Set method. Keys is useful when you need to query which headers have already been set. If there is already a header with the specified name, Set will overwrite it. Add can be used to add more than one request header with the same name. Usually, you do not want to do this. Adding more than one header of the same name is useful when dealing with cookies, which are discussed in Chapter 8, “Handling Sessions and Cookies”.
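
    The difference between Set, Add, and Keys can be sketched with a standalone WebHeaderCollection, which is the same type that the Headers property returns. The header names X-Example and X-Multi are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Net;

class HeaderMethodsSketch
{
    static void Main()
    {
        // A standalone collection is enough to show the behavior; in a real
        // program this would be the Headers property of an HttpWebRequest.
        WebHeaderCollection headers = new WebHeaderCollection();

        // Set creates the header, or overwrites it if it already exists.
        headers.Set("X-Example", "first");
        headers.Set("X-Example", "second");

        // Add keeps the existing value and adds another under the same name.
        headers.Add("X-Multi", "a");
        headers.Add("X-Multi", "b");

        // Keys lists the header names currently in the collection.
        foreach (string name in headers.Keys)
        {
            Console.WriteLine(name + ": " + headers[name]);
        }
    }
}
```

    Running the sketch shows that X-Example holds only the last value passed to Set, while X-Multi retains both values passed to Add.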

Identifying the Browser Type

    One of the HTTP request headers identifies the browser type that the user is using. Many web sites take this header into account. For example, some web sites are only designed to work with certain versions of Microsoft Internet Explorer. To make use of such sites, you need to change how HttpWebRequest reports the browser type.

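    The browser type can be determined from the user-agent HTTP request header. Most HTTP request headers can be set using the Set method of the Headers collection. The user-agent header, however, is one of the restricted headers in the .NET Framework: passing it to Headers.Set throws an ArgumentException, so it must be set through the UserAgent property of HttpWebRequest instead. For example, to identify the bot as a browser of type “My Bot”, use the following command:

http.UserAgent = "My Bot";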

    The user-agent header is often used to identify the bot. For example, each of the major search engines uses spiders to find pages for its index. These search engine companies use user-agent headers to identify their spiders as search engine spiders rather than human users.

    When you write a bot of your own, you have some decisions to make with the user-agent header. You can either identify the bot, as seen above, or you can emulate one of the common browsers. If a web site requires a version of Internet Explorer, you will have to emulate Internet Explorer.

    Table 4.2 shows the user-agent headers that several major browsers use to identify themselves. As you can see, this header also reports which operating system the user is running.

Table 4.2: Identities of Several Major Browsers

Browser                        User-Agent Header
Firefox 1.5 (PC)               Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Internet Explorer 6.0 (PC)     Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Safari v2 (Mac)                Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3
Firefox v1.5 (Mac)             Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Internet Explorer 5.1 (Mac)    Mozilla/4.0 (compatible; MSIE 5.14; Mac_PowerPC)
Java (PC/Mac)                  Java/1.5.0_06
C#                             By default no user-agent is provided.

    You will also notice that C# and Java are listed alongside the browsers in Table 4.2. The Java entry is what Java will report to a web site if you do not override the user-agent; it is usually better to override this value with something else. C# provides no user-agent by default.
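
    To show how a C# bot might emulate one of these browsers rather than send no user-agent at all, here is a minimal sketch that reports itself as Internet Explorer 6.0. The user-agent string is based on the Internet Explorer 6.0 entry in Table 4.2; the URL, the class name, and the constant name are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Net;

class EmulateBrowserSketch
{
    // User-agent string based on the Internet Explorer 6.0 entry in Table 4.2.
    const string InternetExplorer6 =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)";

    static void Main()
    {
        // Placeholder URL; substitute the site that requires a specific browser.
        HttpWebRequest http =
            (HttpWebRequest)WebRequest.Create("http://www.example.com/");

        // Report the emulated browser instead of sending no user-agent.
        http.UserAgent = InternetExplorer6;

        using (WebResponse response = http.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            // Read the whole page and report how much text came back.
            string html = reader.ReadToEnd();
            Console.WriteLine("Read " + html.Length + " characters.");
        }
    }
}
```

    To identify the bot honestly instead, assign a descriptive string of your own, such as "My Bot", to the same property.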

Copyright 2005-2009 by Heaton Research, Inc.