The last section examined the HTTP requests generated during a typical surfing session. This section examines those requests in detail, starting with the different request types. Three standard HTTP request types are in common use:

  • GET
  • POST
  • HEAD

    The GET request is the most common request. Any time the user enters a URL into a web browser, a GET request is issued. Each hyperlink followed and each image downloaded also results in a GET request.

    The POST request is usually the result of submitting an HTML form. Whenever a form that uses the post method is filled out and submitted, a POST request is issued.

    The HEAD request is rarely used. It allows only the HTTP headers to be requested. The actual contents of the “file” requested will not be sent. A web browser will not generate the HEAD request; however, some search engines make use of it to determine if a URL is still valid. Because the HEAD request is not generated by a web browser and is of little real use to a bot, this book will not discuss it further.
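
    As a minimal sketch of the three request types, the following Python fragment builds one request object of each kind using the standard urllib.request module. The choice of Python is an assumption made for illustration, and nothing is actually sent over the network; the objects are only constructed and inspected. The URL is the test page used later in this section.

```python
from urllib.request import Request

URL = "http://www.httprecipes.com/1/test.php"

# No data and no explicit method: urllib treats this as a GET.
get_request = Request(URL)

# Supplying a body (as bytes) makes the request a POST by default.
post_request = Request(URL, data=b"first=Jeff&last=Heaton")

# HEAD must be requested explicitly through the method argument.
head_request = Request(URL, method="HEAD")

for request in (get_request, post_request, head_request):
    # get_method() reports the HTTP verb that would be sent.
    print(request.get_method(), request.full_url)
```

    Passing any of these objects to urllib.request.urlopen would issue the request; that step is left out here so the sketch runs without network access.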

    The response from the web server to a GET or a POST request takes the same form. In both cases, the response will be an HTML file, an image, or some other form of data. What is returned depends on what the web server is programmed to return for the request it has received. The usual response to a POST will be an HTML page that displays the result of the form. For example, the response to a POST from an order form might be an HTML page that contains the user’s order number.

    The next few sections explain in greater detail how the GET and POST requests work.

GET Requests

    Choosing between GET and POST usually comes down to how much data must pass to the web site. GET allows only a limited amount of data to be passed to the web server, because all of it must fit in the URL. POST allows a far larger amount, limited in practice only by what the web server is configured to accept. However, if you are writing an HTTP program to communicate with an existing web site, the choice is not yours. You must conform to what that site expects. Therefore, most HTTP applications will need to support a mix of GET and POST requests.

    The GET request works well when little or no additional information must be sent to the web server with a request. For example, the following URL, if sent with a GET request, will pass no data to the web server.

http://www.httprecipes.com/1/test.php

    The above URL simply requests the test.php page and does not pass any arguments on to the page. However, several arguments may need to be passed. What if the bot needed to pass two arguments named “first” and “last”? The following URL would do this:

http://www.httprecipes.com/1/test.php?first=Jeff&last=Heaton

    This would pass two arguments to the test.php page. As can be seen, passing arguments with a GET request requires them to be appended to the URL. The question mark (?) indicates where the arguments start. Each argument consists of its name, followed by an equals sign (=), followed by its value. Arguments are separated from one another by an ampersand (&).
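
    The URL format described above can be sketched in a few lines of Python using the standard urllib.parse module. The helper name build_get_url is hypothetical and exists only for this illustration. The sketch first assembles the URL from a set of arguments and then takes it apart again.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://www.httprecipes.com/1/test.php"


def build_get_url(base_url, arguments):
    """Append arguments to base_url as a query string (hypothetical helper)."""
    # urlencode joins name=value pairs with "&" and escapes unsafe characters.
    return base_url + "?" + urlencode(arguments)


url = build_get_url(BASE_URL, {"first": "Jeff", "last": "Heaton"})
print(url)

# Going the other way: isolate the part after "?" and recover the arguments.
query = urlsplit(url).query
arguments = parse_qs(query)
print(arguments)
```

    Because urlencode also escapes characters such as spaces and ampersands inside values, it is generally safer than joining the pieces by hand.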

    If there are a large number of arguments to pass, GET can be cumbersome. In such cases, the POST request should be considered. Of course, as previously stated, if using an existing web site, the bot must conform to what request type is already being used.

    Figure 1.4 shows the results of the above URL.

Figure 1.4: Result of GET Request

    As can be seen, the two arguments show up as URL arguments in the browser's URL address area.

POST Requests

    GET requests are limited in size, because all arguments must fit in the URL. POST requests have no such limitation. This is possible because the data sent with a POST request is transmitted in the body of the request, separately from the URL.

    An HTTP POST usually originates from an HTML page that contains a form. Figure 1.5 shows such a form.
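
    To make the difference concrete, the following Python sketch produces the text of a minimal request of each kind for the same two arguments. The function raw_request is hypothetical and greatly simplified; a real client sends additional headers. It is intended only to show where the arguments travel in each case.

```python
from urllib.parse import urlencode, urlsplit


def raw_request(method, url, arguments):
    """Return the text of a minimal HTTP/1.1 request (illustration only)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    encoded = urlencode(arguments)
    if method == "GET":
        # GET: the arguments ride on the request line as part of the URL.
        lines = [
            f"GET {path}?{encoded} HTTP/1.1",
            f"Host: {parts.netloc}",
            "",
            "",
        ]
    elif method == "POST":
        # POST: the request line carries only the path. The arguments
        # follow the blank line that ends the headers.
        lines = [
            f"POST {path} HTTP/1.1",
            f"Host: {parts.netloc}",
            "Content-Type: application/x-www-form-urlencoded",
            f"Content-Length: {len(encoded.encode('ascii'))}",
            "",
            encoded,
        ]
    else:
        raise ValueError("this sketch handles only GET and POST")
    return "\r\n".join(lines)


URL = "http://www.httprecipes.com/1/test.php"
ARGS = {"first": "Jeff", "last": "Heaton"}
get_text = raw_request("GET", URL, ARGS)
post_text = raw_request("POST", URL, ARGS)
print(get_text)
print(post_text)
```

    In the GET text the arguments appear on the first line; in the POST text they appear after the headers, and their length is declared in the Content-Length header.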

Figure 1.5: An HTML Form

    The arguments used by the form will be specified in the HTML. Listing 1.2 shows the HTML that was used to produce Figure 1.5.

Listing 1.2: The HTML Form

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<HTML>
<HEAD>
	<TITLE>HTTP Recipes</TITLE>
	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
	<meta http-equiv="Cache-Control" content="no-cache">
</HEAD>

<BODY>

<table border="0"><tr><td>
<a href="http://www.httprecipes.com/">
<img src="/images/logo.gif" alt="Heaton Research Logo" border="0"></a>

</td><td valign="top">Heaton Research, Inc.<br>
HTTP Recipes Test Site
</td></tr>
</table>
<hr><p><small>[<a href="/">Home</a>:<a href="/1/">First Edition</a>]</small></p>
<table border="0">
<form method="post" action="/1/test.php">
<tr><td><b>First Name:</b></td><td><input name="first"></td></tr>

<tr><td><b>Last Name:</b></td><td><input name="last"></td></tr>
<tr><td colspan="2"><input type="submit" value="OK"></td></tr>
</form>
</table>

<hr>
<p>Copyright 2006 by <a href="http://www.heatonresearch.com/">
Heaton Research, Inc.</a></p>
</BODY>
</HTML>

    As can be seen from the above form, there are two <input> tags that both accept text from the user. These will be picked up as posted variables when the POST request is sent to the web server. The result of the POST request is shown in Figure 1.6.
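
    A bot usually has to discover these details from the HTML itself. The following Python sketch, built on the standard html.parser module, pulls the method, the action, and the named inputs out of the form portion of Listing 1.2. The class name FormFieldParser is hypothetical, and the sketch assumes a page with a single form.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the method, action, and named inputs of the first form."""

    def __init__(self):
        super().__init__()
        self.seen_form = False
        self.method = None
        self.action = None
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser delivers tag and attribute names in lower case.
        attributes = dict(attrs)
        if tag == "form" and not self.seen_form:
            self.seen_form = True
            # A form with no method attribute defaults to GET.
            self.method = (attributes.get("method") or "get").upper()
            self.action = attributes.get("action")
        elif tag == "input" and attributes.get("name"):
            # Only named inputs are submitted; the OK button has no name.
            self.field_names.append(attributes["name"])


# The form portion of Listing 1.2.
FORM_HTML = """
<form method="post" action="/1/test.php">
<tr><td><b>First Name:</b></td><td><input name="first"></td></tr>
<tr><td><b>Last Name:</b></td><td><input name="last"></td></tr>
<tr><td colspan="2"><input type="submit" value="OK"></td></tr>
</form>
"""

parser = FormFieldParser()
parser.feed(FORM_HTML)
print(parser.method, parser.action, parser.field_names)
```

    The names collected here, first and last, are the argument names that a bot would send in the body of its own POST request to the form's action.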

Figure 1.6: Result of the POST Request

    Notice how Figure 1.6 differs from the GET result shown in Figure 1.4. The arguments are displayed as POST arguments, rather than URL arguments. Additionally, the request type is POST.

    So far, only the data passed in HTTP requests and responses has been examined. There are also headers that contain useful information. The headers will be examined in the next section.


Copyright 2005 - 2010 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.