Server Headers | Heaton Research

Server Headers

    The server headers contain many useful pieces of information. Server headers are commonly used for:

  • Determining the type of data at a URL
  • Determining the cookies in use
  • Determining the web server software in use
  • Determining the size of the content at this URL

    For the bots that you create, you will most commonly use server headers to determine the type of data at a URL and to support cookies.
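
    As a rough illustration of the cookie use case, the sketch below pulls the name=value part out of every Set-Cookie header in the Map returned by getHeaderFields(). The class name CookieReader and the method cookies are assumptions made for this example, and the sketch ignores cookie attributes such as expiry and path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the class and method names are assumptions.
class CookieReader {

    // Collect the "name=value" part of every Set-Cookie response header.
    // The map is expected to have the shape returned by getHeaderFields().
    static List<String> cookies(Map<String, List<String>> headers) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey();
            // The HTTP status line is stored under a null key, and header
            // names are case-insensitive, so compare without regard to case.
            if (name == null || !name.equalsIgnoreCase("Set-Cookie")) {
                continue;
            }
            for (String value : entry.getValue()) {
                // Drop attributes such as "; Path=/" after the first semicolon.
                int semicolon = value.indexOf(';');
                String pair = semicolon >= 0 ? value.substring(0, semicolon) : value;
                result.add(pair.trim());
            }
        }
        return result;
    }
}
```

    A bot would typically call CookieReader.cookies(http.getHeaderFields()) after a request and send the collected pairs back in a Cookie request header on later requests.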

Reading Server Headers

    Once you have retrieved the contents of a URL from the server, a second set of headers is available to the program: the response headers provided by the web server. The HttpURLConnection class provides several functions and methods to access these server response headers. These functions and methods are listed in Table 4.3.

Table 4.3: HTTP Response Header Methods and Functions

Method or Function Name                          Purpose
getHeaderField(int n)                            Returns the value of the nth header field.
getHeaderField(String name)                      Returns the value of the named header field.
getHeaderFieldDate(String name, long Default)    Returns the value of the named field parsed as a date.
getHeaderFieldInt(String name, int Default)      Returns the value of the named field parsed as a number.
getHeaderFieldKey(int n)                         Returns the key for the nth header field.
getHeaderFields()                                Returns an unmodifiable Map of the header fields.

    As you can see from the methods above, you can read the headers in a variety of formats. To read a header as a String, use the getHeaderField function. To read a header as an int, use the getHeaderFieldInt function. To read a header as a date, use the getHeaderFieldDate function, which returns the date as a long holding the number of milliseconds since January 1, 1970 GMT.
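
    The following sketch shows how these methods might be used together: it reads three headers as a String, an int, and a date, and then walks every header by index. The class name HeaderDump, the helper listHeaders, and the default URL are assumptions made for this illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical example class; the name and the default URL are assumptions.
class HeaderDump {

    // Walk the response headers by index until no more values are returned.
    static List<String> listHeaders(URLConnection conn) {
        List<String> lines = new ArrayList<>();
        for (int n = 0; ; n++) {
            String value = conn.getHeaderField(n);
            if (value == null) {
                break; // past the last header
            }
            // For HTTP, entry 0 is usually the status line and has a null key.
            String key = conn.getHeaderFieldKey(n);
            lines.add(key == null ? value : key + ": " + value);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String address = args.length > 0 ? args[0] : "http://www.example.com/";
        HttpURLConnection http = (HttpURLConnection) new URL(address).openConnection();
        try {
            // Read individual headers as a String, an int, and a date.
            String type = http.getHeaderField("content-type");
            int length = http.getHeaderFieldInt("content-length", -1);
            long modified = http.getHeaderFieldDate("last-modified", 0);

            System.out.println("Type: " + type);
            System.out.println("Length: " + (length < 0 ? "unknown" : String.valueOf(length)));
            System.out.println("Modified: " + (modified == 0 ? "unknown" : new Date(modified).toString()));

            // Then list every header the server sent.
            for (String line : listHeaders(http)) {
                System.out.println(line);
            }
        } finally {
            http.disconnect();
        }
    }
}
```

    The default values passed to getHeaderFieldInt and getHeaderFieldDate are what come back when the header is missing or cannot be parsed, which is why the sketch checks for them before printing.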

MIME Types

    One very important HTTP response header is the content type. The content-type header tells the client, whether a web browser or a bot, what type of data is located at the URL. For example, to determine the type of content at a URL, you would use the following line of code:

String type = http.getHeaderField("content-type");

    This type information is called a Multipurpose Internet Mail Extensions (MIME) type. The “Mail” in MIME is largely historical. MIME types were originally developed for email attachments, not for the World Wide Web (WWW). However, they are now applied to many different Internet applications, such as web browsers and servers.

    A MIME type consists of two identifiers separated by a slash (/). For example, text/html is a MIME type that identifies a resource as an HTML document. The first part of the type, in this case text, identifies the family of the type. The second part identifies the exact type within that family. Plain text files are also part of the text family, and have the type text/plain.
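
    A minimal sketch of splitting a content-type value into its family and exact type is shown below. The class name MimeParts and its parse method are assumptions made for this illustration. The sketch also strips any parameters that follow a semicolon, such as a charset, since servers often append them to the header value.

```java
import java.util.Locale;

// Hypothetical helper; the class and method names are assumptions.
class MimeParts {
    final String family;   // for example "text"
    final String subtype;  // for example "html"

    private MimeParts(String family, String subtype) {
        this.family = family;
        this.subtype = subtype;
    }

    // Parse a content-type value such as "text/html; charset=UTF-8".
    static MimeParts parse(String contentType) {
        if (contentType == null) {
            throw new IllegalArgumentException("no content type");
        }
        // Drop parameters such as "; charset=UTF-8".
        int semicolon = contentType.indexOf(';');
        String bare = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        bare = bare.trim().toLowerCase(Locale.ROOT);

        // Both sides of the slash must be present.
        int slash = bare.indexOf('/');
        if (slash <= 0 || slash == bare.length() - 1) {
            throw new IllegalArgumentException("not a MIME type: " + contentType);
        }
        return new MimeParts(bare.substring(0, slash), bare.substring(slash + 1));
    }

    public static void main(String[] args) {
        MimeParts type = MimeParts.parse("text/html; charset=UTF-8");
        System.out.println("family=" + type.family + " subtype=" + type.subtype);
    }
}
```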

Table 4.4: MIME Families

Family         Purpose
application    Application-specific or raw binary data
audio          Sounds and music
example        Used only in examples and documentation
image          Images
message        Mail messages
model          3D model data
multipart      Compound documents made up of several parts
text           Text formats
video          Video formats

    There are many different MIME types under each of these families. However, there are only a handful that you will commonly see. Table 4.5 summarizes these.

Table 4.5: Common MIME Types

MIME Type     Purpose
image/gif     GIF image files.
image/jpeg    JPEG image files.
image/png     PNG image files.
image/tiff    TIFF image files.
text/html     HTML text files.
text/plain    Unformatted text files.

    Often, the program will only need to look at the family. For example, if you wanted to download text and binary files differently, you would simply look at the family part of the MIME type. If the family is text, you may download the URL as a text file. Any other family would require downloading the information as a binary file. The difference between the two is that a binary file is copied to the hard drive byte for byte, whereas a text file's line endings are converted to the convention of the local operating system.
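
    One way to sketch this decision is shown below. The class name DownloadMode and its methods are assumptions made for this illustration, and the text path assumes the content is UTF-8, which a real bot would need to confirm from the header or the document itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; the names and the UTF-8 assumption are illustrative only.
class DownloadMode {

    // True when the content-type header value belongs to the text family.
    static boolean isText(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/");
    }

    // Binary download: copy every byte exactly as received.
    static void copyBinary(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
    }

    // Text download: rewrite each line ending for the local operating system.
    // Every line written, including the last one, ends with a line separator.
    static void copyText(InputStream in, Writer out) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line);
            out.write(System.lineSeparator());
        }
        out.flush();
    }
}
```

    A caller would pass the result of http.getHeaderField("content-type") to isText and then hand http.getInputStream() to whichever copy method matches. A missing content-type header is treated as binary here, which leaves the data untouched.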

Calling Sequence

    As you have seen in this chapter, a variety of operations can be performed on an HttpURLConnection object. You can set request headers, read response headers, POST data and read response data. Please note, however, that these operations must follow a very specific order. For example, you can’t set a request header after you have started reading the response. If you are reading the web server’s reply, the request has already been sent. Therefore, all request information must be set before you begin working with the response. The general order that you should follow is shown here:

  • Step 1: Set any HTTP request headers.
  • Step 2: POST data, if this is a POST request.
  • Step 3: Read HTTP response headers.
  • Step 4: Read HTTP response data.

    If you ever face a bug where it seems the request headers are being ignored, check whether you are calling a method related to the response before setting the header. All headers must be set before the request is sent.
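
    The four steps can be sketched in a single method, as shown below. The class name CallingSequence, the method post, and the User-Agent value are assumptions made for this illustration, and the body is decoded as UTF-8 for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical example; the class, method, and header values are assumptions.
class CallingSequence {

    // Sends a form POST on an unused connection and returns the response body.
    static String post(HttpURLConnection http, String form) throws IOException {
        // Step 1: set request headers before anything touches the response.
        http.setRequestMethod("POST");
        http.setRequestProperty("User-Agent", "ExampleBot/1.0");
        http.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        // Step 2: POST the data.
        http.setDoOutput(true);
        try (OutputStream out = http.getOutputStream()) {
            out.write(form.getBytes(StandardCharsets.UTF_8));
        }

        // Step 3: read response headers. The request has been sent by now,
        // so request headers can no longer be changed.
        String type = http.getHeaderField("content-type");
        System.out.println("Type: " + type);

        // Step 4: read response data.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (InputStream in = http.getInputStream()) {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1) {
                body.write(buffer, 0, count);
            }
        }
        return new String(body.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

    Moving either setRequestProperty call below Step 3 reproduces the bug described above: with the standard implementation the call fails with an IllegalStateException, because the connection has already been made.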

Copyright 2005-2009 by Heaton Research, Inc.