
    Java allows you to read data from URLs, and this capability forms the basis of HTTP programming in Java. In this chapter you will see how to make simple requests to web sites. A simple request is one that only retrieves data from a URL. Requests can get much more complex than that. As you progress through the book, you will learn more complex HTTP programming topics, such as:

  • HTTPS
  • Posting Data
  • Cookies
  • Authentication
  • Content Types

    For now, we will focus on simply getting data from a URL, and leave the more complex operations for later. There are three basic steps that you will need to carry out to read data from a URL. These steps are summarized as follows:

  • Create a URL Object
  • Open a Stream
  • Read Data from the Stream

    We will begin by examining the Java URL class.

The URL Class

    Java provides a class to hold URLs. Even though a URL is actually a String, it is convenient to have a class to handle the URL. The URL class offers several advantages over storing URLs as strings. There are methods to:

  • Determine if a URL is valid
  • Extract information, such as the host or scheme
  • Open a connection to the URL
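
    As a minimal sketch of the "extract information" point, the class below splits a URL string into its scheme, host, port and path using accessor methods of java.net.URL. The class name UrlParts, the method describe, and the path /1/1/index.php are hypothetical and chosen only for illustration. Note that recent JDK releases (Java 20 onward) deprecate the URL(String) constructor in favor of URI.create(...).toURL(); the constructor is kept here to match the rest of this chapter.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {
    /** Builds a one-line summary of the components of a URL string. */
    static String describe(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        // getPort() returns -1 when the URL names no port, so fall back
        // to the default port of the scheme (80 for http).
        int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
        return url.getProtocol() + " | " + url.getHost() + " | " + port + " | " + url.getPath();
    }

    public static void main(String[] args) throws MalformedURLException {
        // The path below is a made-up example, not a page known to exist.
        System.out.println(describe("http://www.httprecipes.com/1/1/index.php"));
        // prints: http | www.httprecipes.com | 80 | /1/1/index.php
    }
}
```

    None of these calls contact the server; they only parse the string.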

    To create a URL object, simply pass the URL string to the constructor of the URL class, as follows:

URL url = new URL("http://www.httprecipes.com");

    This will create a new URL object that is ready to be used. However, it can throw a checked exception, named MalformedURLException. As mentioned previously, to properly handle the exception you should use the following code.

try
{
    URL url = new URL("http://www.httprecipes.com");
}
catch(MalformedURLException e)
{
    System.out.println("This URL is not valid.");
}

    The MalformedURLException will be thrown if the provided URL cannot be parsed, most commonly because its scheme (protocol) is missing or unknown. For example, a string such as the following would throw the exception:

www.httprecipes.com

    The above string throws the exception because it has no scheme, such as http://, in front of the host name. The check is a lenient one: the URL class accepts many strings that are not usable addresses, so getting past the constructor does not guarantee a working URL. It is also important to remember that the URL class only checks the form of the URL. It does NOT check to see if the URL actually exists on the Internet. Existence of the URL will not be verified until a connection is made. The next section discusses how to open a connection.
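
    The difference between a URL that parses and a URL that exists can be sketched as follows. The helper isWellFormed and the class UrlCheck are hypothetical names, and the host no-such-host.invalid is a deliberately unreachable placeholder (the .invalid top-level domain is reserved). The helper only reports whether the URL constructor accepts the string; no network request is made.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlCheck {
    /** Returns true if the URL constructor accepts the string. Makes no network request. */
    static boolean isWellFormed(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("http://www.httprecipes.com"));   // true: scheme present
        System.out.println(isWellFormed("www.httprecipes.com"));          // false: no scheme
        System.out.println(isWellFormed("http://no-such-host.invalid/")); // true: host is never contacted
    }
}
```

    The last call returns true even though the host cannot be reached, which is why a successful constructor call says nothing about whether the page is there.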

Opening the Stream

    Java uses streams to access files and perform other I/O operations. When you access a URL, you are given an InputStream, which is used to download the contents of the URL. To open the stream, call the openStream method provided by the URL class. The following code shows how this is done.

try
{
    URL url = new URL("http://www.httprecipes.com");
    InputStream is = url.openStream();
}
catch(MalformedURLException e)
{
    System.out.println("Invalid URL");
}
catch(IOException e)
{
    System.out.println("Could not connect to URL");
}

    As you can see, the above code is very similar to the code from the last section. However, an additional line follows the URL declaration. This line calls the openStream method and receives an InputStream object. You will see what to do with this object in the next section.

    The above code also has to deal with an additional exception. The openStream method can throw an IOException, so it is necessary to catch that exception as well. Recall from the last section that the constructor of the URL class does not check whether a URL actually exists. This is where that is checked. If the URL does not exist, or if there is any trouble connecting to the web server that holds that URL, then an IOException will be thrown.
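
    The sketch below shows the same idea in a form that runs without a network connection, which is an assumption made purely for demonstration: it points a file: URL at a temporary file, opens it with openStream, then deletes the file and tries again. The class OpenStreamDemo and the method firstByte are hypothetical names. It also uses try-with-resources so the stream is closed when the block exits, something the fragment above leaves out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OpenStreamDemo {
    /** Opens the URL, returns its first byte (or -1 if empty), and closes the stream. */
    static int firstByte(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary local file stands in for a web page here.
        Path file = Files.createTempFile("demo", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.US_ASCII));
        URL url = file.toUri().toURL();
        System.out.println((char) firstByte(url)); // prints: h

        // The URL object is still well formed after the file is gone;
        // the failure only shows up when the stream is opened.
        Files.delete(file);
        try {
            firstByte(url);
        } catch (IOException e) {
            System.out.println("Could not open the URL");
        }
    }
}
```

    With an http URL the pattern is the same: the URL object is created without error, and a missing page or unreachable server surfaces as an IOException from openStream.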

    Now that you have constructed the URL object and opened a connection, you are ready to download the data from that URL.

Downloading the Contents

    Downloading the contents of a web page uses the same procedure you would use to read data from any input stream. Just as if you were reading from a disk file, you will use the read method of the stream. The following block of code reads the contents of the URL and stores them in a StringBuilder.

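// Needs java.net.URL, java.net.MalformedURLException,
// java.io.InputStream and java.io.IOException.
try
{
    // Size of each block read from the stream (8 kilobytes).
    final int BUFFER_SIZE = 8192;

    URL u = new URL("http://www.httprecipes.com");

    StringBuilder result = new StringBuilder();
    byte buffer[] = new byte[BUFFER_SIZE];

    // Open the stream once, using the openStream method of the URL class.
    InputStream s = u.openStream();
    int size = 0;

    do
    {
        // read returns the number of bytes placed in the buffer, or -1 at the end.
        size = s.read(buffer);
        if (size != -1)
            result.append(new String(buffer, 0, size));
    } while (size != -1);

    s.close();

    System.out.println( result.toString() );
}
catch(MalformedURLException e)
{
    System.out.println("Invalid URL");
}
catch(IOException e)
{
    System.out.println("Could not connect to URL");
}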

    As you can see, the above code is a continuation of what we have already seen. Just as in the previous code segments, the URL object is first created. Next a stream is opened to the URL. Finally, a do/while loop reads from the stream until there is nothing more to read.

    When reading from a stream, it is best to read the data in blocks, rather than one byte at a time. To accomplish this, a buffer of 8,192 bytes (8 kilobytes) is used; this is the value of BUFFER_SIZE. The choice of 8,192 is purely arbitrary. Many web pages are under 8k, so they can be read in very few blocks. Keep in mind that a single call to read may return fewer bytes than the buffer can hold, so the loop must keep going until read returns -1, which means there is no data left to read.
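
    One caveat with converting each block to a String separately is that a multi-byte character can be split across two blocks and decoded incorrectly. A possible variation, sketched below, collects the raw bytes first and decodes them once at the end. The class BlockReader and the method readAll are hypothetical names, and UTF-8 is an assumed encoding; a real page may declare a different one. The example reads from an in-memory stream so that it runs without a network connection, but the InputStream returned by openStream can be passed in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BlockReader {
    static final int BUFFER_SIZE = 8192;

    /** Reads the stream to its end, then decodes all of the bytes as UTF-8 in one step. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        int size;
        // read may fill only part of the buffer; -1 signals the end of the stream.
        while ((size = in.read(buffer)) != -1) {
            bytes.write(buffer, 0, size);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Sample text with two-byte UTF-8 characters, written with escapes
        // so the source file itself stays plain ASCII.
        byte[] data = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(readAll(in));
        }
    }
}
```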


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.