Recipes
This chapter includes four recipes, which demonstrate the following:
- Scanning a URL for headers
- Searching a range of IP addresses for web sites
- Downloading a binary or text file
- Monitoring a site to see that it stays up
These recipes will introduce you to some of the things that can be done with the HttpURLConnection class.
Recipe #4.1: Scan URL
Sometimes it is helpful to examine the headers for a particular URL. Recipe 4.1 shows how to use the HttpURLConnection class to access the headers for a particular URL. This program is shown in Listing 4.1.
Listing 4.1: Scan a URL for HTTP Response Headers (ScanURL.java)
package com.heatonresearch.httprecipes.ch4.recipe1;

import java.io.IOException;
import java.net.*;

/**
 * Recipe #4.1: Scan a URL's Headers
 * Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
 *
 * HTTP Programming Recipes for Java Bots
 * ISBN: 0-9773206-6-9
 * http://www.heatonresearch.com/articles/series/16/
 *
 * This recipe displays the headers provided by a web server.
 *
 * This software is copyrighted. You may use it in programs
 * of your own, without restriction, but you may not
 * publish the source code without the author's permission.
 * For more information on distributing this code, please
 * visit:
 *    http://www.heatonresearch.com/hr_legal.php
 *
 * @author Jeff Heaton
 * @version 1.1
 */
public class ScanURL
{
  /**
   * Scan the URL and display headers.
   *
   * @param u The URL to scan.
   * @throws IOException Error scanning URL.
   */
  public void scan(String u) throws IOException
  {
    URL url = new URL(u);
    HttpURLConnection http = (HttpURLConnection) url.openConnection();
    int count = 0;
    String key, value;

    do
    {
      key = http.getHeaderFieldKey(count);
      value = http.getHeaderField(count);
      count++;
      if (value != null)
      {
        if (key == null)
          System.out.println(value);
        else
          System.out.println(key + ": " + value);
      }
    } while (value != null);
  }

  /**
   * Typical Java main method, create an object, and then
   * start the object passing arguments. If insufficient
   * arguments are provided, then display startup
   * instructions.
   *
   * @param args Program arguments.
   */
  public static void main(String args[])
  {
    try
    {
      if (args.length != 1)
      {
        System.out.println("Usage: \njava ScanURL [URL to Scan]");
      } else
      {
        ScanURL d = new ScanURL();
        d.scan(args[0]);
      }
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
This program accepts one parameter: the URL that you would like to scan. For example, to scan the web site http://www.httprecipes.com/ you would use the following command:
ScanURL http://www.httprecipes.com/
Issuing the above command would cause the program to access the web site and then display all HTTP server headers that were returned. The above command simply shows the abstract format to call this recipe, with the appropriate parameters. For exact information on how to run this recipe refer to Appendix B, C, or D, depending on the operating system you are using.
All of the work performed by this program is done inside of the scan method. The first thing that the scan method does is to create a new URL object and then create an HttpURLConnection object from there. The following lines of code do this.
URL url = new URL(u);
HttpURLConnection http = (HttpURLConnection) url.openConnection();
Once the connection object has been created, a few local variables are declared to keep track of the headers being displayed. (The request itself is sent when the first header is asked for.) The key variable will hold the name of each header found. The value variable will hold the value of that header. The count variable keeps track of which header we are on.
int count = 0;
String key, value;
Next, a do/while loop will be used to loop through each of the headers.
do
{
  key = http.getHeaderFieldKey(count);
  value = http.getHeaderField(count);
We know that we have reached the end of the headers when we find a header that has a null value. However, a null key is acceptable. The first header always has a null key because, as you recall from Chapters 1 and 2, the first header (the status line) is in a different format.
  count++;
  if (value != null)
  {
If there is no key, then just display the value. If both are present, then display both. You will never have a key without a value.
    if (key == null)
      System.out.println(value);
    else
      System.out.println(key + ": " + value);
The loop continues until a header with a null value is found.
  }
} while (value != null);
This process will continue until all headers have been displayed.
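As an alternative to counting through the headers by index, the standard getHeaderFields method of URLConnection returns all of the headers at once as a map. The following is a minimal sketch of that approach, not part of the book's listings: the HeaderDump class and its format method are hypothetical names chosen for illustration. As in the loop above, a null key identifies the status line. Note that the map does not promise any particular ordering, so the status line is not guaranteed to print first.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class HeaderDump {

    /**
     * Turn a header map into printable lines. An entry with a
     * null key (the status line) is printed as the bare value.
     */
    static List<String> format(Map<String, List<String>> headers) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            for (String value : entry.getValue()) {
                if (entry.getKey() == null) {
                    lines.add(value);
                } else {
                    lines.add(entry.getKey() + ": " + value);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java HeaderDump [URL to scan]");
            return;
        }
        HttpURLConnection http =
            (HttpURLConnection) new URL(args[0]).openConnection();
        try {
            // getHeaderFields() sends the request if it has not been sent yet
            for (String line : format(http.getHeaderFields())) {
                System.out.println(line);
            }
        } finally {
            http.disconnect();
        }
    }
}
```

Because a single header name can carry several values, each value is printed on its own line.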
Recipe #4.2: Scan for Sites
You can also use the HttpURLConnection class to determine if there is an active web server at a specific URL. Recipe 4.2 shows how to loop through a series of IP addresses to find any web servers. To use this program, you must specify an IP address prefix. An example of this would be 192.168.1. Specifying this prefix would visit 256 IP addresses. It would visit from 192.168.1.0 to 192.168.1.255.
This recipe also shows how to decrease the connection timeout. Because almost none of the IP addresses will have web servers, this example takes a while to run: by default, Java may wait a long time, possibly several minutes, before giving up on a connection attempt.
Because of this, the connection timeout is reduced to about one second. For a more thorough scan, the timeout can be increased. Listing 4.2 shows the site scanner:
Listing 4.2: Scan for Web Sites (ScanSites.java)
package com.heatonresearch.httprecipes.ch4.recipe2;

import java.io.*;
import java.net.*;
import java.util.*;

/**
 * Recipe #4.2: Scan IP's for Sites
 * Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
 *
 * HTTP Programming Recipes for Java Bots
 * ISBN: 0-9773206-6-9
 * http://www.heatonresearch.com/articles/series/16/
 *
 * This recipe shows how to scan a series of IP addresses
 * for web sites. This recipe is designed to take an IP
 * address prefix, such as 192.168.1, and then cycle through
 * all 256 IP addresses by providing the fourth part of the
 * IP address.
 *
 * This software is copyrighted. You may use it in programs
 * of your own, without restriction, but you may not
 * publish the source code without the author's permission.
 * For more information on distributing this code, please
 * visit:
 *    http://www.heatonresearch.com/hr_legal.php
 *
 * @author Jeff Heaton
 * @version 1.1
 */
public class ScanSites
{
  // the size of a buffer
  public static int BUFFER_SIZE = 8192;

  /**
   * This method downloads the specified URL into a Java
   * String. This is a very simple method that you can
   * reuse anytime you need to quickly grab all data from
   * a specific URL.
   *
   * @param url The URL to download.
   * @param timeout The number of milliseconds to wait for connection.
   * @return The contents of the URL that was downloaded.
   * @throws IOException Thrown if any sort of error occurs.
   */
  public String downloadPage(URL url, int timeout) throws IOException
  {
    StringBuilder result = new StringBuilder();
    byte buffer[] = new byte[BUFFER_SIZE];

    URLConnection http = url.openConnection();
    // wait at most the requested number of milliseconds to connect
    http.setConnectTimeout(timeout);
    InputStream s = http.getInputStream();
    int size = 0;

    do
    {
      size = s.read(buffer);
      if (size != -1)
        result.append(new String(buffer, 0, size));
    } while (size != -1);

    return result.toString();
  }

  /**
   * Extract a string of text from between the two specified tokens. The
   * case of the two tokens need not match.
   *
   * @param str The string to search.
   * @param token1 The text, or tag, that comes before the desired text.
   * @param token2 The text, or tag, that comes after the desired text.
   * @param count Which occurrence of token1 to use, 1 for the first.
   * @return The text found between the two tokens, or null if
   *         either token is not found.
   */
  public String extractNoCase(String str, String token1, String token2,
      int count)
  {
    int location1, location2;

    // convert everything to lower case
    String searchStr = str.toLowerCase();
    token1 = token1.toLowerCase();
    token2 = token2.toLowerCase();

    // now search
    location1 = location2 = 0;
    do
    {
      location1 = searchStr.indexOf(token1, location1 + 1);

      if (location1 == -1)
        return null;

      count--;
    } while (count > 0);

    // search the lower case copy so that the case of token2 is ignored
    location2 = searchStr.indexOf(token2, location1 + 1);
    if (location2 == -1)
      return null;

    // return the result from the original string that has mixed
    // case
    return str.substring(location1 + token1.length(), location2);
  }

  /**
   * Scan the specified IP address and return the title of
   * the web page found there, or null if no connection can
   * be made.
   *
   * @param ip The IP address to scan.
   * @return The title of the web page, or null if no website.
   */
  private String scanIP(String ip)
  {
    try
    {
      System.out.println("Scanning: " + ip);
      String page = downloadPage(new URL("http://" + ip), 1000);
      String title = extractNoCase(page, "<title>", "</title>", 0);
      if (title == null)
        title = "[Untitled site]";
      return title;
    } catch (IOException e)
    {
      return null;
    }
  }

  /**
   * Scan a range of 256 IP addresses. Provide the prefix
   * of the IP address, without the final fourth. For
   * example "192.168.1".
   *
   * @param ip The IP address prefix (i.e. 192.168.1)
   */
  public void scan(String ip)
  {
    if (!ip.endsWith("."))
    {
      ip += ".";
    }

    // create a list to hold sites found
    List<String> list = new ArrayList<String>();

    // scan through IP addresses ending in 0 - 255
    for (int i = 0; i <= 255; i++)
    {
      String address = ip + i;
      String title = scanIP(address);
      if (title != null)
        list.add(address + ":" + title);
    }

    // now display the list of sites found
    System.out.println();
    System.out.println("Sites found:");
    if (list.size() > 0)
    {
      for (String site : list)
      {
        System.out.println(site);
      }
    } else
    {
      System.out.println("No sites found");
    }
  }

  /**
   * Typical Java main method, create an object, and then
   * start the object passing arguments. If insufficient
   * arguments are provided, then display startup
   * instructions.
   *
   * @param args Program arguments.
   */
  public static void main(String args[])
  {
    try
    {
      if (args.length != 1)
      {
        System.out.println("Usage: ScanSites [IP prefix, i.e. 192.168.1]");
      } else
      {
        ScanSites d = new ScanSites();
        d.scan(args[0]);
      }
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
To run this program, you must specify the IP prefix. For example, to scan the IP prefix 192.168.1 you would use the following command:
ScanSites 192.168.1
The above command simply shows the abstract format to call this recipe, with the appropriate parameters. For exact information on how to run this recipe refer to Appendix B, C, or D, depending on the operating system you are using.
You may find more sites on your home network than you knew existed. For example, I found that my laser printer has a web site. Logging into my printer’s built-in web site shows me how much toner is still available. You can see the results of my scan in Figure 4.1.
Figure 4.1: Scan for Sites

The scan method does most of the work. First, the IP prefix is checked to see if it ends with a period “.”. If it does not, then a period is appended. This is because we need the IP prefix in the form:
192.168.1.
not
192.168.1
We will be appending a count from 0 to 255 to the end, so the trailing period is required.
if (!ip.endsWith("."))
{
  ip += ".";
}
Next, a list is created to hold the sites that are located. The sites located are not displayed until the end of the scan.
// Create a list to hold sites found.
List<String> list = new ArrayList<String>();
Now we are ready to scan. A for loop is used to count from 0 to 255.
// Scan through IP addresses ending in 0 - 255.
for (int i = 0; i <= 255; i++)
{
  String address = ip + i;
  String title = scanIP(address);
  if (title != null)
    list.add(address + ":" + title);
}
For each IP address, the scanIP function is called. If a valid site exists at that address, the title (from the HTML) is returned. If no valid site is found, then the value null is returned. The scanIP function is covered in detail later in this section.
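The recipe trusts the user to supply a well-formed prefix. If you would like to reject a bad prefix before spending several minutes scanning, the prefix handling and the address loop can be pulled into one small method. The sketch below is an illustration only: the AddressRange class and its expand method are hypothetical names, and the validation rule (exactly three numeric parts, each from 0 to 255) is an assumption about what counts as a valid prefix.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class AddressRange {

    /**
     * Expand a prefix such as "192.168.1" (with or without the
     * trailing period) into the 256 addresses ending in 0 - 255.
     */
    static List<String> expand(String prefix) {
        // drop one trailing period, if present
        String base = prefix.endsWith(".")
            ? prefix.substring(0, prefix.length() - 1)
            : prefix;

        // the -1 limit keeps empty parts, so "1..2" is rejected below
        String[] parts = base.split("\\.", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected three parts, such as 192.168.1: " + prefix);
        }
        for (String part : parts) {
            // parseInt throws NumberFormatException, which is a kind
            // of IllegalArgumentException, for non-numeric parts
            int n = Integer.parseInt(part);
            if (n < 0 || n > 255) {
                throw new IllegalArgumentException(
                    "Part out of range 0 - 255: " + part);
            }
        }

        List<String> addresses = new ArrayList<>(256);
        for (int i = 0; i <= 255; i++) {
            addresses.add(base + "." + i);
        }
        return addresses;
    }

    public static void main(String[] args) {
        for (String address : expand("192.168.1")) {
            System.out.println(address);
        }
    }
}
```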
Once the loop completes, we display what sites were found. The code to do this is shown below:
// Now display the list of sites found.
System.out.println();
System.out.println("Sites found:");
if (list.size() > 0)
{
  for (String site : list)
  {
    System.out.println(site);
  }
} else
{
  System.out.println("No sites found");
}
The user is informed if there are no sites to display.
As you saw above, for each IP address, the scanIP method was called. I will now show you how the scanIP function is constructed.
The scanIP function begins by displaying the IP address that is currently being scanned. The downloadPage method is called to retrieve the HTML at the URL formed from the IP address. The following lines of code do this.
try
{
  System.out.println("Scanning: " + ip);
  String page = downloadPage(new URL("http://" + ip), 1000);
The downloadPage function is the one we created in Chapter 3; however, you will notice an additional parameter. The second parameter specifies a 1,000-millisecond timeout. If a connection is not made within one second, the connection attempt is aborted and an exception is thrown.
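The connection timeout only limits how long Java waits for the connection to be made. A server that accepts the connection but then sends nothing can still stall the scan. URLConnection has a second setting, setReadTimeout, that limits how long a read may block. The sketch below shows both settings together; the TimeoutDemo class, its open method, and the 2,000-millisecond read timeout are assumptions made for illustration and do not appear in the recipe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class TimeoutDemo {

    /**
     * Open a connection object with both timeouts set. No network
     * traffic happens here; the timeouts apply once the connection
     * is actually used. Exceeding either one raises a
     * java.net.SocketTimeoutException, which is an IOException.
     */
    static URLConnection open(URL url, int connectMillis, int readMillis)
            throws IOException {
        URLConnection http = url.openConnection();
        http.setConnectTimeout(connectMillis); // limit on making the connection
        http.setReadTimeout(readMillis);       // limit on each blocking read
        return http;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java TimeoutDemo [URL]");
            return;
        }
        try {
            URLConnection http = open(new URL(args[0]), 1000, 2000);
            try (InputStream in = http.getInputStream()) {
                System.out.println("Connected; first byte: " + in.read());
            }
        } catch (IOException e) {
            System.out.println("No response: " + e);
        }
    }
}
```

A timeout of zero means "no timeout", so both values should be positive when scanning many addresses.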
Next, the extractNoCase function is called to extract the text between the <title> and </title> tags. The extractNoCase function is a special version of the extract function introduced in Chapter 3 that ignores the case of the tags. For example, <title> and <Title> would be considered the same. If no title is found, then the site is listed as “[Untitled site]”.
  String title = extractNoCase(page, "<title>", "</title>", 0);
  if (title == null)
    title = "[Untitled site]";
  return title;
If an exception occurs, then a value of null is returned, indicating that a site could not be found.
} catch (IOException e)
{
  return null;
}
This recipe makes use of the extractNoCase function and a new version of the downloadPage function. Both of these functions can be seen in Listing 4.2. They are both slight modifications of the functions introduced in Chapter 3. For more information on these functions, see Chapter 3.
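For comparison, the same case-insensitive title lookup can be written with a regular expression from the standard java.util.regex package. This is a different technique from the indexOf search used by extractNoCase, shown only as a sketch: the TitleExtractor class and extractTitle method are hypothetical names, and the pattern assumes a simple, well-formed title element rather than arbitrary HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class TitleExtractor {

    // CASE_INSENSITIVE matches <title>, <Title>, and <TITLE> alike;
    // DOTALL lets the title text span more than one line.
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Return the trimmed title text, or null if no title is found. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "<html><head><TITLE>Printer Status</Title></head></html>";
        String title = extractTitle(page);
        System.out.println(title == null ? "[Untitled site]" : title);
    }
}
```

Regular expressions are adequate for pulling one simple tag out of a page. For anything more involved, a real HTML parser is the safer choice.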
Recipe #4.3: Download Binary or Text
Downloading a file from a URL is a common task for a bot. However, different procedures must be followed depending on the type of file being downloaded. If the file is binary, such as an image, then an exact copy of the file must be made on the local computer. If the file is text, then the line breaks must be properly formatted for the current operating system.
Chapter 3 introduced two recipes for downloading files from a URL. One version would download a text file; the other would download a binary file. As you saw earlier in this chapter, the content-type header tells what type of file will be downloaded. Recipe 4.3 contains a more sophisticated URL downloader than the ones in Chapter 3. It first determines the type of file and then downloads it in the appropriate way. Listing 4.3 shows this new URL downloader.
Listing 4.3: Download Text or Binary (DownloadURL.java)
package com.heatonresearch.httprecipes.ch4.recipe3;

import java.net.*;
import java.io.*;

/**
 * Recipe #4.3: Downloading a URL(text or binary)
 * Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
 *
 * HTTP Programming Recipes for Java Bots
 * ISBN: 0-9773206-6-9
 * http://www.heatonresearch.com/articles/series/16/
 *
 * This recipe shows how to download a text or binary file.
 * The recipe uses HTTP headers to determine if the file
 * is text or binary and automatically downloads it the
 * correct way.
 *
 * This software is copyrighted. You may use it in programs
 * of your own, without restriction, but you may not
 * publish the source code without the author's permission.
 * For more information on distributing this code, please
 * visit:
 *    http://www.heatonresearch.com/hr_legal.php
 *
 * @author Jeff Heaton
 * @version 1.1
 */
public class DownloadURL
{
  public static int BUFFER_SIZE = 8192;

  /**
   * Download either a text or binary file from a URL.
   * The URL's headers will be scanned to determine the
   * type of file.
   *
   * @param remoteURL The URL to download from.
   * @param localFile The local file to save to.
   * @throws IOException Exception while downloading.
   */
  public void download(URL remoteURL, File localFile) throws IOException
  {
    HttpURLConnection http = (HttpURLConnection) remoteURL.openConnection();
    HttpURLConnection.setFollowRedirects(true);
    InputStream is = http.getInputStream();
    OutputStream os = new FileOutputStream(localFile);
    String type = http.getHeaderField("Content-Type").toLowerCase().trim();

    if (type.startsWith("text"))
      downloadText(is, os);
    else
      downloadBinary(is, os);

    is.close();
    os.close();
    http.disconnect();
  }

  /**
   * Overloaded version of download that accepts strings,
   * rather than URL objects.
   *
   * @param remoteURL The URL to download from.
   * @param localFile The local file to save to.
   * @throws IOException Exception while downloading.
   */
  public void download(String remoteURL, String localFile)
      throws IOException
  {
    download(new URL(remoteURL), new File(localFile));
  }

  /**
   * Download a text file. This is done by converting the line
   * ending characters to the correct type for the
   * operating system that is being used.
   *
   * @param is The input stream, which is the URL.
   * @param os The output stream, a local file.
   * @throws IOException Exception while downloading.
   */
  private void downloadText(InputStream is, OutputStream os)
      throws IOException
  {
    byte lineSep[] = System.getProperty("line.separator").getBytes();
    int ch = 0;
    boolean inLineBreak = false;
    boolean hadLF = false;
    boolean hadCR = false;

    do
    {
      ch = is.read();
      if (ch != -1)
      {
        if ((ch == '\r') || (ch == '\n'))
        {
          inLineBreak = true;
          if (ch == '\r')
          {
            if (hadCR)
              os.write(lineSep);
            else
              hadCR = true;
          } else
          {
            if (hadLF)
              os.write(lineSep);
            else
              hadLF = true;
          }
        } else
        {
          if (inLineBreak)
          {
            os.write(lineSep);
            hadCR = hadLF = inLineBreak = false;
          }
          os.write(ch);
        }
      }
    } while (ch != -1);
  }

  /**
   * Download a binary file. This means make an exact
   * copy of the incoming stream.
   *
   * @param is The input stream, which is the URL.
   * @param os The output stream, a local file.
   * @throws IOException Exception while downloading.
   */
  private void downloadBinary(InputStream is, OutputStream os)
      throws IOException
  {
    byte buffer[] = new byte[BUFFER_SIZE];

    int size = 0;

    do
    {
      size = is.read(buffer);
      if (size != -1)
        os.write(buffer, 0, size);
    } while (size != -1);
  }

  /**
   * Typical Java main method, create an object, and then
   * start the object passing arguments. If insufficient
   * arguments are provided, then display startup
   * instructions.
   *
   * @param args Program arguments.
   */
  public static void main(String args[])
  {
    try
    {
      if (args.length != 2)
      {
        System.out
            .println("Usage: \njava DownloadURL [URL to Download] [Output File]");
      } else
      {
        DownloadURL d = new DownloadURL();
        d.download(args[0], args[1]);
      }
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
To run this program, you must specify the URL to download and the local file. For example, to download the contents of http://www.httprecipes.com/ to the file local.html, you would use the following command:
DownloadURL http://www.httprecipes.com/ local.html
The above command simply shows the abstract format to call this recipe, with the appropriate parameters. For exact information on how to run this recipe refer to Appendix B, C, or D, depending on the operating system you are using.
This program makes use of the following two methods that were first introduced in Chapter 3.
- downloadText
- downloadBinary
These two methods are exactly the same as the ones used in Chapter 3; therefore, they will not be discussed again here. If you would like more information about these two functions, refer to Chapter 3.
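For readers who do not have Chapter 3 at hand, the idea behind the text download can be sketched briefly. The code below is a simplified variant, not the downloadText method from Listing 4.3: it treats each of CR LF, a lone CR, and a lone LF as a single line break and writes the requested separator in its place. The LineEndings class and copyText method are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class LineEndings {

    /**
     * Copy text from is to os, replacing every line break
     * (CR LF, CR alone, or LF alone) with lineSep.
     */
    static void copyText(InputStream is, OutputStream os, byte[] lineSep)
            throws IOException {
        boolean lastWasCR = false;
        int ch;
        while ((ch = is.read()) != -1) {
            if (ch == '\r') {
                os.write(lineSep);
                lastWasCR = true;
            } else if (ch == '\n') {
                // an LF right after a CR belongs to the same line break
                if (!lastWasCR) {
                    os.write(lineSep);
                }
                lastWasCR = false;
            } else {
                os.write(ch);
                lastWasCR = false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // convert standard input to the platform's line endings
        copyText(System.in, System.out, System.lineSeparator().getBytes());
        System.out.flush();
    }
}
```

Reading one byte at a time is slow on an unbuffered stream, so in practice the input would usually be wrapped in a BufferedInputStream first.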
The example presented here connects to the specified URL and determines the type of that URL. Once the type is determined, the URL is downloaded by calling the appropriate download method, either downloadText or downloadBinary.
To begin, the download method creates an HttpURLConnection object to the specified URL. Then, an InputStream is created to receive the contents of the URL and an OutputStream is created to write the downloaded data to a file.
HttpURLConnection http = (HttpURLConnection) remoteURL.openConnection();
InputStream is = http.getInputStream();
OutputStream os = new FileOutputStream(localFile);
Next, the content-type header is checked to determine what type of file it is. If it starts with “text”, then the file is in the “text family”, and it will be downloaded as a text file. Otherwise, the file is downloaded as a binary file.
String type = http.getHeaderField("Content-Type").toLowerCase().trim();
if (type.startsWith("text"))
  downloadText(is, os);
else
  downloadBinary(is, os);
Once the file has been downloaded, the streams are closed and the HTTP connection is disconnected.
is.close();
os.close();
http.disconnect();
This recipe can be used anywhere you need to download the contents of a URL. It frees the programmer from having to determine the type of file downloaded.
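One detail to watch when reusing this recipe: getHeaderField returns null when the server sends no Content-Type header, and calling toLowerCase on that null throws a NullPointerException. A minimal sketch of a null-safe check follows. The ContentTypes class and isText method are hypothetical names, and treating a missing header as binary is an assumption chosen because an exact copy is the safer default. The sketch also tests for "text/" rather than "text", which is slightly stricter than the recipe.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class ContentTypes {

    /**
     * Decide whether a Content-Type value belongs to the text family.
     * A missing header (null) is treated as binary.
     */
    static boolean isText(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Locale.ROOT keeps the lower-casing independent of the
        // machine's default language settings
        String type = contentType.trim().toLowerCase(Locale.ROOT);
        return type.startsWith("text/");
    }

    public static void main(String[] args) {
        String[] samples = { "text/html; charset=UTF-8", "image/png", null };
        for (String sample : samples) {
            System.out.println(sample + " -> "
                + (isText(sample) ? "text" : "binary"));
        }
    }
}
```

Inside the download method, the result of http.getHeaderField("Content-Type") would be passed straight to such a check.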
Recipe #4.4: Site Monitor
Bots are great at performing repetitive tasks. Probably one of the most repetitive tasks in the world is checking to see if a web server is still up. If a person were to perform this task, they would sit at a computer with a stopwatch. Every minute, they would click the refresh button on the browser and make sure that the web site still loaded.
Recipe 4.4 will show how to accomplish this same task, using a bot. This program will attempt to connect to a web server every minute. As soon as the web server stops responding, the program displays a message alerting the user that the web server is down. This program is shown in Listing 4.4.
Listing 4.4: Monitor Site (MonitorSite.java)
package com.heatonresearch.httprecipes.ch4.recipe4;

import java.util.*;
import java.net.*;
import java.io.*;

/**
 * Recipe #4.4: Monitor a Site
 * Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
 *
 * HTTP Programming Recipes for Java Bots
 * ISBN: 0-9773206-6-9
 * http://www.heatonresearch.com/articles/series/16/
 *
 * This recipe will monitor the specified website to see
 * if the site is still "up". This recipe will scan the
 * site once a minute.
 *
 * This software is copyrighted. You may use it in programs
 * of your own, without restriction, but you may not
 * publish the source code without the author's permission.
 * For more information on distributing this code, please
 * visit:
 *    http://www.heatonresearch.com/hr_legal.php
 *
 * @author Jeff Heaton
 * @version 1.1
 */
public class MonitorSite
{
  /**
   * Scan a URL every minute to make sure it is still up.
   * @param url The URL to monitor.
   */
  public void monitor(URL url)
  {
    while (true)
    {
      System.out.println("Checking " + url + " at " + (new Date()));

      // try to connect
      try
      {
        URLConnection http = url.openConnection();
        http.connect();
        System.out.println("The site is up.");
      } catch (IOException e1)
      {
        System.out.println("The site is down!!!");
      }

      // now wait for a minute before checking again
      try
      {
        Thread.sleep(60000);
      } catch (InterruptedException e)
      {
      }
    }
  }

  /**
   * Typical Java main method, create an object, and then
   * start the object passing arguments. If insufficient
   * arguments are provided, then display startup
   * instructions.
   *
   * @param args Program arguments.
   */
  public static void main(String args[])
  {
    try
    {
      if (args.length != 1)
      {
        System.out.println("Usage: MonitorSite [URL to Monitor]");
      } else
      {
        MonitorSite d = new MonitorSite();
        d.monitor(new URL(args[0]));
      }
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
To run this program, you must specify the URL to monitor. For example, to monitor the web site at http://www.httprecipes.com/ you would use the following command:
MonitorSite http://www.httprecipes.com/
The above command simply shows the abstract format to call this recipe, with the appropriate parameters. For exact information on how to run this recipe refer to Appendix B, C, or D, depending on the operating system you are using.
The program begins by entering an endless loop. (Because of this, in order to exit this program, you must press Ctrl-C or close its window.)
while (true)
{
  System.out.println("Checking " + url + " at " + (new Date()));
The program then attempts to connect to the web server by calling the openConnection method of the URL class, followed by the connect method of the connection that is returned.
  // Try to connect.
  try
  {
    URLConnection http = url.openConnection();
    http.connect();
If the connection succeeds, then a message is displayed to indicate that the site is still up. If an exception is thrown, it is reported that the site is down.
    System.out.println("The site is up.");
  } catch (IOException e1)
  {
    System.out.println("The site is down!!!");
  }
The program will now wait for a minute before checking again. To do this, the sleep method of the Thread class is called.
  // now wait for a minute before checking again
  try
  {
    Thread.sleep(60000);
  } catch (InterruptedException e)
  {
  }
}
The InterruptedException is meaningless to our program because it is not multithreaded. As a result, the InterruptedException is ignored. However, we must include the catch block because InterruptedException is a checked exception, and Java requires that every checked exception be either caught or declared in the method's throws clause.
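Ignoring the interruption is fine for a single-threaded command-line tool. If the same loop were ever moved onto a thread that other code can interrupt, a common convention is to restore the thread's interrupt flag and leave the loop. The sketch below illustrates that convention under a few assumptions: the PollingLoop class, the poll method, and the maxChecks limit are hypothetical and exist only so the example terminates, and the site check is passed in as a BooleanSupplier rather than performed directly.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class PollingLoop {

    /**
     * Run the check up to maxChecks times, sleeping between checks.
     * Stops early if the thread is interrupted while sleeping.
     *
     * @return The number of checks that were performed.
     */
    static int poll(BooleanSupplier check, int maxChecks, long intervalMillis) {
        int done = 0;
        while (done < maxChecks) {
            System.out.println(check.getAsBoolean()
                ? "The site is up." : "The site is down!!!");
            done++;
            if (done == maxChecks) {
                break; // no need to sleep after the last check
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // sleep() cleared the flag; set it again so that
                // callers can see the interruption, then stop
                Thread.currentThread().interrupt();
                break;
            }
        }
        return done;
    }

    public static void main(String[] args) {
        // a stand-in check that always reports "up"
        int checks = poll(() -> true, 3, 1000);
        System.out.println("Checks performed: " + checks);
    }
}
```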
This program is a very simple web site monitoring utility. A more “industrial strength” version would perform some additional operations, such as:
- E-mailing on failure
- Paging on failure
- Tracking multiple sites
As it is written, this recipe implements only the basic functionality.
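One limitation is worth noting. The recipe treats a successful connect call as "up", so a server that accepts connections but answers every request with an error would still be reported as up. A stricter check can look at the HTTP status code instead. The following is a sketch only: the SiteCheck class and its methods are hypothetical names, the choice of a HEAD request is an assumption that the server supports it, and counting the 200 and 300 ranges as healthy is a policy chosen for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper for illustration; not one of the book's recipes. */
class SiteCheck {

    /** Treat success (2xx) and redirect (3xx) codes as healthy. */
    static boolean isHealthy(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    /**
     * Ask the server for headers only and judge the status code.
     * Assumes an http or https URL; any I/O failure counts as down.
     */
    static boolean isUp(URL url, int timeoutMillis) {
        try {
            HttpURLConnection http = (HttpURLConnection) url.openConnection();
            http.setRequestMethod("HEAD");
            http.setConnectTimeout(timeoutMillis);
            http.setReadTimeout(timeoutMillis);
            try {
                return isHealthy(http.getResponseCode());
            } finally {
                http.disconnect();
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java SiteCheck [URL to check]");
            return;
        }
        System.out.println(isUp(new URL(args[0]), 5000)
            ? "The site is up." : "The site is down!!!");
    }
}
```

Tracking multiple sites then amounts to calling isUp once per URL inside the monitoring loop.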




