    This chapter includes four recipes, which demonstrate the following:

  • Scanning a URL for headers
  • Searching a range of IP addresses for web sites
  • Downloading a binary or text file
  • Monitoring a site to see that it stays up

    These recipes will introduce you to some of the things that can be done with the HttpWebRequest and HttpWebResponse classes.

Recipe #4.1: Scan URL

    Sometimes it is helpful to examine the headers for a particular URL. Recipe 4.1 shows how to use the HttpWebResponse class to access the headers for a particular URL. This program is shown in Listing 4.1.

Listing 4.1: Scan a URL for HTTP Response Headers (ScanURL.cs)

using System;
using System.IO;
using System.Net;

namespace Recipe4_1
{
	/// <summary>
	/// Recipe #4.1: Scan a URL's Headers
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
	///
	/// This recipe displays the headers provided by a web server.
	///
	/// This software is copyrighted. You may use it in programs
	/// of your own, without restriction, but you may not
	/// publish the source code without the author's permission.
	/// For more information on distributing this code, please
	/// visit:
	///    http://www.heatonresearch.com/hr_legal.php
	/// </summary>
	class ScanURL
	{
		/// <summary>
		/// Scan the URL and display headers.
		/// </summary>
		/// <param name="u">The URL to scan.</param>
		public void Scan(String u)
		{
			Uri url = new Uri(u);
			WebRequest http = HttpWebRequest.Create(url);
			WebResponse response = http.GetResponse();
			
			int count = 0;
			String key, value;

			for(count=0;count<response.Headers.Keys.Count;count++)
			{
				key = response.Headers.Keys[count];
				value = response.Headers[key];

				if (value != null)
				{
					if (key == null)
						Console.WriteLine(value);
					else
						Console.WriteLine(key + ": " + value);
				}
			}
		}


		/// <summary>
		/// The main entry point for the application.
		/// </summary>
		[STAThread]
		static void Main(string[] args)
		{
			if (args.Length != 1)
			{
                Console.WriteLine("Usage: Recipe4_1 [URL to Scan]");
			} 
			else
			{
				ScanURL d = new ScanURL();
				d.Scan(args[0]);
			}
		}


	}
}

    This program is designed to accept one parameter, which is the URL that you would like to scan. For example, to scan the web site http://www.httprecipes.com/ you would use the following command.

ScanURL http://www.httprecipes.com/

    Issuing the above command would cause the program to access the web site and then display all HTTP server headers that were returned.

    All of the work performed by this program is done inside the Scan method. The first thing the Scan method does is create a new Uri object, which is then used to create an HttpWebRequest object. Calling GetResponse on that request returns the response. The following lines of code do this.

Uri url = new Uri(u);
WebRequest http = HttpWebRequest.Create(url);
WebResponse response = http.GetResponse();

    Once the connection has been established, a few local variables are created to keep track of the headers being displayed. The key variable will hold the name of each header found. The value variable will hold the value of that header. The count variable keeps a count of the header we are on.

int count = 0;
String key, value;

    Next, a for loop will be used to loop through each of the headers.

for(count=0;count<response.Headers.Keys.Count;count++)
{
key = response.Headers.Keys[count];
value = response.Headers[key];

    The headers are read in one by one and displayed.

if (value != null)
{
if (key == null)
Console.WriteLine(value);
else
Console.WriteLine(key + ": " + value);

    This process will continue until all headers have been displayed.
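
    The same loop can also be written with foreach over the AllKeys property. The following is a minimal sketch rather than part of the recipe: the HeaderDump class and its Format method are hypothetical names, and the method returns the lines instead of printing them so that it can be tried without a network connection.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper, not part of Listing 4.1.
static class HeaderDump
{
    // Returns one "name: value" line per header in the collection.
    public static List<string> Format(WebHeaderCollection headers)
    {
        List<string> lines = new List<string>();
        foreach (string key in headers.AllKeys)
        {
            // The string indexer returns every value for the header,
            // joined with commas when the header appeared more than once.
            lines.Add(key + ": " + headers[key]);
        }
        return lines;
    }
}
```

    Inside Scan, this would be called as HeaderDump.Format(response.Headers), and each returned line would be written to the console.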

Recipe #4.2: Scan for Sites

    You can also use the C# HTTP classes to determine if there is an active web server at a specific URL. Recipe 4.2 shows how to loop through a series of IP addresses to find any web servers. To use this program, you must specify an IP address prefix. An example of this would be 192.168.1. Specifying this prefix would visit 256 IP addresses. It would visit from 192.168.1.0 to 192.168.1.255.

    This next recipe shows how to decrease the connection timeout. Because almost all of the IP addresses will not have web servers, this example takes a while to run. By default, an HttpWebRequest waits up to 100 seconds for a response before giving up.

    Because of this, the timeout is reduced to one second (1,000 milliseconds). For a more thorough scan, the timeout can be increased. Listing 4.2 shows the site scanner:

Listing 4.2: Scan for Web Sites (ScanSites.cs)

using System;
using System.Net;
using System.IO;
using System.Collections.Generic;

namespace Recipe4_2
{
	/// <summary>
    /// Recipe #4.2: Scan IP's for Sites
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
    ///
    /// This recipe shows how to scan a series of IP addresses
    /// for web sites.  This recipe is designed to take an IP
    /// address, such as 192.168.1 and then cycle through all
    /// 256 IP addresses by providing the fourth part of the
    /// IP address.
    ///
    /// This software is copyrighted. You may use it in programs
    /// of your own, without restriction, but you may not
    /// publish the source code without the author's permission.
    /// For more information on distributing this code, please
    /// visit:
    ///    http://www.heatonresearch.com/hr_legal.php
	/// </summary>
	class ScanSites
	{
		/// <summary>
		/// This method is very useful for grabbing information from a
		/// HTML page.  It extracts text from between two tokens, the
		/// tokens need not be case sensitive.
		/// </summary>
		/// <param name="str">The string to extract from.</param>
		/// <param name="token1">The text, or tag, that comes before the desired text</param>
		/// <param name="token2">The text, or tag, that comes after the desired text</param>
		/// <param name="count">Which occurrence of token1 to use, 1 for the first</param>
		/// <returns></returns>
		public String ExtractNoCase(String str, String token1, String token2,
			int count)
		{
			int location1, location2;

			// convert everything to lower case
			String searchStr = str.ToLower();
			token1 = token1.ToLower();
			token2 = token2.ToLower();

			// now search, starting from the beginning of the string
			location1 = -1;
			do
			{
				location1 = searchStr.IndexOf(token1, location1 + 1);

				if (location1 == -1)
					return null;

				count--;
			} while (count > 0);

			// locate the closing token in the lower-case copy, so that
			// the match does not depend on case, then return the result
			// from the original string that has mixed case
            location1 += token1.Length;
			location2 = searchStr.IndexOf(token2, location1);
			if (location2 == -1)
				return null;

			return str.Substring(location1, location2-location1 );
		}

		/// <summary>
		/// This method downloads the specified URL into a C#
		/// String. This is a very simple method that you can
		/// reuse anytime you need to quickly grab all data from
		/// a specific URL.
		/// </summary>
		/// <param name="url">The URL to download.</param>
		/// <param name="timeout">The amount of time to wait before aborting.</param>
		/// <returns>The contents of the URL that was downloaded.</returns>
		public String DownloadPage(Uri url,int timeout)
		{
            try
            {
                WebRequest http = HttpWebRequest.Create(url);
                http.Timeout = timeout;
                HttpWebResponse response = (HttpWebResponse)http.GetResponse();
                StreamReader stream = new StreamReader(response.GetResponseStream(), System.Text.Encoding.ASCII);

                String result = stream.ReadToEnd();

                response.Close();
                stream.Close();
                return result;
            }
            catch (Exception)
            {
                return null;
            }
		}

		/// <summary>
		/// Scan the specified IP address and return the title of
		/// the webpage found there, or null if no connection can
		/// be made.
		/// </summary>
		/// <param name="ip">The IP address to scan.</param>
		/// <returns>The title of the webpage, or null if no website.</returns>
		private String scanIP(String ip)
		{
            String title = null;

			Console.WriteLine("Scanning: " + ip);
			String page = DownloadPage(new Uri("http://" + ip), 1000);
            if (page != null)
            {
                title = ExtractNoCase(page, "<title>", "</title>", 1);
                if (title == null)
                    title = "[Untitled site]";
            }
			return title;
		}


		/// <summary>
		/// Scan a range of 256 IP addresses.  Provide the prefix
		/// of the IP address, without the final fourth.  For 
		/// example "192.168.1".
		/// </summary>
		/// <param name="ip">The IP address prefix(i.e. 192.168.1)</param>
        public void scan(String ip)
        {
            if (!ip.EndsWith("."))
            {
                ip += ".";
            }

            // Create a list to hold sites found.
            List<String> list = new List<String>();

            // Scan through IP addresses ending in 0 - 255.
            for (int i = 0; i <= 255; i++)
            {
                String address = ip + i;
                String title = scanIP(address);
                if (title != null)
                    list.Add(address + ":" + title);
            }

            // Now display the list of sites found.
            Console.WriteLine();
            Console.WriteLine("Sites found:");
            if (list.Count > 0)
            {
                foreach (String site in list)
                {
                    Console.WriteLine(site);
                }
            }
            else
            {
                Console.WriteLine("No sites found");
            }
        }

		/// <summary>
		/// The main entry point for the application.
		/// </summary>
		[STAThread]
		static void Main(string[] args)
		{
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: Recipe4_2 [IP prefix, i.e. 192.168.1]");
            }
            else
            {
                ScanSites d = new ScanSites();
                d.scan(args[0]);
            }
		}
	}
}

    To run this program, you must specify the IP prefix. For example, to scan the IP prefix 192.168.1, use the following command:

ScanSites 192.168.1

    You may find more sites on your home network than you knew existed. For example, I found that my laser printer has a web site. Logging into my printer’s built-in web site shows me how much toner is still available. You can see the results of my scan in Figure 4.1.

Figure 4.1: Scan for Sites

    First the IP prefix is checked to see if it ends with a period “.”. If it does not end this way, then a period is appended. This is because we need the IP prefix in the form:

192.168.1.

    not

192.168.1 

    We will be appending a count from 0 to 255 to the end, so the trailing period is completely necessary.

if (!ip.EndsWith("."))
{
ip += ".";
}

    Next, a list is created to hold the sites that are located. The sites located are not displayed until the end of the scan.

// Create a list to hold sites found.
List<String> list = new List<String>();

    Now we are ready to scan. A for loop is used to count from 0 to 255.

// Scan through IP addresses ending in 0 - 255.
for (int i = 0; i <= 255; i++)
{
String address = ip + i;
String title = scanIP(address);
if (title != null)
list.Add(address + ":" + title);
}
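
    The prefix handling and the counting loop can be combined into one small helper. This is only a sketch under the assumptions stated in the text (a three-part prefix and a final part running from 0 to 255); the AddressRange class and its Expand method are hypothetical names that do not appear in Listing 4.2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Listing 4.2.
static class AddressRange
{
    // Expands a three-part prefix such as "192.168.1" into the 256
    // addresses "192.168.1.0" through "192.168.1.255".
    public static List<string> Expand(string prefix)
    {
        // Accept the prefix with or without its trailing period.
        if (!prefix.EndsWith("."))
        {
            prefix += ".";
        }

        List<string> addresses = new List<string>();
        for (int i = 0; i <= 255; i++)
        {
            addresses.Add(prefix + i);
        }
        return addresses;
    }
}
```

    Keeping the address generation separate from the scanning makes it easy to check that the range is the one you intended before any requests are sent.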

    For each IP address, the scanIP method is called. If a valid site exists at that address, the title (from the HTML) is returned. If no valid site is found, then the value null is returned. The scanIP method is covered in detail later in this section.

    Once the loop completes, we display what sites were found. The code to do this is shown below:

// Now display the list of sites found.
Console.WriteLine();
Console.WriteLine("Sites found:");
if (list.Count > 0)
{
foreach (String site in list)
{
Console.WriteLine(site);
}
}

    The user is informed if there are no sites to display.

else
{
Console.WriteLine("No sites found");
}

    As you saw above, the scanIP method is called for each IP address. I will now show you how the scanIP method is constructed.

    The scanIP method begins by displaying the IP address that is currently being scanned. The DownloadPage method is then called to retrieve the HTML at the URL formed from the IP address. The following lines of code do this.

String title = null;

Console.WriteLine("Scanning: " + ip);
String page = DownloadPage(new Uri("http://" + ip), 1000);

    If the DownloadPage function returns null, then the page could not be downloaded and there is nothing to record. If the page was downloaded, the title is extracted and recorded.

if (page != null)
{
title = ExtractNoCase(page, "<title>", "</title>", 1);
if (title == null)
title = "[Untitled site]";
}
return title;

    The DownloadPage function is the one we created in Chapter 3; however, you will notice an additional parameter. The second parameter specifies the timeout in milliseconds, and scanIP passes 1,000. If the server does not respond within one second (1,000 milliseconds), the request is aborted and an exception is thrown. DownloadPage catches that exception and returns null.
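
    The timeout handling can be sketched on its own as follows. This is an illustrative variation, not the code from Listing 4.2: the TimedDownload class and its two methods are hypothetical names, the using statements are added so that the response is released even when reading fails, and only WebException and IOException are caught rather than every exception.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, not part of Listing 4.2.
static class TimedDownload
{
    // Builds a request whose GetResponse call gives up after timeoutMs
    // milliseconds. Nothing is sent over the network at this point.
    public static WebRequest BuildRequest(Uri url, int timeoutMs)
    {
        WebRequest http = WebRequest.Create(url);
        http.Timeout = timeoutMs;
        return http;
    }

    // Returns the page text, or null when the request fails or times out.
    public static string TryDownload(Uri url, int timeoutMs)
    {
        try
        {
            WebRequest http = BuildRequest(url, timeoutMs);
            using (WebResponse response = http.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Timeouts, refused connections and HTTP error statuses arrive here.
            return null;
        }
        catch (IOException)
        {
            // The connection was lost while the body was being read.
            return null;
        }
    }
}
```

    A call such as TimedDownload.TryDownload(new Uri("http://192.168.1.1"), 1000) would then behave like DownloadPage: it returns the page text, or null if nothing answers within one second.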

    The ExtractNoCase function is called to extract the text between the <title> and </title> tags. ExtractNoCase is a special version of the extract function introduced in Chapter 3 that does not care about the case of the tags. For example, <title> and <Title> would be considered the same. If no title is found, then the site is listed as “[Untitled site]”.
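
    A case-insensitive extraction can also be sketched without lower-casing the page, by passing StringComparison.OrdinalIgnoreCase to IndexOf. The TitleExtract class and its Between method below are hypothetical names used only for illustration, and this version always uses the first occurrence of the opening token.

```csharp
using System;

// Hypothetical helper, not part of Listing 4.2.
static class TitleExtract
{
    // Returns the text between the first token1 and the next token2,
    // matching both tokens without regard to case, or null if either
    // token is missing.
    public static string Between(string str, string token1, string token2)
    {
        int start = str.IndexOf(token1, StringComparison.OrdinalIgnoreCase);
        if (start == -1)
            return null;
        start += token1.Length;

        int end = str.IndexOf(token2, start, StringComparison.OrdinalIgnoreCase);
        if (end == -1)
            return null;

        // The result is cut from the original string, so its case is preserved.
        return str.Substring(start, end - start);
    }
}
```

    For a page containing <Title>My Printer</TITLE>, a call to TitleExtract.Between(page, "<title>", "</title>") would return My Printer.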

    This recipe makes use of ExtractNoCase and a new version of DownloadPage. Both of these functions can be seen in Listing 4.2. They are both slight modifications of the functions introduced in Chapter 3, “Simple HTTP Requests”. For more information on these functions, see Chapter 3.

Recipe #4.3: Download Binary or Text

    Downloading a file from a URL is a common task for a bot. However, different procedures must be followed depending on the type of file being downloaded. If the file is binary, such as an image, then an exact copy of the file must be made on the local computer. If the file is text, then the line breaks must be properly formatted for the current operating system.
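
    To make the text case concrete, the following sketch rewrites a block of text so that every line ends with a chosen line terminator. It is an illustration only: the LineBreaks class and its Normalize method are hypothetical names, and the sketch works on a string in memory rather than on a download.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, shown only to illustrate line-break handling.
static class LineBreaks
{
    // Rewrites text so every line ends with the given line terminator,
    // whatever mix of "\r\n", "\n" or "\r" the source used.
    public static string Normalize(string text, string newLine)
    {
        StringBuilder result = new StringBuilder();
        using (StringReader reader = new StringReader(text))
        {
            string line;
            // ReadLine strips the original line terminator from each line.
            while ((line = reader.ReadLine()) != null)
            {
                result.Append(line).Append(newLine);
            }
        }
        return result.ToString();
    }
}
```

    Passing Environment.NewLine as the second argument produces line breaks suited to the current operating system. A binary file must never be processed this way, because changing even one byte would corrupt it.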

    Chapter 3 introduced two recipes for downloading files from a URL. One version downloads a text file; the other downloads a binary file. As you saw earlier in this chapter, the content-type header tells what type of file will be downloaded. Recipe 4.3 contains a more sophisticated URL downloader than the one in Chapter 3. It first determines the type of file and then downloads it in the appropriate way. Listing 4.3 shows this new URL downloader.

Listing 4.3: Download Text or Binary (DownloadURL.cs)

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Net;

namespace Recipe4_3
{
    /// <summary>
    /// Recipe #4.3: Downloading a URL(text or binary)
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
    ///
    /// This recipe shows how to download a text or binary file.
    /// The recipe uses HTTP headers to determine if the file
    /// is text or binary and automatically downloads it the
    /// correct way.
    ///
    /// This software is copyrighted. You may use it in programs
    /// of your own, without restriction, but you may not
    /// publish the source code without the author's permission.
    /// For more information on distributing this code, please
    /// visit:
    ///    http://www.heatonresearch.com/hr_legal.php/
    ///
    /// </summary>
    class DownloadURL
    {
        /// <summary>
        /// Download the specified binary file.
        /// </summary>
        /// <param name="response">The HttpWebResponse to download from.</param>
        /// <param name="filename">The local file to save to.</param>
        public void DownloadBinaryFile(HttpWebResponse response, String filename)
        {
            byte[] buffer = new byte[4096];
            FileStream os = new FileStream(filename, FileMode.Create);
            Stream stream = response.GetResponseStream();

            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, buffer.Length);
                if (count > 0)
                    os.Write(buffer, 0, count);
            } while (count > 0);

            response.Close();
            stream.Close();
            os.Close();
        }

        /// <summary>
        /// Download the specified text page.
        /// </summary>
        /// <param name="response">The HttpWebResponse to download from.</param>
        /// <param name="filename">The local file to save to.</param>
        public void DownloadTextFile(HttpWebResponse response, String filename)
        {
            byte[] buffer = new byte[4096];
            FileStream os = new FileStream(filename, FileMode.Create);
            StreamReader reader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.ASCII);
            StreamWriter writer = new StreamWriter(os, System.Text.Encoding.ASCII);

            String line;
            do
            {
                line = reader.ReadLine();
                if (line != null)
                    writer.WriteLine(line);

            } while (line != null);

            reader.Close();
            writer.Close();
            os.Close();
        }


        /// <summary>
        /// Download either a text or binary file from a URL.
        /// The URL's headers will be scanned to determine the
        /// type of file.
        /// </summary>
        /// <param name="remoteURL">The URL to download from.</param>
        /// <param name="localFile">The local file to save to.</param>
        public void Download(Uri remoteURL, String localFile)
        {
            WebRequest http = HttpWebRequest.Create(remoteURL);
            HttpWebResponse response = (HttpWebResponse)http.GetResponse();

            String type = response.Headers["Content-Type"].ToLower().Trim();
            if (type.StartsWith("text"))
                DownloadTextFile(response, localFile);
            else
                DownloadBinaryFile(response, localFile);

        }


        /// <summary>
        /// The main entry point for the program.
        /// </summary>
        /// <param name="args">Program arguments.</param>
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: Recipe4_3 [URL to Download] [Output File]");
            }
            else
            {
                DownloadURL d = new DownloadURL();
                d.Download(new Uri(args[0]), args[1]);
            }
        }

    }
}

    To run this program, you must specify the URL to download and the local file. For example, to download the contents of http://www.httprecipes.com to the file local.html, use the following command:

DownloadURL http://www.httprecipes.com/ local.html

    This program makes use of the following two methods that were first introduced in Chapter 3.

  • DownloadTextFile
  • DownloadBinaryFile

    These two methods are exactly the same as the ones used in Chapter 3; therefore, they will not be discussed again here. If you would like more information about these two functions, refer to Chapter 3.

    The example presented here connects to the specified URL and determines the type of that URL. Once the type is determined, the URL is downloaded by calling the appropriate download method, either DownloadTextFile or DownloadBinaryFile.

WebRequest http = HttpWebRequest.Create(remoteURL);
HttpWebResponse response = (HttpWebResponse)http.GetResponse();

    Next, the content-type header is checked to determine what type of file it is. If it starts with “text”, then the file is in the “text family”, and it will be downloaded as a text file. Otherwise, the file is downloaded as a binary file.

String type = response.Headers["Content-Type"].ToLower().Trim();

if (type.StartsWith("text"))
DownloadTextFile(response, localFile);
else
DownloadBinaryFile(response, localFile);
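
    The content-type test can be isolated in a small helper that also copes with a missing header. This is a sketch under stated assumptions, not the recipe's code: the ContentTypes class and its IsText method are hypothetical names, the sketch checks for the prefix "text/" rather than "text", and it treats a response without a Content-Type header as binary.

```csharp
using System;

// Hypothetical helper, not part of Listing 4.3.
static class ContentTypes
{
    // True when the Content-Type value belongs to the "text" family,
    // for example "text/html; charset=utf-8". A missing header counts
    // as binary, so the bytes would be copied unchanged.
    public static bool IsText(string contentType)
    {
        if (contentType == null)
            return false;
        return contentType.Trim().StartsWith("text/", StringComparison.OrdinalIgnoreCase);
    }
}
```

    Inside Download, the check would become ContentTypes.IsText(response.Headers["Content-Type"]), which avoids an exception when the server sends no Content-Type header at all.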

    This recipe can be used anywhere you need to download the contents of a URL. It frees the programmer from having to determine the type of file downloaded.

Recipe #4.4: Site Monitor

    Bots are great at performing repetitive tasks. Probably one of the most repetitive tasks known is checking to see if a web server is still up. If a person were to perform this task, they would sit at a computer with a stopwatch. Every minute, they would click the refresh button on the browser and make sure that the web site still loaded.

    Recipe 4.4 will show how to accomplish this same task, using a bot. This program will attempt to connect to a web server every minute. As soon as the web server stops responding, the program displays a message alerting the user that the web server is down. This program is shown in Listing 4.4.

Listing 4.4: Monitor Site (MonitorSite.cs)

using System;
using System.Collections.Generic;
using System.Text;
using System.Net;
using System.IO;
using System.Threading;

namespace Recipe4_4
{
    /// <summary>
    /// Recipe #4.4: Monitor a Site
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
    ///
    /// This recipe will monitor the specified website to see
    /// if the site is still "up".  This recipe will scan the
    /// site once a minute.
    ///
    /// This software is copyrighted. You may use it in programs
    /// of your own, without restriction, but you may not
    /// publish the source code without the author's permission.
    /// For more information on distributing this code, please
    /// visit:
    ///    http://www.heatonresearch.com/hr_legal.php
    /// </summary>
    class MonitorSite
    {
        /// <summary>
        /// Scan a URL every minute to make sure it is still up.
        /// </summary>
        /// <param name="url">The URL to monitor.</param>
        public void Monitor(Uri url)
        {
            while (true)
            {
                Console.WriteLine("Checking " + url + " at " + DateTime.Now);

                // Try to connect.
                try
                {
                    WebRequest http = HttpWebRequest.Create(url);
                    HttpWebResponse response = (HttpWebResponse)http.GetResponse();
                    // Release the connection so that later checks are not blocked.
                    response.Close();
                    Console.WriteLine("The site is up.");
                }
                catch (WebException)
                {
                    Console.WriteLine("The site is down!!!");
                }
                Thread.Sleep(60000);
            }
        }

        /// <summary>
        /// The main entry point for the program.
        /// </summary>
        /// <param name="args">Program arguments.</param>
        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: Recipe4_4 [URL to Monitor]");
            }
            else
            {
                MonitorSite d = new MonitorSite();
                d.Monitor(new Uri(args[0]));
            }
        }

    }
}

    To run this program, you must specify which URL to monitor. For example, to monitor the web site at http://www.httprecipes.com, use the following command:

MonitorSite http://www.httprecipes.com/

    The program begins by entering an endless loop. (Because of this, you must press Ctrl-C or close its window to exit the program.)

while (true)
{
Console.WriteLine("Checking " + url + " at " + DateTime.Now);

    The program then attempts to connect to the web server by creating a WebRequest object for the specified Uri and requesting its response.

try
{
WebRequest http = HttpWebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)http.GetResponse();

    If the site responds, then the response is closed and a message is displayed to indicate that the site is still up. If a WebException is thrown, it is reported that the site is down.

response.Close();
Console.WriteLine("The site is up.");
}
catch (WebException)
{
Console.WriteLine("The site is down!!!");
}

    The program will now wait for a minute before checking again. To do this, the Sleep method of the Thread class is called.

Thread.Sleep(60000);

    This program is a very simple web site monitoring utility. A more “industrial strength” version would perform some additional operations, such as:

  • E-Mailing on failure
  • Paging on failure
  • Tracking multiple sites

    As it is written, this recipe implements only the basic functionality.
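
    As one possible starting point for tracking multiple sites, the following sketch remembers the last known state of each URL and reports only when that state changes. It is an illustration rather than part of the recipe: the SiteTracker class and its Update method are hypothetical names, and the up-or-down result is assumed to come from a check like the one in the Monitor method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Listing 4.4.
class SiteTracker
{
    // Last known state for each monitored URL: true means "up".
    private readonly Dictionary<string, bool> states = new Dictionary<string, bool>();

    // Records the latest check result and returns a message when the
    // state changed (or the site is seen for the first time), otherwise null.
    public string Update(string url, bool isUp)
    {
        bool previous;
        bool known = states.TryGetValue(url, out previous);
        states[url] = isUp;

        if (known && previous == isUp)
            return null;
        return url + (isUp ? " is up." : " is down!!!");
    }
}
```

    Reporting only on changes keeps the output short when many sites are checked every minute, and the returned message is a natural place to hook in an e-mail or pager notification.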


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.