This chapter illustrated how to use some of the basic HTTP functionality built into C#. You have seen how you can use the Uri class to open a stream to a web page. You also saw how to read the contents of the web page into a string. The recipes for this chapter will build on this.

    There are five recipes for this chapter. These recipes provide you with reusable code that demonstrates the basic HTTP programming learned in this chapter. These recipes demonstrate the following functionalities:

  • Download the contents of a web page
  • Extract data from a web page
  • Parse time and date information from several pages
  • Download a binary file
  • Download a text file

    We will begin with recipe 3.1, which demonstrates how to download the contents of a web page.

Recipe 3.1: Downloading the Contents of a Web Page

    This recipe is the culmination of the example code presented up to this point in this chapter. Recipe 3.1 accesses a URL and downloads the contents into a string that is then displayed.

    This is shown in Listing 3.1.

Listing 3.1: Download a Web Page (GetPage.cs)

using System;
using System.Net;
using System.IO;

namespace Recipe3_1
{
	/// <summary>
    /// Recipe #3.1: Downloading the Contents of a Web Page
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
	///
	/// Simple class that demonstrates how to download a web
	/// page, and display it to the console.	
    /// 
	/// This software is copyrighted. You may use it in programs
	/// of your own, without restriction, but you may not
	/// publish the source code without the author's permission.
	/// For more information on distributing this code, please
	/// visit:
	///    http://www.heatonresearch.com/hr_legal.php
	///
	/// </summary>
	class GetPage
	{
		/// <summary>
		/// This method downloads the specified URL into a C#
		/// String. This is a very simple method that you can
		/// reuse anytime you need to quickly grab all data from
		/// a specific URL.
		/// </summary>
		/// <param name="url">The URL to download.</param>
		/// <returns>The contents of the URL that was downloaded.</returns>
		public String DownloadPage(Uri url)
		{
			WebRequest http = HttpWebRequest.Create(url);
			HttpWebResponse response = (HttpWebResponse)http.GetResponse();
			StreamReader stream = new StreamReader(response.GetResponseStream(),System.Text.Encoding.ASCII   );

			String result = stream.ReadToEnd();

			response.Close();
			stream.Close();
			return result;
		}

		/// <summary>
		/// Run the example.
		/// </summary>
		/// <param name="page">The page to download.</param>
		public void Go(String page)
		{
			Uri u = new Uri(page);
			String str = DownloadPage(u);
			Console.WriteLine(str);
		}

		/// <summary>
		/// The main entry point for the application.
		/// </summary>
		[STAThread]
		static void Main(string[] args)
		{
			GetPage module = new GetPage();
			String page;
			if (args.Length == 0)
				page = "http://www.httprecipes.com/1/3/time.php";
			else
				page = args[0];
			module.Go(page);
		}
	}
}

    The above example can be run in two ways. If you run the example without any parameters (by simply typing “GetPage”), it will download from the following URL, which is hardcoded in the recipe:

http://www.httprecipes.com/1/3/time.php

    If you run the program with arguments it will download the specified URL. For example, to download the contents of the homepage of the recipes site use the following command:

GetPage http://www.httprecipes.com

    The contents of http://www.httprecipes.com will now be displayed to the console, instead of http://www.httprecipes.com/1/3/time.php. For more information on how to execute the recipes in this book refer to Appendix B, “Compiling and Executing Examples.”

    This recipe provides one very useful function, DownloadPage, shown here:

public String DownloadPage(Uri url)

    This function accepts a Uri, and downloads the contents of that web page. The contents are returned as a string. The implementation of the DownloadPage function is simple, and follows the code already discussed in this chapter.
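    As a sketch of how DownloadPage might be hardened, the variant below wraps the response and the reader in using statements, so that both are released even if reading throws an exception. It is not the book's listing: the class name DownloadPageSketch, the ReadAll helper, and the choice of UTF-8 (the character set the sample pages declare in their Content-Type meta tag) are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical variant of the chapter's DownloadPage; not the original listing.
public static class DownloadPageSketch
{
    // Reads an entire stream into a string using the given encoding.
    // The reader (and the underlying stream) is disposed when done.
    public static string ReadAll(Stream input, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(input, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads a URL into a string. The using statement disposes the
    // response even if an exception is thrown while reading.
    public static string DownloadPage(Uri url)
    {
        WebRequest http = WebRequest.Create(url);
        using (WebResponse response = http.GetResponse())
        {
            // Assumption: the page is UTF-8 encoded.
            return ReadAll(response.GetResponseStream(), Encoding.UTF8);
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: DownloadPageSketch <url>");
            return;
        }
        Console.WriteLine(DownloadPage(new Uri(args[0])));
    }
}
```

    Splitting the stream reading into its own helper is a design choice of this sketch; it allows the reading logic to be tried against any stream, such as a MemoryStream, without a network connection.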

    This recipe can be applied to any real-world site that contains data on a single page for which you wish to download the HTML.

    Once you have the web page downloaded into a string, you may be wondering what you can do with the data. As you will see from the next recipe, you can extract information from that page.

Recipe 3.2: Extract Simple Information from a Web Page

    If you need to extract simple information from a web page, this recipe serves as a good foundation for more complex programs. This recipe downloads the contents of a web page and extracts a piece of information from that page. For many tasks, this recipe is all that is needed. This is particularly so, if you can get to the data directly from a URL and do not need to log in, or pass through any intermediary pages.

    This recipe will download the current time for the city of St. Louis, MO. To do this it will use the following URL:

http://www.httprecipes.com/1/3/time.php

    The above URL is one of the examples on the HTTP recipes web site. The page is shown in Figure 3.2, which shows exactly what the web page looks like to a user. The piece of data that we would like to extract from this page is the current date and time.

Figure 3.2: The Current Time

    To know how to extract this date and time, we need to see what this page looks like to the computer. To do this, we must examine the HTML source. While viewing the above URL in a web browser, select "View Source". This shows Listing 3.2.

Listing 3.2: HTML Source for the Current Time

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<HTML>
<HEAD>
	<TITLE>HTTP Recipes</TITLE>
	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
	<meta http-equiv="Cache-Control" content="no-cache">
</HEAD>

<BODY>

<table border="0"><tr><td>
<a href="http://www.httprecipes.com/">
<img src="/images/logo.gif" alt="Heaton Research Logo" border="0"></a>
</td><td valign="top">Heaton Research, Inc.<br>
HTTP Recipes Test Site
</td></tr>
</table>
<hr><p><small>[<a href="/">Home</a>:<a href="/1/">First Edition</a>
:<a href="/1/3/">Chaper 3</a>]</small></p>


<h3>St. Louis, MO</h3>
The local time in St. Louis, MO is <b>Jun 27 2006 05:58:38 PM</b>.

<br><br><a href="cities.php">[Return to list of cities]</a><br>

<hr>
<p>Copyright 2006 by <a href="http://www.heatonresearch.com/">
Heaton Research, Inc.</a></p>
</BODY>
</HTML>

    Look at the above listing and see if you can find the time and date for St. Louis. Did you find it? It is the line about two-thirds of the way down that starts with the text “The local time in St. Louis, MO is”. To extract this data we need to look at the two HTML tags that enclose it. For this web page, the time and date are enclosed in the <b> and </b> tags.

    The following example, shown in Listing 3.3, will download this data, and extract the date and time information.

Listing 3.3: Get the Time in St. Louis (GetTime.cs)

using System;
using System.Net;
using System.IO;

namespace Recipe3_2
{
    /// <summary>
    /// Recipe #3.2: Extract Information from a Web Site
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
    ///
    /// Access the httprecipes.com site and get the time in 
    /// St. Louis, MO.  Shows how to parse data from HTML.
    /// 
    /// This software is copyrighted. You may use it in programs
    /// of your own, without restriction, but you may not
    /// publish the source code without the author's permission.
    /// For more information on distributing this code, please
    /// visit:
    ///    http://www.heatonresearch.com/hr_legal.php
    ///
    /// </summary>
	class GetTime
	{
		/// <summary>
		/// This method is very useful for grabbing information from an
		/// HTML page.  
		/// </summary>
		/// <param name="str">The string to parse.</param>
		/// <param name="token1">The text, or tag, that comes before the desired text</param>
		/// <param name="token2">The text, or tag, that comes after the desired text</param>
		/// <param name="count">Which occurrence of token1 to use, 1 for the first</param>
		/// <returns>The text between token1 and token2, or null if either token was not found.</returns>
		public String Extract(String str, String token1, String token2, int count)
		{
			int location1, location2;

			location1 = location2 = 0;
			do
			{
				location1 = str.IndexOf(token1, location1+1);

				if (location1 == -1)
					return null;

				count--;
			} while (count > 0);

			location2 = str.IndexOf(token2, location1 + 1);
			if (location2 == -1)
				return null;

			location1+=token1.Length;
			return str.Substring(location1, location2-location1 );
		}

		/// <summary>
		/// This method downloads the specified URL into a C#
		/// String. This is a very simple method that you can
		/// reuse anytime you need to quickly grab all data from
		/// a specific URL.
		/// </summary>
		/// <param name="url">The URL to download.</param>
		/// <returns>The contents of the URL that was downloaded.</returns>
		public String DownloadPage(Uri url)
		{
			WebRequest http = HttpWebRequest.Create(url);
			HttpWebResponse response = (HttpWebResponse)http.GetResponse();
			StreamReader stream = new StreamReader(response.GetResponseStream(),System.Text.Encoding.ASCII   );

			String result = stream.ReadToEnd();
            
			response.Close();
			stream.Close();
			return result;
		}

		/// <summary>
		/// Run the example.
		/// </summary>
		public void Go()
		{
			Uri u = new Uri("http://www.httprecipes.com/1/3/time.php");
			String str = DownloadPage(u);

			Console.WriteLine(Extract(str, "<b>", "</b>", 1));
		}


		/// <summary>
		/// The main entry point for the application.
		/// </summary>
		[STAThread]
		static void Main(string[] args)
		{
			GetTime module = new GetTime();
			module.Go();
		}
	}
}

    The main portion of this program is contained in a method named Go. The following three lines do the main work performed by the Go method.

Uri u = new Uri("http://www.httprecipes.com/1/3/time.php");
String str = DownloadPage(u);

Console.WriteLine(Extract(str, "<b>", "</b>", 1));

    First, a Uri object is constructed with the URL that we are to download from. This Uri object is then passed to the DownloadPage function.

    Using the DownloadPage function from the last recipe, we can download the above HTML into a string. Now that the data is in a string, you may ask: what is the easiest way to extract the date and time? Any C# string parsing method can do this. However, this recipe provides one very useful function, Extract, to do this. The contents of the Extract function are shown here:

int location1, location2;

location1 = location2 = 0;
do
{
    location1 = str.IndexOf(token1, location1+1);

    if (location1 == -1)
        return null;

    count--;
} while (count > 0);

location2 = str.IndexOf(token2, location1 + 1);
if (location2 == -1)
    return null;

location1+=token1.Length;
return str.Substring(location1, location2-location1 );

    As you can see from above, the Extract function is passed a string to parse, including the beginning and ending tags. The Extract function will then scan the specified string, and find the beginning tag. In this case, the beginning tag is <b>. Once the beginning tag is found, the Extract function will return all text found until the ending tag is found.

    It is important to note that the beginning and ending text need not be HTML tags. You can use any beginning and ending text you wish with the Extract function.

    You might also notice that the Extract function accepts a number as its last parameter. In this case, the number passed was one. This number specifies which instance of the beginning text to locate. In this example there was only one <b> to find. What if there were several? Passing in a two for the last parameter would have located the text at the second instance of the <b> tag.
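    The effect of the last parameter can be seen in a small usage sketch. The Extract method below is copied from Listing 3.3 and made static; the class name ExtractCountDemo and the HTML fragment are made up for illustration and are not taken from the recipes site.

```csharp
using System;

// Usage sketch; Extract is copied from Listing 3.3 and made static.
public static class ExtractCountDemo
{
    public static string Extract(string str, string token1, string token2, int count)
    {
        int location1, location2;

        location1 = location2 = 0;
        do
        {
            location1 = str.IndexOf(token1, location1 + 1);

            if (location1 == -1)
                return null;

            count--;
        } while (count > 0);

        location2 = str.IndexOf(token2, location1 + 1);
        if (location2 == -1)
            return null;

        location1 += token1.Length;
        return str.Substring(location1, location2 - location1);
    }

    public static void Main()
    {
        // Made-up HTML fragment containing two <b> elements.
        string html = "<p>Low: <b>10</b>, High: <b>20</b></p>";

        Console.WriteLine(Extract(html, "<b>", "</b>", 1));          // text of the first <b>
        Console.WriteLine(Extract(html, "<b>", "</b>", 2));          // text of the second <b>
        Console.WriteLine(Extract(html, "<b>", "</b>", 3) == null);  // there is no third <b>
    }
}
```

    With this fragment, a count of 1 yields 10 and a count of 2 yields 20. Asking for a third instance returns null, because the fragment contains only two <b> tags.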

    The Extract function is not part of C#. It is a useful function that I developed to help with string parsing. The Extract function returns some text that is bounded by two token strings. Now, let’s take a look at how it works.

    The Extract function begins by declaring two int variables. Additionally the parameters token1 and token2 are passed in. The parameter token1 holds the text, which is usually an HTML tag at the beginning of the desired text. The parameter token2 holds the text, which is usually an HTML tag at the end of the desired text.

int location1, location2;

location1 = location2 = 0;

    These two variables will hold the location of the beginning and ending text. To begin, set both to zero. Next, the function will begin looking for instances of token1. This is done with a do/while loop.

do
{
    location1 = str.IndexOf(token1, location1+1);

    if (location1 == -1)
        return null;

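    As you can see, location1 is set to the location of token1. Each search begins at location1 + 1, so that a later pass through the loop moves past the instance found on the previous pass. Because location1 starts at zero, the first search actually begins at the second character of the string; a token that sits at the very start of the string is therefore skipped, which rarely matters for an HTML page. If no instance of token1 is found, null is returned to let the caller know that the string could not be extracted.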

    Each time an instance of token1 is found, the variable count is decreased by one. This is shown here:

    count--;
} while (count > 0);

    Once the final instance of token1 has been found, it is time to locate the ending token. This is done with the following lines of code:

location2 = str.IndexOf(token2, location1 + 1);
if (location2 == -1)
    return null;

location1+=token1.Length;
return str.Substring(location1, location2-location1 );

    The above code locates token2 using IndexOf. If the second token is not found, then null is returned to indicate an error. Otherwise Substring is called to return the text between the two tokens. It is important to remember to add the length of token1 to location1. If you do not add this to location1, you will extract token1 along with the desired text.
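    One detail of the walkthrough above is worth a sketch. Because the listing's first search starts at location1 + 1 with location1 initially zero, an instance of token1 at the very start of the string is never found. The variant below, with the hypothetical name ExtractFromStart, starts location1 at -1 and searches for token2 only after the end of token1. It is an illustrative alternative, not the author's method.

```csharp
using System;

// Hypothetical variant of the chapter's Extract; the names are illustrative.
public static class ExtractVariantSketch
{
    // Like Extract, but the first search starts at index 0, and the
    // search for token2 starts after the end of token1.
    public static string ExtractFromStart(string str, string token1, string token2, int count)
    {
        int location1 = -1;
        do
        {
            // On the first pass this searches from index 0.
            location1 = str.IndexOf(token1, location1 + 1);

            if (location1 == -1)
                return null;

            count--;
        } while (count > 0);

        int start = location1 + token1.Length;
        int location2 = str.IndexOf(token2, start);
        if (location2 == -1)
            return null;

        return str.Substring(start, location2 - start);
    }

    public static void Main()
    {
        // Made-up input whose first token sits at index 0.
        string text = "<b>first</b> and <b>second</b>";

        Console.WriteLine(ExtractFromStart(text, "<b>", "</b>", 1));
        Console.WriteLine(ExtractFromStart(text, "<b>", "</b>", 2));
    }
}
```

    For this input the variant treats the <b> at index 0 as the first instance, whereas the listing's Extract, called with a count of 1, would return the text of the second <b> element.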

    This recipe can be applied to any real-world site that contains data on a single page that you wish to extract. Although this recipe extracted information from the web page, it did not do anything with it. The next recipe will process the downloaded data.

Recipe 3.3: Parsing Dates and Times

    This recipe shows how to extract data from several pages. It also shows how to parse date and time information. This recipe will download the date and time for several US cities. It will extract this data from the following URL:

http://www.httprecipes.com/1/3/cities.php

    Figure 3.3 shows this web page.

Figure 3.3: Cities for which to Display Time

    As you can see from the above figure, there are three US cities for which you may choose to find the time. To find the time for each city, click on the link and view that city's page. This means a total of four pages to access: first the city list page, then a page for each of the three cities.

    The following recipe will access the city list page, obtain the URL for each city, and then obtain the time for that city. Now, let’s examine Listing 3.4 - the HTML that makes up the city list page.

Listing 3.4: The HTML for the Cities List

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<HTML>
<HEAD>
	<TITLE>HTTP Recipes</TITLE>
	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
	<meta http-equiv="Cache-Control" content="no-cache">
</HEAD>

<BODY>

<table border="0"><tr><td>
<a href="http://www.httprecipes.com/">
<img src="/images/logo.gif" alt="Heaton Research Logo" border="0"></a>
</td><td valign="top">Heaton Research, Inc.<br>
HTTP Recipes Test Site
</td></tr>
</table>
<hr><p><small>[<a href="/">Home</a>:<a href="/1/">First Edition</a>:
<a href="/1/3/">Chaper 3</a>]</small></p>

<p>Select a city from the list below, and you will be 
shown the local time for that city.<br>
<ul>
<li><a href="city.php?city=2">Baltimore, MD</a>
<li><a href="city.php?city=3">New York, NY</a>
<li><a href="city.php?city=1">St. Louis, MO</a></ul>

<hr>
<p>Copyright 2006 by <a href="http://www.heatonresearch.com/">
Heaton Research, Inc.</a></p>
</BODY>
</HTML>

    Can you find the cities in the above HTML? Find the <li> tags and you will find the cities. Each of these city lines links to the city.php page. For example, to display Baltimore's time, access the following URL:

http://www.httprecipes.com/1/3/city.php?city=2

    This recipe will access the city list page to obtain a list of cities. That list will then be used to build a second list that will contain the times for each of those cities. See Recipe 3.3 in Listing 3.5.

Listing 3.5: Get the Time for Select Cities (GetCityTime.cs)

using System;
using System.IO;
using System.Net;

namespace Recipe3_3
{
    /// <summary>
    /// Recipe #3.3: Parsing Dates and Times
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
    ///
    /// Access httprecipes.com and obtain the current time for
    /// several USA cities.
    /// 
    /// This software is copyrighted. You may use it in programs
    /// of your own, without restriction, but you may not
    /// publish the source code without the author's permission.
    /// For more information on distributing this code, please
    /// visit:
    ///    http://www.heatonresearch.com/hr_legal.php
    ///
    /// </summary>
	class GetCityTime
	{

		/// <summary>
		/// This method is very useful for grabbing information from an
		/// HTML page.  
		/// </summary>
		/// <param name="str">The string to parse.</param>
		/// <param name="token1">The text, or tag, that comes before the desired text</param>
		/// <param name="token2">The text, or tag, that comes after the desired text</param>
		/// <param name="count">Which occurrence of token1 to use, 1 for the first</param>
		/// <returns>The text between token1 and token2, or null if either token was not found.</returns>
		public String Extract(String str, String token1, String token2, int count)
		{
			int location1, location2;

			location1 = location2 = 0;
			do
			{
				location1 = str.IndexOf(token1, location1+1);

				if (location1 == -1)
					return null;

				count--;
			} while (count > 0);

			location2 = str.IndexOf(token2, location1 + 1);
			if (location2 == -1)
				return null;

			location1+=token1.Length;
			return str.Substring(location1, location2-location1 );
		}

		/// <summary>
		/// This method downloads the specified URL into a C#
		/// String. This is a very simple method that you can
		/// reuse anytime you need to quickly grab all data from
		/// a specific URL.
		/// </summary>
		/// <param name="url">The URL to download.</param>
		/// <returns>The contents of the URL that was downloaded.</returns>
		public String DownloadPage(Uri url)
		{
			WebRequest http = HttpWebRequest.Create(url);
			HttpWebResponse response = (HttpWebResponse)http.GetResponse();
			StreamReader stream = new StreamReader(response.GetResponseStream(),System.Text.Encoding.ASCII   );

			String result = stream.ReadToEnd();

			response.Close();
			stream.Close();
			return result;
		}

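		/// <summary>
		/// Download the page for the specified city and parse the
		/// time that it reports.
		/// </summary>
		/// <param name="city">The city number, as used by city.php.</param>
		/// <returns>The current time for the specified city.</returns>
		public DateTime getCityTime(int city)
		{
			Uri u = new Uri("http://www.httprecipes.com/1/3/city.php?city=" + city);
			String str = DownloadPage(u);

			// The page reports the time between <b> tags, in a form
			// such as "Jun 27 2006 05:58:38 PM".
			DateTime date = DateTime.Parse(Extract(str, "<b>", "</b>", 1));
			return date;
		}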


		/// <summary>
		/// Run the example.
		/// </summary>
		public void Go()
		{
			Uri u = new Uri("http://www.httprecipes.com/1/3/cities.php");
			String str = DownloadPage(u);
			int count = 1;
			bool done = false;

			while (!done)
			{
				String line = Extract(str, "<li>", "</a>", count);

				if (line != null)
				{
					String dl = Extract(line, "=", "\"", 2);
					int cityNum = int.Parse(dl);
					int i = line.IndexOf(">");
					String cityName = line.Substring(i + 1);
					DateTime cityTime = getCityTime(cityNum);
					String time = cityTime.ToShortTimeString();
					Console.WriteLine(count + " " + cityName + "\t" + time);
				} 
				else
					done = true;
				count++;
			}
		}




		/// <summary>
		/// The main entry point for the application.
		/// </summary>
		[STAThread]
		static void Main(string[] args)
		{
			GetCityTime module = new GetCityTime();
			module.Go();
		}
	}
}

    This recipe uses the same Extract and DownloadPage as the previous examples. However, the main Go method is different. We will begin by examining the Go method to see how the list of cities is downloaded.

    First, a Uri object is constructed for the city list URL, and the entire contents are downloaded.

Uri u = new Uri("http://www.httprecipes.com/1/3/cities.php");
String str = DownloadPage(u);

    After the entire contents of the city list page have been downloaded, we must parse through the HTML and find each of the cities. To begin, a count variable is created, which holds the current city number. Secondly, a done variable is created and initialized to false. This is demonstrated in the following lines of code:

int count = 1;
bool done = false;

while (!done)
{
    String line = Extract(str, "<li>", "</a>", count);

    To extract each city, the beginning and ending tokens to search between must be identified. If you examine Listing 3.4, you will see that each city is on a line between the tokens <li> and </a>.

<li><a href="city.php?city=2">Baltimore, MD</a>

    Calling the Extract function with these two tokens will return the Baltimore entry as follows:

<a href="city.php?city=2">Baltimore, MD

    The above value will be copied into the line variable that is then parsed.

    if (line != null)
    {
        String dl = Extract(line, "=", "\"", 2);
        int cityNum = int.Parse(dl);
        int i = line.IndexOf(">");
        String cityName = line.Substring(i + 1);
        DateTime cityTime = getCityTime(cityNum);

    Next, we will parse out the city number by extracting what is between the second = and the quote character that follows it; the first = belongs to the href attribute, which is why a count of 2 is passed. Given the line extracted (shown above), the Extract function should return a “2” for Baltimore. Finally, we parse the city and state by searching for a > symbol. Extracting everything to the right of the > symbol will give us “Baltimore, MD.” We now have the city's number, as well as its name and state.
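    The parsing of a single city line can be tried in isolation, without accessing the web site. In the sketch below, Extract is copied from Listing 3.5 and made static, while the class name CityLineSketch and the helper names CityNumber and CityName are hypothetical, introduced only for this illustration. The input is the Baltimore line shown above.

```csharp
using System;

// Sketch of parsing one city line; helper names are illustrative.
public static class CityLineSketch
{
    // Copied from Listing 3.5 and made static.
    public static string Extract(string str, string token1, string token2, int count)
    {
        int location1, location2;

        location1 = location2 = 0;
        do
        {
            location1 = str.IndexOf(token1, location1 + 1);

            if (location1 == -1)
                return null;

            count--;
        } while (count > 0);

        location2 = str.IndexOf(token2, location1 + 1);
        if (location2 == -1)
            return null;

        location1 += token1.Length;
        return str.Substring(location1, location2 - location1);
    }

    // The city number sits between the second '=' and the next quote.
    public static int CityNumber(string line)
    {
        return int.Parse(Extract(line, "=", "\"", 2));
    }

    // The city name is everything to the right of the '>' symbol.
    public static string CityName(string line)
    {
        return line.Substring(line.IndexOf(">") + 1);
    }

    public static void Main()
    {
        string line = "<a href=\"city.php?city=2\">Baltimore, MD";

        Console.WriteLine(CityNumber(line)); // the city number
        Console.WriteLine(CityName(line));   // the city name and state
    }
}
```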

    We now can pass the city's number into the getCityTime function. The getCityTime function performs the same operation as the last recipe; that is, it will access the Uri for the city for which we are seeking the time and extract the text between the <b> and </b> tags. That text is then converted with DateTime.Parse, and the time is returned as a DateTime. For more information about how the extraction works, review Recipe 3.2.
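    DateTime.Parse interprets its input according to the current culture of the machine the bot runs on. Where that is a concern, one option is to state the expected format explicitly. The following is a minimal sketch, assuming the site always reports the time in the form seen in Listing 3.2; the class name CityTimeParseSketch, the method name ParseCityTime, and the format string are assumptions of this example rather than part of the recipe.

```csharp
using System;
using System.Globalization;

// Sketch of parsing the site's time text with an explicit format.
public static class CityTimeParseSketch
{
    // Assumed format, matching "Jun 27 2006 05:58:38 PM" from Listing 3.2.
    private const string Format = "MMM dd yyyy hh:mm:ss tt";

    public static DateTime ParseCityTime(string text)
    {
        // InvariantCulture supplies English month abbreviations and AM/PM.
        return DateTime.ParseExact(text.Trim(), Format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        DateTime t = ParseCityTime("Jun 27 2006 05:58:38 PM");

        // Print in a culture-independent form.
        Console.WriteLine(t.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
    }
}
```

    Text that does not match the format causes ParseExact to throw a FormatException, so a bot built on this sketch would notice a change in the page layout rather than silently misreading the time.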

    Now that we have the city time, we will format and display it, as shown below:

        String time = cityTime.ToShortTimeString();
        Console.WriteLine(count + " " + cityName + "\t" + time);
    }
    else
        done = true;
    count++;
}

    Notice that the time is formatted with ToShortTimeString, which excludes the date. This allows us to display each of the cities, and the current time, without displaying the date.

    This recipe can be revised and applied to any real-world site containing a list that leads to multiple other pages from which you wish to extract data.

Recipe 3.4: Downloading a Binary File

    The last two recipes for this chapter demonstrate how to download data from a web site directly to a disk file. The first recipe will download to a binary file; the second will show how to download to a text file. A binary file download makes an exact copy of what was at the URL. The binary download is best used with a non-text resource, such as an image, sound or application file. Text files must be treated differently and will be discussed in detail in recipe 3.5.

    To demonstrate downloading to a binary file, this recipe will download an image from the HTTP recipes site. This image can be seen on the web page at the following URL:

http://www.httprecipes.com/1/3/sea.php

    The contents of this page are shown in Figure 3.4.

Figure 3.4: An Image to Download

    If you examine the HTML source for this page you will find that the actual image is located at the following URL:

http://www.httprecipes.com/1/3/sea.jpg

    Now let’s examine how to download an image by downloading a binary file. The example recipe, Recipe 3.4, is shown below in Listing 3.6.

Listing 3.6: Download a Binary File (DownloadBinary.cs)

using System;
using System.IO;
using System.Net;

namespace Recipe3_4
{
	/// <summary>
	/// Recipe #3.4: Downloading a Binary File
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
	///
	/// Download a binary file, such as an image, from a URL.
	///
	/// This software is copyrighted. You may use it in programs
	/// of your own, without restriction, but you may not
	/// publish the source code without the author's permission.
	/// For more information on distributing this code, please
	/// visit:
	///    http://www.heatonresearch.com/hr_legal.php
	/// </summary>
	class DownloadBinary
	{
		/// <summary>
		/// Used to convert strings to byte arrays.
		/// </summary>
		private System.Text.UTF8Encoding  encoding=new System.Text.UTF8Encoding();

		/// <summary>
		/// This method downloads the specified URL to a local
		/// file. The data is copied in blocks, exactly as it is
		/// received, so the file is an exact copy of what was at
		/// the URL.
		/// </summary>
		/// <param name="url">The URL to download.</param>
		/// <param name="filename">The local file to save to.</param>
		public void DownloadBinaryFile(Uri url,String filename)
		{
			byte []buffer = new byte[4096];
			FileStream os = new FileStream(filename,FileMode.Create);
			WebRequest http = HttpWebRequest.Create(url);
			HttpWebResponse response = (HttpWebResponse)http.GetResponse();
			Stream stream = response.GetResponseStream();
			
			int count = 0;
			do
			{
				count = stream.Read(buffer,0,buffer.Length);
				if(count>0)
					os.Write(buffer,0,count);
			} while(count>0);
			
			response.Close();
			stream.Close();
			os.Close();
		}



        /// <summary>
        /// The main entry point for the program.
        /// </summary>
        /// <param name="args">Program arguments.</param>
		static void Main(string[] args)
		{
			if (args.Length != 2)
			{
				DownloadBinary d = new DownloadBinary();
				d.DownloadBinaryFile(new Uri("http://www.httprecipes.com/1/3/sea.jpg"), "./sea2.jpg");
			} 
			else
			{
				DownloadBinary d = new DownloadBinary();
				d.DownloadBinaryFile(new Uri(args[0]), args[1]);
			}
		}
	}
}

    This recipe is very similar to Recipe 3.1. However, in this recipe, you must specify a URL and a file to save that URL to. For example, to download the Heaton Research logo, use the following command:

DownloadBinary http://www.httprecipes.com/images/logo.gif ./logo.gif

    The above arguments would download the Heaton Research logo to a file named logo.gif. For more information on how to execute the recipes in this book refer to Appendix B, “Compiling and Executing Examples.”

    As mentioned, this recipe is very similar to Recipe 3.1. However, instead of the DownloadPage function used in Recipe 3.1, it provides a method named DownloadBinaryFile. This method is shown here.

public void DownloadBinaryFile(Uri url,String filename)

    As you can see, this method accepts a URL and a filename. The contents found at the URL will be saved to the local file specified by filename. Unlike DownloadPage, the contents are never held in a string; they are copied directly from the web connection to the file.

    To save a binary file, the data must be read in blocks, and then written to a file. First a buffer of 4,096 bytes is created to hold the blocks of data.

byte []buffer = new byte[4096];

    Next a file is opened to store the binary file to.

FileStream os = new FileStream(filename,FileMode.Create);

    A web connection is opened to download the URL from.

WebRequest http = HttpWebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)http.GetResponse();
Stream stream = response.GetResponseStream();

    A do/while loop is used to read each of the blocks and then write them to the file. Once a zero-sized block is read, there is no more data to download.

int count = 0;
do
{
    count = stream.Read(buffer,0,buffer.Length);
    if(count>0)
        os.Write(buffer,0,count);
} while(count>0);

response.Close();
stream.Close();
os.Close();

    Finally, the stream and file are closed. This recipe could be applied to any real-world site where you need to download images, or other binary files to disk.
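    The block-copy loop does not depend on where the data comes from, so it can be sketched against any pair of streams. The example below is illustrative only: the class name BlockCopySketch and the method name CopyStream are hypothetical, and in-memory streams stand in for the web connection and the disk file.

```csharp
using System;
using System.IO;

// Sketch of the block-copy loop, using in-memory streams for illustration.
public static class BlockCopySketch
{
    // Copies everything from input to output in 4,096-byte blocks and
    // returns the number of bytes copied.
    public static long CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[4096];
        long total = 0;
        int count;

        // Read returns zero once there is no more data.
        while ((count = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, count);
            total += count;
        }
        return total;
    }

    public static void Main()
    {
        // 10,000 bytes of sample data, so the copy needs several blocks.
        byte[] data = new byte[10000];
        new Random(1).NextBytes(data);

        using (MemoryStream source = new MemoryStream(data))
        using (MemoryStream target = new MemoryStream())
        {
            long copied = CopyStream(source, target);
            Console.WriteLine(copied);
            Console.WriteLine(target.ToArray().Length == data.Length);
        }
    }
}
```

    Only the bytes actually read in each pass are written, which is why the count returned by Read, and not the buffer length, is passed to Write.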

    In this recipe, you learned how to download a binary file. Binary files are exact copies of what is downloaded from the URL. In the next recipe you will see how to download a text file.

Recipe 3.5: Downloading a Text File

    This recipe will download a web page to a text file. But why is a text file treated differently from a binary file? They are treated differently because different operating systems end lines differently. Table 3.2 summarizes how the different operating systems store text file line breaks.

Table 3.2: How Operating Systems End Lines

Operating System    ASCII Codes    C#
UNIX                #10            "\n"
Windows             #13 #10        "\r\n"
Mac OSX             #10            "\n"
Mac Classic         #13            "\r"

    To download a text file properly, the program must make sure that the line breaks are compatible with the operating system. C# currently runs primarily on the Windows platform. However, projects such as Mono will continue to expand C#’s reach. Because of this, it is important to handle text files properly for the operating system your bot is running on. Listing 3.7 shows how this is done.

Listing 3.7: Download a Text File (DownloadText.cs)

using System;
using System.IO;
using System.Net;

namespace Recipe3_5
{
	/// <summary>
	/// Recipe #3.5: Downloading a Text File
    /// Copyright 2007 by Jeff Heaton(jeff@jeffheaton.com)
    ///
    /// HTTP Programming Recipes for C# Bots
    /// ISBN: 0-9773206-7-7
    /// http://www.heatonresearch.com/articles/series/20/
	///
	/// Download a text file, such as a HTML page, from a URL.
	///
	/// This software is copyrighted. You may use it in programs
	/// of your own, without restriction, but you may not
	/// publish the source code without the author's permission.
	/// For more information on distributing this code, please
	/// visit:
	///    http://www.heatonresearch.com/hr_legal.php
	/// </summary>
	class DownloadText
	{
		/// <summary>
		/// Download the specified text page.
		/// </summary>
		/// <param name="page">The URL to download from.</param>
		/// <param name="filename">The local file to save to.</param>
		public void DownloadTextFile(String page, String filename)
		{
			Uri u = new Uri(page);
			FileStream os = new FileStream(filename,FileMode.Create);
			HttpWebRequest http = (HttpWebRequest)HttpWebRequest.Create(u);
			HttpWebResponse response = (HttpWebResponse)http.GetResponse();
			StreamReader reader = new StreamReader(response.GetResponseStream(),System.Text.Encoding.ASCII   );
			StreamWriter writer = new StreamWriter(os,System.Text.Encoding.ASCII   );
			String line;
			do
			{
				line = reader.ReadLine();
				if( line!=null )
					writer.WriteLine(line);

			} while(line!=null);

			reader.Close();
			writer.Close();
			os.Close();
		}


		/// <summary>
		/// The main entry point for the application.
		/// </summary>
		[STAThread]
		static void Main(string[] args)
		{
			if (args.Length != 2)
			{
				DownloadText d = new DownloadText();
				d.DownloadTextFile("http://www.httprecipes.com/1/3/text.php", "./text.html");
			} 
			else
			{
				DownloadText d = new DownloadText();
				d.DownloadTextFile(args[0], args[1]);
			}
		}
	}
}

    It is easy to use this recipe. For example, to download the main page of the HTTP recipes site, use the following command:

DownloadText http://www.httprecipes.com/ ./contents.txt

    The above arguments would download the contents of the main page of the HTTP recipes site to the file named contents.txt. For more information on how to execute the recipes in this book refer to Appendix B, “Compiling and Executing Examples.”

    This recipe works differently from Recipe 3.1, in that the text file is not first loaded into a string. Rather, the text is read from the response stream one line at a time and written to the output file as it is read. One method, DownloadTextFile, accepts the URL of the page to download and the name of the local file to save it to. This method is shown here:

public void DownloadTextFile(String page, String filename)

    The first thing that the DownloadTextFile method must do is create a Uri object to hold the page that is to be downloaded.

Uri u = new Uri(page);

    Next a file is opened to hold the text file.

FileStream os = new FileStream(filename,FileMode.Create);

    A stream is acquired for the desired page.

WebRequest http = HttpWebRequest.Create(u);
HttpWebResponse response = (HttpWebResponse)http.GetResponse();

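    The StreamReader class allows a stream to be read as text. Its ReadLine method recognizes any of the line endings in Table 3.2 (a line feed, a carriage return, or a carriage return followed by a line feed) and returns each line without its line-ending characters.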

StreamReader reader = new StreamReader(response.GetResponseStream(),System.Text.Encoding.ASCII );

    The StreamWriter class allows a stream to be written as text. Its WriteLine method ends each line with the line-ending characters of the operating system the program is running on.

StreamWriter writer = new StreamWriter(os,System.Text.Encoding.ASCII );

    To download the file, the URL is read line-by-line. As each line is read, it is written to the StreamWriter.

String line;
do
{
    line = reader.ReadLine();
    if( line!=null )
        writer.WriteLine(line);
} while(line!=null);

    Finally all of the streams are closed.

reader.Close();
writer.Close();
os.Close();

    The algorithm is useful because it allows you to convert incoming text to exactly how the operating system would like it to be represented.
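    The same read-a-line, write-a-line technique can be tried on an in-memory string. The sketch below uses StringReader and StringWriter, which follow the same ReadLine and WriteLine conventions as StreamReader and StreamWriter; the class name LineEndingSketch and the method name NormalizeLineEndings are hypothetical names chosen for this illustration.

```csharp
using System;
using System.IO;

// Sketch of line-ending conversion using in-memory readers and writers.
public static class LineEndingSketch
{
    // Rewrites text so that every line ends with the line ending of
    // the operating system the program is running on.
    public static string NormalizeLineEndings(string text)
    {
        using (StringReader reader = new StringReader(text))
        using (StringWriter writer = new StringWriter())
        {
            string line;

            // ReadLine accepts "\n", "\r" and "\r\n" as line endings.
            while ((line = reader.ReadLine()) != null)
            {
                // WriteLine appends Environment.NewLine.
                writer.WriteLine(line);
            }
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // Made-up input that mixes all three line endings.
        string mixed = "unix\nwindows\r\nclassic mac\rlast";
        Console.Write(NormalizeLineEndings(mixed));
    }
}
```

    Note that a line ending is written after the final line as well, even when the input did not end with one.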


Copyright 2005 - 2010 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.