Summary
This chapter showed you how to extract data from HTML. Most of the data that a bot needs to access is formatted as HTML. Previous chapters showed how to extract data from simple HTML constructs. This chapter expanded on that considerably.
The chapter began by showing you how to create an HTML parser. This HTML parser is fairly short, but it can handle any HTML file, even one that is not properly formatted. The HTML parser built into C# might have problems with improperly formatted HTML. Unfortunately, there is a great deal of improperly formatted HTML on the web.
HTML pages come in a variety of formats. This chapter included seven recipes to show you how to extract data from many of these formats. You were shown how to extract hyperlinks, images, and forms, and how to extract data that spans multiple pages.
So far, the recipes in this book have mainly downloaded data from a web server. There has not been much interaction with the web server. In the next chapter you will see how a bot can send form data to a web server. This allows the bot to interact with the web server just as a human would when using a form.




