When you call the getTag method of the ParseHTML class, it returns an HTMLTag object. This object completely encapsulates the HTML tag that was just parsed. The HTMLTag class is shown in Listing 6.3.
The ParseHTML class performs HTML parsing. All of the recipes in this chapter use this class, as do many recipes through the remainder of the book. I will begin by showing you how to use the ParseHTML class. A later section will show you how it was implemented.
Using ParseHTML
The ParseHTML class is very easy to use. The following code fragment demonstrates how to use it.
To properly parse any data, let alone HTML, it is very convenient to have a peekable stream. A peekable stream is a regular Java InputStream, except that you can peek several characters ahead before actually reading them. First, we will examine why it is so convenient to use PeekableInputStream.
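To make the idea of peeking concrete, here is a minimal sketch of a peekable stream. It is not the book's PeekableInputStream; the class name PeekSketch and the methods peek and peekMatches are hypothetical names chosen for illustration, and the sketch assumes single-byte (ASCII) characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a peekable stream; not the book's PeekableInputStream. */
class PeekSketch extends InputStream {
    private final InputStream source;
    // Bytes already pulled from the source but not yet consumed by read().
    private final List<Integer> lookahead = new ArrayList<>();

    PeekSketch(InputStream source) {
        this.source = source;
    }

    /**
     * Returns the byte that is depth positions ahead (0 is the next byte)
     * without consuming it, or -1 if the stream ends before that position.
     */
    int peek(int depth) throws IOException {
        while (lookahead.size() <= depth) {
            int b = source.read();
            if (b == -1) {
                return -1;
            }
            lookahead.add(b);
        }
        return lookahead.get(depth);
    }

    /** Returns true if the upcoming bytes match text; consumes nothing. Assumes ASCII. */
    boolean peekMatches(String text) throws IOException {
        for (int i = 0; i < text.length(); i++) {
            if (peek(i) != text.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** Reads one byte, draining the lookahead buffer before touching the source. */
    @Override
    public int read() throws IOException {
        if (!lookahead.isEmpty()) {
            return lookahead.remove(0);
        }
        return source.read();
    }
}
```

With a stream like this, a parser can call peekMatches("<!--") to decide whether a comment is starting before it commits to reading any of those characters.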
Consider parsing the following line of HTML.
This chapter includes two recipes, which demonstrate the following:
Determining if a URL uses HTTPS
Using HTTP authentication
The first recipe will introduce you to some of the things that can be done with the HttpsURLConnection class. The second recipe shows how to access a site that uses HTTP authentication.
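As a preview of the ideas behind both recipes, here is a minimal sketch; it is not the recipes' actual code. The class name RecipeSketch and the methods usesHttps and basicAuthHeader are hypothetical. The sketch assumes that an HTTPS URL can be recognized by checking whether the opened connection object is an HttpsURLConnection, and that the site uses HTTP Basic authentication, where the credentials are sent Base64-encoded in an Authorization header.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.net.ssl.HttpsURLConnection;

/** Hypothetical sketch; names are illustrative and not taken from the recipes. */
class RecipeSketch {

    /**
     * Returns true if the URL is handled as HTTPS. openConnection() only
     * creates the connection object; no network traffic occurs until
     * connect() is called or a stream is requested.
     */
    static boolean usesHttps(String address) throws IOException {
        URL url = URI.create(address).toURL();
        URLConnection connection = url.openConnection();
        return connection instanceof HttpsURLConnection;
    }

    /**
     * Builds the value of an Authorization header for HTTP Basic
     * authentication: the word "Basic" followed by Base64("user:password").
     */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }
}
```

The header value could then be attached to a request with connection.setRequestProperty("Authorization", RecipeSketch.basicAuthHeader(user, password)). Because Basic authentication only encodes the credentials and does not encrypt them, it is normally paired with HTTPS.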