The Encog Project

org.encog.bot.html
Class ParseHTML

java.lang.Object
  extended by org.encog.bot.html.ParseHTML
Direct Known Subclasses:
SpiderParseHTML

public class ParseHTML
extends java.lang.Object

ParseHTML: Parses HTML read from an InputStream, producing HTMLTag objects for tags and raw characters for text.


Field Summary
static char BULL
          Special char.
static char CR
          Carriage return.
static char LF
          Linefeed.
static int MAX_NAME_LENGTH
          The maximum length of a name that will be parsed.
static char TRADE
          Special char.
 
Constructor Summary
ParseHTML(java.io.InputStream is)
          The constructor should be passed an InputStream that we will parse from.
 
Method Summary
protected  void eatWhitespace()
          Remove any whitespace characters that are next in the InputStream.
 HTMLTag getTag()
          Return the last tag found. This is normally called just after the read function returns zero.
protected  java.lang.String parseAttributeName()
          Parse an attribute name, if one is present.
protected  java.lang.String parseString()
          Called to parse a double- or single-quoted string.
protected  void parseTag()
          Called when a tag is detected.
 int read()
          Read a single character from the HTML source. If this function returns zero (0), call getTag to see what tag was found.
 java.lang.String toString()
          Convert the HTML document back to a string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

BULL

public static final char BULL
Special char.

See Also:
Constant Field Values

TRADE

public static final char TRADE
Special char.

See Also:
Constant Field Values

CR

public static final char CR
Carriage return.

See Also:
Constant Field Values

LF

public static final char LF
Linefeed.

See Also:
Constant Field Values

MAX_NAME_LENGTH

public static final int MAX_NAME_LENGTH
The maximum length of a name that will be parsed.

See Also:
Constant Field Values

Constructor Detail

ParseHTML

public ParseHTML(java.io.InputStream is)
The constructor should be passed an InputStream that we will parse from.

Parameters:
is - An InputStream to parse from.
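
The constructor accepts any InputStream, so HTML held in memory can be parsed as easily as HTML read from a file or socket. The following is a minimal sketch of preparing such a stream. The helper class HtmlStreams and the choice of UTF-8 are assumptions made for illustration and are not part of Encog; the ParseHTML call appears only in a comment so that the sketch compiles without Encog on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Encog.
class HtmlStreams {
    /** Wraps an in-memory HTML string in an InputStream (UTF-8 is an assumption). */
    static InputStream fromString(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        try (InputStream is = fromString("<p>Hello</p>")) {
            // With Encog available, the stream would be handed to the parser here:
            // ParseHTML parse = new ParseHTML(is);
            System.out.println("bytes available: " + is.available());
        }
    }
}
```

Closing the stream remains the caller's responsibility in this sketch, which is why it is opened in a try-with-resources block.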

Method Detail

eatWhitespace

protected void eatWhitespace()
                      throws java.io.IOException
Remove any whitespace characters that are next in the InputStream.

Throws:
java.io.IOException - If an I/O exception occurs.
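
Skipping whitespace on a stream requires looking one character ahead and then putting the first non-whitespace character back. The sketch below shows one way this could be done with java.io.PushbackInputStream. It is an illustration of the technique only: the class WhitespaceSkipper and both of its methods are hypothetical, and the source does not say how ParseHTML implements this step internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Encog code.
class WhitespaceSkipper {
    /** Consumes leading whitespace and leaves the stream at the first non-whitespace byte. */
    static void eatWhitespace(PushbackInputStream in) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            if (!Character.isWhitespace(ch)) {
                in.unread(ch); // put the non-whitespace byte back for the next reader
                return;
            }
        }
    }

    /** Demo helper: returns what is left of an ASCII string after skipping whitespace. */
    static String remainderAfterWhitespace(String text) {
        try (PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))) {
            eatWhitespace(in);
            StringBuilder rest = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                rest.append((char) ch); // byte-to-char cast is only safe for ASCII input
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```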

getTag

public HTMLTag getTag()
Return the last tag found. This is normally called just after the read function returns zero.

Returns:
The last HTML tag found.

parseAttributeName

protected java.lang.String parseAttributeName()
                                       throws java.io.IOException
Parse an attribute name, if one is present.

Returns:
The attribute name.
Throws:
java.io.IOException - If an I/O exception occurs.

parseString

protected java.lang.String parseString()
                                throws java.io.IOException
Called to parse a double- or single-quoted string.

Returns:
The string parsed.
Throws:
java.io.IOException - If an I/O exception occurs.
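
As a rough illustration of what parsing a quoted string involves, the sketch below reads an opening quote character and collects everything up to the matching closing quote. The class QuotedStringSketch and its methods are hypothetical. The behavior shown (rejecting input that does not start with a quote, stopping silently at end of input) is an assumption of this sketch and may differ from what ParseHTML actually does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration, not Encog code.
class QuotedStringSketch {
    /**
     * Reads a quoted string. The first character must be ' or "; the value
     * runs until the same quote character appears again or input ends.
     */
    static String parseQuoted(Reader in) throws IOException {
        int quote = in.read();
        if (quote != '"' && quote != '\'') {
            throw new IOException("Expected a quote character");
        }
        StringBuilder value = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1 && ch != quote) {
            value.append((char) ch);
        }
        return value.toString();
    }

    /** Convenience wrapper for in-memory input. */
    static String parseQuoted(String text) {
        try (Reader in = new StringReader(text)) {
            return parseQuoted(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Tracking which quote character opened the string lets the other kind appear inside the value unchanged, as in 'say "hi"'.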

parseTag

protected void parseTag()
                 throws java.io.IOException
Called when a tag is detected. This method will parse the tag.

Throws:
java.io.IOException - If an I/O exception occurs.

read

public int read()
         throws java.io.IOException
Read a single character from the HTML source. If this function returns zero (0), call getTag to see what tag was found. Otherwise, the value returned is simply the next character found.

Returns:
The character read, or zero if an HTML tag was found. If zero is returned, call getTag to retrieve that tag.
Throws:
java.io.IOException - If an error occurs while reading.
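
The read/getTag contract leads to a simple consumption loop: read one value at a time, treat zero as "a tag was just parsed", and treat anything else as a text character. The sketch below demonstrates that loop against MiniTagReader, a hypothetical stand-in written only so the example runs without Encog; it is not Encog code, and it returns the tag as a String where ParseHTML returns an HTMLTag. The use of -1 to signal end of input is also an assumption, since the documentation above does not state what read returns at the end of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics the read()/getTag() contract; not Encog code. */
class MiniTagReader {
    private final InputStream in;
    private String lastTag; // ParseHTML returns an HTMLTag; a String keeps the sketch small

    MiniTagReader(InputStream in) {
        this.in = in;
    }

    /** Returns 0 when a tag was consumed, -1 at end of input (assumed), else the next character. */
    int read() throws IOException {
        int ch = in.read();
        if (ch != '<') {
            return ch;
        }
        StringBuilder sb = new StringBuilder();
        while ((ch = in.read()) != -1 && ch != '>') {
            sb.append((char) ch);
        }
        lastTag = sb.toString();
        return 0;
    }

    String getTag() {
        return lastTag;
    }
}

class ReadLoopDemo {
    /** Splits HTML into events: "TAG:..." for tags and "TEXT:..." for runs of text. */
    static List<String> events(String html) {
        List<String> out = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        MiniTagReader parse = new MiniTagReader(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        try {
            int ch;
            while ((ch = parse.read()) != -1) {
                if (ch == 0) {
                    // Zero means a tag was found: flush pending text, then fetch the tag.
                    if (text.length() > 0) {
                        out.add("TEXT:" + text);
                        text.setLength(0);
                    }
                    out.add("TAG:" + parse.getTag());
                } else {
                    text.append((char) ch);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (text.length() > 0) {
            out.add("TEXT:" + text);
        }
        return out;
    }
}
```

With Encog on the classpath, the same loop shape should apply with ParseHTML in place of the stand-in, calling getTag immediately after each zero so the tag is not overwritten by a later read.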

toString

public java.lang.String toString()
Convert the HTML document back to a string.

Overrides:
toString in class java.lang.Object
Returns:
The string form of the object.
