org.encog.bot.spider
Class SpiderParseHTML
java.lang.Object
org.encog.bot.html.ParseHTML
org.encog.bot.spider.SpiderParseHTML
public class SpiderParseHTML
extends ParseHTML
SpiderParseHTML: This class layers on top of the ParseHTML class and allows
the spider to extract the link information it needs. A SpiderParseHTML object
can be used just like a ParseHTML object, with the spider gathering its
information in the background.
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
SpiderParseHTML
public SpiderParseHTML(java.net.URL base,
SpiderInputStream is,
Spider spider)
- Construct a SpiderParseHTML object. This object allows you to parse HTML,
while the spider collects link information in the background.
- Parameters:
base - The URL being parsed; it is used to resolve relative links.
is - The InputStream being parsed.
spider - The Spider that is parsing.
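The role of the base parameter can be illustrated with a minimal sketch of relative-link resolution using only java.net classes. The RelativeLinks class and its resolve methods are hypothetical names introduced here for illustration; they are not part of Encog, and the spider's actual resolution logic may differ.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URI;

// Hypothetical helper (not part of Encog): resolves an href found on a page
// against the URL of the page it was found on.
class RelativeLinks {

    // Resolve href against base. An absolute href is returned as-is.
    static URL resolve(URL base, String href) {
        try {
            return base.toURI().resolve(href).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve: " + href, e);
        }
    }

    // Convenience overload that works on strings only.
    static String resolve(String base, String href) {
        try {
            return resolve(URI.create(base).toURL(), href).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad base URL: " + base, e);
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        System.out.println(resolve(base, "page2.html"));          // http://example.com/docs/page2.html
        System.out.println(resolve(base, "../images/logo.png"));  // http://example.com/images/logo.png
        System.out.println(resolve(base, "/about"));              // http://example.com/about
    }
}
```

Without the base URL, an href such as page2.html could not be turned into an address the spider can visit, which is why the constructor needs it.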
getStream
public SpiderInputStream getStream()
- Get the InputStream being parsed.
- Returns:
- The InputStream being parsed.
read
public int read()
throws java.io.IOException
- Read a single character. This function will process any tags that the
spider needs for navigation, then pass the character on to the caller.
This allows the spider to transparently gather its links.
- Overrides:
read in class ParseHTML
- Returns:
- The character read.
- Throws:
java.io.IOException - I/O error.
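The idea of passing every character to the caller while gathering links on the side can be sketched as follows. This is a simplified stand-in, not Encog's implementation: the LinkTapReader class is a hypothetical name, it reads from a String rather than a SpiderInputStream (so it throws no IOException), and it only recognizes quoted href attributes on anchor tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Encog code): read() hands each character to the
// caller and, as a side effect, records the href of every <a> tag it passes.
class LinkTapReader {
    // Matches the inside of an anchor tag with a quoted href attribute.
    private static final Pattern HREF = Pattern.compile(
            "(?i)^a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']");

    private final String html;
    private int pos = 0;
    private StringBuilder tag = null;            // non-null while inside <...>
    private final List<String> links = new ArrayList<>();

    LinkTapReader(String html) {
        this.html = html;
    }

    // Returns the next character, or -1 at end of input.
    int read() {
        if (pos >= html.length()) {
            return -1;
        }
        char c = html.charAt(pos++);
        if (c == '<') {
            tag = new StringBuilder();           // a tag starts: begin buffering
        } else if (c == '>' && tag != null) {
            Matcher m = HREF.matcher(tag);       // the tag ended: inspect it
            if (m.find()) {
                links.add(m.group(1));
            }
            tag = null;
        } else if (tag != null) {
            tag.append(c);
        }
        return c;                                // the caller still sees every character
    }

    List<String> getLinks() {
        return links;
    }

    public static void main(String[] args) {
        LinkTapReader r = new LinkTapReader(
                "<p>See <a href=\"page2.html\">next</a>.</p>");
        int c;
        while ((c = r.read()) != -1) {
            System.out.print((char) c);          // the caller's own processing
        }
        System.out.println();
        System.out.println(r.getLinks());
    }
}
```

The caller's loop looks the same as it would with a plain parser; the link list fills up only because read() was called.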
readAll
public void readAll()
throws java.io.IOException
- Read all characters on the page. The characters themselves are discarded,
but reading them lets the spider examine the tags and find links.
- Throws:
java.io.IOException - I/O error.
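The drain-and-discard behavior described for readAll can be sketched with a java.io.FilterReader. The TagCountingReader class below is hypothetical and is not Encog code; to stay short it only counts '<' characters, standing in for the tag inspection the spider performs.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch (not Encog code): a reader whose read() observes the
// stream, plus a readAll() that drains it purely for that side effect.
class TagCountingReader extends FilterReader {
    private int tagCount = 0;

    TagCountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c == '<') {
            tagCount++;                  // stand-in for "examine the tag"
        }
        return c;
    }

    // Read to end of input, discarding every character.
    void readAll() throws IOException {
        while (read() != -1) {
            // nothing to do: the work happens inside read()
        }
    }

    int getTagCount() {
        return tagCount;
    }

    // Convenience wrapper for callers that only need the count.
    static int countTags(String html) {
        try (TagCountingReader r = new TagCountingReader(new StringReader(html))) {
            r.readAll();
            return r.getTagCount();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countTags("<p>Hi <a href='x'>x</a></p>")); // 4
    }
}
```

A drain method of this kind is useful when the caller has no interest in the page text and only wants the spider to see the page.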