org.encog.bot.spider
Class SpiderParseHTML
java.lang.Object
org.encog.parse.tags.read.ReadTags
org.encog.parse.tags.read.ReadHTML
org.encog.bot.spider.SpiderParseHTML
public class SpiderParseHTML
- extends ReadHTML
This class layers on top of the ReadHTML class and lets the spider extract
the link information it needs. A SpiderParseHTML object can be used just like
a ReadHTML object, with the spider gathering its information in the
background.
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
SpiderParseHTML
public SpiderParseHTML(WorkloadItem source,
SpiderInputStream is,
Spider spider)
- Construct a SpiderParseHTML object. This object allows you to parse HTML,
while the spider collects link information in the background.
- Parameters:
- source - The URL that is being parsed; this is used for relative links.
- is - The InputStream being parsed.
- spider - The Spider that is parsing.
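The source parameter matters because links found on a page are often relative. The following is a minimal sketch of resolving such a link against the page's own URL, using only java.net.URI from the JDK. The class name RelativeLinkSketch and the example URLs are hypothetical; this is not how Encog itself is documented to resolve links.

```java
import java.net.URI;

// Hypothetical helper, not part of Encog: shows why the URL of the page
// being parsed is needed to turn a relative href into an absolute URL.
class RelativeLinkSketch {
    // Resolve href against the URL of the page on which it was found.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/index.html";
        System.out.println(resolve(page, "page2.html"));
        System.out.println(resolve(page, "../images/logo.png"));
        System.out.println(resolve(page, "http://other.example.org/x.html"));
    }
}
```

An href that is already absolute comes back unchanged, so the same call can be applied to every link without checking its form first.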
getStream
public SpiderInputStream getStream()
- Get the InputStream being parsed.
- Returns:
- The InputStream being parsed.
read
public int read()
- Read a single character. This function will process any tags that the
spider needs for navigation, then pass the character on to the caller.
This allows the spider to transparently gather its links.
- Overrides:
read in class ReadTags
- Returns:
- The character read.
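The pass-through idea behind read can be illustrated with a small self-contained sketch. LinkTapReader below is a hypothetical stand-in, not Encog code: it hands every character back to the caller and, as a side effect, records the href value of any tag it sees. It is deliberately simplified (it returns raw characters, including those inside tags, and only recognizes double-quoted href attributes), so it should not be read as a description of how ReadTags or SpiderParseHTML behave internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of Encog: returns characters to the caller
// while quietly collecting link targets from the tags that pass by.
class LinkTapReader {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    private final String html;
    private int pos = 0;
    private boolean inTag = false;
    private final StringBuilder tag = new StringBuilder();
    private final List<String> links = new ArrayList<>();

    LinkTapReader(String html) {
        this.html = html;
    }

    // Return the next character, or -1 at end of input (an assumption of this sketch).
    int read() {
        if (pos >= html.length()) {
            return -1;
        }
        char c = html.charAt(pos++);
        if (c == '<') {
            inTag = true;          // start buffering a new tag
            tag.setLength(0);
        } else if (c == '>' && inTag) {
            inTag = false;         // tag complete: inspect it for a link
            Matcher m = HREF.matcher(tag);
            if (m.find()) {
                links.add(m.group(1));
            }
        } else if (inTag) {
            tag.append(c);
        }
        return c;                  // the caller still receives every character
    }

    List<String> links() {
        return links;
    }
}
```

The caller loops on read() exactly as it would with a plain reader; the collected links are available from links() afterwards.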
readAll
public void readAll()
- Read all characters on the page. The characters are discarded, but the
spider can still examine the tags and find links.
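A drain loop of this kind can be sketched as follows. The names DrainSketch and CountingSource are hypothetical, and the sketch assumes that the read operation signals end of input with -1; it is not taken from the Encog source. The point is only that discarding the return values still lets the reader do its side-effect work on every tag.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch, not part of Encog: drain a character source and
// ignore what it returns, relying on the source's own side effects.
class DrainSketch {
    // Call read until it reports end of input (-1); return how many
    // characters were discarded.
    static int readAll(IntSupplier read) {
        int discarded = 0;
        while (read.getAsInt() != -1) {
            discarded++;
        }
        return discarded;
    }
}

// Hypothetical source whose side effect is counting tag openings.
class CountingSource implements IntSupplier {
    private final String text;
    private int pos = 0;
    int tagsSeen = 0;

    CountingSource(String text) {
        this.text = text;
    }

    @Override
    public int getAsInt() {
        if (pos >= text.length()) {
            return -1;
        }
        char c = text.charAt(pos++);
        if (c == '<') {
            tagsSeen++;   // side effect happens even though the caller discards c
        }
        return c;
    }
}
```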