The Encog Project

org.encog.bot.spider
Interface SpiderReportable

All Known Implementing Classes:
SimpleReport

public interface SpiderReportable

SpiderReportable: Implementations of this interface receive the spider's findings as it crawls.


Nested Class Summary
static class SpiderReportable.URLType
          The types of link that can be encountered.
 
Method Summary
 boolean beginHost(java.lang.String host)
          Called when the spider is ready to process a new host.
 void init(Spider spider)
          Called when the spider is starting up.
 boolean spiderFoundURL(java.net.URL url, java.net.URL source, SpiderReportable.URLType type)
          Called when the spider encounters a URL.
 void spiderProcessURL(java.net.URL url, java.io.InputStream stream)
          Called when the spider is about to process a non-HTML URL.
 void spiderProcessURL(java.net.URL url, SpiderParseHTML parse)
          Called when the spider is ready to process an HTML URL.
 void spiderURLError(java.net.URL url)
          Called when the spider tries to process a URL but gets an error.
 

Method Detail

beginHost

boolean beginHost(java.lang.String host)
Called when the spider is ready to process a new host.

Parameters:
host - The new host that is about to be processed.
Returns:
True if this host should be processed, false otherwise.
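
Example: one way to decide the return value of beginHost is to keep an allow-list of hosts. The sketch below is not part of Encog. HostFilter and its constructor are hypothetical names, and the class does not implement SpiderReportable, so it compiles without the library. It only mirrors the documented contract (true means the host should be processed).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not an Encog class): an allow-list of crawlable hosts. */
final class HostFilter {
    private final Set<String> allowed = new HashSet<>();

    /** Stores each allowed host in lower case so lookups ignore case. */
    HostFilter(String... hosts) {
        for (String h : hosts) {
            allowed.add(h.toLowerCase(Locale.ROOT));
        }
    }

    /** Same shape as the documented method: true if the host should be processed. */
    boolean beginHost(String host) {
        return host != null && allowed.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

A real SpiderReportable implementation could delegate its own beginHost to a helper like this one.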

init

void init(Spider spider)
Called when the spider is starting up. This method provides the SpiderReportable class with the spider object.

Parameters:
spider - The spider that will be working with this object.

spiderFoundURL

boolean spiderFoundURL(java.net.URL url,
                       java.net.URL source,
                       SpiderReportable.URLType type)
Called when the spider encounters a URL.

Parameters:
url - The URL that the spider found.
source - The page that the URL was found on.
type - The type of link this URL is.
Returns:
True if the spider should scan for links on this page.
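
Example: a common policy for the return value of spiderFoundURL is to follow a link only when it stays on the host of the page it was found on. The sketch below assumes that policy. SameHostPolicy, url, and shouldFollow are hypothetical names, not Encog API. The type parameter is left out because this page does not list the URLType constants.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical helper (not an Encog class): follow links only within one host. */
final class SameHostPolicy {

    /** Parses a URL string, wrapping the checked exception for convenience. */
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** True when both URLs are present and share a host, ignoring case. */
    static boolean shouldFollow(URL url, URL source) {
        if (url == null || source == null) {
            return false;
        }
        return url.getHost().equalsIgnoreCase(source.getHost());
    }
}
```

An implementation of spiderFoundURL could return shouldFollow(url, source) after any checks it wants to make on the link type.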

spiderProcessURL

void spiderProcessURL(java.net.URL url,
                      java.io.InputStream stream)
                      throws java.io.IOException
Called when the spider is about to process a non-HTML URL.

Parameters:
url - The URL that the spider found.
stream - An InputStream to read the page contents from.
Throws:
java.io.IOException - Thrown if an IO error occurs while processing the page.
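
Example: the body of this callback typically drains the supplied stream. The sketch below shows a standard buffered read loop that counts the bytes of a resource. StreamSizer and countBytes are hypothetical names, not Encog API. Whether the spider or the callback closes the stream is not stated on this page, so the sketch leaves it open.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not an Encog class): measures a non-HTML resource. */
final class StreamSizer {

    /** Reads the stream to its end and returns the number of bytes seen. */
    static long countBytes(InputStream stream) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read returns -1 once the end of the stream is reached
        while ((n = stream.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }
}
```

Because the documented method declares java.io.IOException, a read failure inside such a loop can simply propagate to the spider.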

spiderProcessURL

void spiderProcessURL(java.net.URL url,
                      SpiderParseHTML parse)
                      throws java.io.IOException
Called when the spider is ready to process an HTML URL.

Parameters:
url - The URL that the spider is about to process.
parse - An object that allows you to parse the HTML on this page.
Throws:
java.io.IOException - Thrown if an IO error occurs while processing the page.

spiderURLError

void spiderURLError(java.net.URL url)
Called when the spider tries to process a URL but gets an error.

Parameters:
url - The URL that generated an error.
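
Example: since this callback receives only the failing URL, a simple approach is to record it for a later report or retry. The sketch below assumes that approach. ErrorLog and failures are hypothetical names, not Encog API, and the list is not synchronized, so it assumes the callback is invoked from a single thread.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not an Encog class): remembers URLs that failed. */
final class ErrorLog {
    private final List<String> failed = new ArrayList<>();

    /** Same shape as the documented method: records the URL that failed. */
    void spiderURLError(URL url) {
        failed.add(String.valueOf(url));
    }

    /** Read-only view of the failed URLs, in the order they were reported. */
    List<String> failures() {
        return Collections.unmodifiableList(failed);
    }
}
```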
