The Encog Project

org.encog.bot.spider
Interface SpiderReportable


public interface SpiderReportable

Implement this interface in a class that the spider should report its findings to.


Nested Class Summary
static class SpiderReportable.URLType
          The types of link that can be encountered.
 
Method Summary
 void init(Spider spider)
          Called when the spider is starting up.
 boolean spiderFoundURL(java.net.URL url, java.net.URL base, SpiderReportable.URLType type)
          Called when the spider encounters a URL.
 void spiderProcessURL(java.net.URL url, java.io.InputStream stream)
          Called when the spider is about to process a non-HTML URL.
 void spiderProcessURL(java.net.URL url, SpiderParseHTML parse)
          Called when the spider is ready to process an HTML URL.
 void spiderURLError(java.net.URL url)
          Called when the spider tries to process a URL but gets an error.
 

Method Detail

init

void init(Spider spider)
Called when the spider is starting up. This method provides the SpiderReportable class with the spider object.

Parameters:
spider - The spider that will be working with this object.

spiderFoundURL

boolean spiderFoundURL(java.net.URL url,
                       java.net.URL base,
                       SpiderReportable.URLType type)
Called when the spider encounters a URL.

Parameters:
url - The URL that the spider found.
base - The base of the page that the URL was found on.
type - The type of link this URL is.
Returns:
True if the spider should scan for links on this page.
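
The return value gives an implementation a way to limit the crawl. The following is a minimal sketch of one possible policy, which follows a link only when it is on the same host as the page it was found on. SameHostPolicy and shouldFollow are hypothetical names and are not part of Encog; the type argument is left out because this page does not list the URLType constants.

```java
import java.net.URL;

// Hypothetical helper, not part of Encog: decides whether a found URL
// should be followed, based only on its host name.
class SameHostPolicy {

    // Returns true when url has a host and that host matches the host
    // of base, ignoring case. Ports, paths and protocols are not compared.
    static boolean shouldFollow(URL url, URL base) {
        String host = url.getHost();
        return !host.isEmpty() && host.equalsIgnoreCase(base.getHost());
    }
}
```

An implementation of spiderFoundURL could return the result of such a helper, passing along its own url and base arguments.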

spiderProcessURL

void spiderProcessURL(java.net.URL url,
                      java.io.InputStream stream)
                      throws java.io.IOException
Called when the spider is about to process a non-HTML URL.

Parameters:
url - The URL that the spider found.
stream - An InputStream to read the page contents from.
Throws:
java.io.IOException - Thrown if an IO error occurs while processing the page.
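
As an illustration of what an implementation might do with the stream, the sketch below reads the whole resource into memory. StreamReaderSketch and readFully are hypothetical names, not part of Encog. The sketch does not close the stream, on the assumption that the caller owns it; this page does not say who is responsible for closing it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Encog: buffers a stream in memory.
class StreamReaderSketch {

    // Reads the stream until end-of-stream and returns every byte read.
    // The stream is left open (assumption: the caller closes it).
    static byte[] readFully(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = stream.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

For large downloads, writing each chunk to a file instead of an in-memory buffer would avoid holding the whole resource in memory.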

spiderProcessURL

void spiderProcessURL(java.net.URL url,
                      SpiderParseHTML parse)
                      throws java.io.IOException
Called when the spider is ready to process an HTML URL.

Parameters:
url - The URL that the spider is about to process.
parse - An object that will allow you to parse the HTML on this page.
Throws:
java.io.IOException - Thrown if an IO error occurs while processing the page.

spiderURLError

void spiderURLError(java.net.URL url)
Called when the spider tries to process a URL but gets an error.

Parameters:
url - The URL that generated an error.
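
A simple way to handle this callback is to record the failing URLs for later inspection. The sketch below shows that idea in isolation. ErrorLogSketch and getFailed are hypothetical names, not part of Encog, and the class does not implement SpiderReportable so that it compiles without the library. The list is synchronized on the assumption that the callback may arrive from more than one thread, which this page does not confirm.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, not part of Encog: collects URLs that failed.
class ErrorLogSketch {

    // URLs are stored as strings, which avoids the host name lookups
    // that URL.equals and URL.hashCode can perform.
    private final List<String> failed =
            Collections.synchronizedList(new ArrayList<>());

    // Mirrors the spiderURLError(java.net.URL) signature documented above.
    public void spiderURLError(URL url) {
        failed.add(url.toString());
    }

    // Returns a snapshot copy of the failed URLs in the order reported.
    public List<String> getFailed() {
        synchronized (failed) {
            return new ArrayList<>(failed);
        }
    }
}
```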
