The Encog Project
java.lang.Object
  org.encog.bot.spider.Spider
public class Spider
A spider is a specialized bot that crawls the pages of a web site. It begins at a single entry page, finds all of the links on that page, and visits those pages as well. All data found is reported through the SpiderReportable interface.

The queue of pages to access must be stored in a database, which is accessed through the Hibernate ORM. For shorter spidering tasks an in-memory database, such as HSQL in Java, can be used.

Spiders typically spend much of their time waiting for pages to load. Because of this, it is very advantageous to run a spider in a multithreaded way. To do this, the spider uses the Encog threading framework, which in turn uses whatever underlying thread pool is provided by Java or C#. For more information about multithreading, refer to the EncogConcurrency class.
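The crawl described above (start at one page, follow its links, visit each discovered page once) can be illustrated with a small, self-contained sketch. This is not the Encog implementation: the class name `CrawlSketch`, the method `crawl`, and the in-memory link map standing in for real pages are all assumptions made for illustration. The sketch is single-threaded and keeps its workload queue in memory, whereas the Spider class stores its queue in a database and runs multithreaded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; not part of org.encog.bot.spider.
final class CrawlSketch {

    // Returns every page reachable from start, in the order it was discovered.
    // The links map plays the role of "fetch the page and extract its links".
    static List<String> crawl(String start, Map<String, List<String>> links) {
        Set<String> seen = new LinkedHashSet<>(); // pages already queued or visited
        Queue<String> workload = new ArrayDeque<>(); // pages still waiting to be visited
        seen.add(start);
        workload.add(start);

        while (!workload.isEmpty()) {
            String page = workload.remove();
            List<String> found = links.getOrDefault(page, Collections.<String>emptyList());
            for (String link : found) {
                // Set.add returns false for a page that was seen before,
                // so each page enters the workload at most once.
                if (seen.add(link)) {
                    workload.add(link);
                }
            }
        }
        return new ArrayList<>(seen);
    }
}
```

Tracking the set of seen pages separately from the queue is what keeps the crawl from looping forever on sites whose pages link back to each other.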
| Field Summary | |
|---|---|
| static int | DEFAULT_TIMEOUT - The default timeout. |
| Constructor Summary | |
|---|---|
| Spider(SessionManager manager, SpiderReportable report) | Construct a new spider. |
| Method Summary | |
|---|---|
| void | addURL(java.net.URL url, WorkloadItem source) - Add a URL to the spider for processing. |
| java.net.URL | convertURL(java.lang.String aurl) - Convert the specified String to a URL. |
| SpiderReportable | getReport() |
| SessionManager | getSessionManager() |
| int | getTimeout() - The current HTTP timeout. |
| java.lang.String | getUserAgent() |
| void | process(java.net.URL start) - Process the specified URL. |
| void | setTimeout(int timeout) - Set the HTTP timeout. |
| void | setUserAgent(java.lang.String userAgent) - Set the user agent. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int DEFAULT_TIMEOUT
| Constructor Detail |
|---|
public Spider(SessionManager manager, SpiderReportable report)
manager - The ORM manager to use.
report - The object to report progress to.
| Method Detail |
|---|
public void addURL(java.net.URL url, WorkloadItem source)
url - The URL to add.
source - The source the URL came from.
public java.net.URL convertURL(java.lang.String aurl)
aurl - A String to convert into a URL.
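As a rough illustration of what converting a String to a URL involves, the sketch below uses the standard `java.net.URL` constructor. How Spider.convertURL actually reports a malformed string is not documented here, so the choice to rethrow as an `IllegalArgumentException`, along with the names `UrlConversionSketch` and `toURL`, is an assumption made for this example.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper; it does not reproduce Spider.convertURL.
final class UrlConversionSketch {

    // Parses text into a URL, turning the checked MalformedURLException
    // into an unchecked exception so callers need no try/catch.
    static URL toURL(String text) {
        try {
            return new URL(text);
        } catch (MalformedURLException e) {
            // Assumption: an unparseable string is treated as a caller error.
            throw new IllegalArgumentException("Not a valid URL: " + text, e);
        }
    }
}
```

A string without a protocol, such as `www.example.com`, is rejected by the `java.net.URL` constructor, so links extracted from a page generally need to be made absolute before conversion.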
public SpiderReportable getReport()
public SessionManager getSessionManager()
public int getTimeout()
public java.lang.String getUserAgent()
public void process(java.net.URL start)
start - The starting URL.
public void setTimeout(int timeout)
timeout - The timeout.
public void setUserAgent(java.lang.String userAgent)
userAgent - The user agent.
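To show what a timeout and a user agent typically control on an HTTP request, here is a minimal sketch using the standard `java.net.HttpURLConnection`. It does not show how Spider applies these settings internally. The names `HttpSettingsSketch` and `open`, the use of milliseconds as the timeout unit, and the decision to apply one value to both the connect and read timeouts are assumptions, since the documentation above does not state them.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper; not part of org.encog.bot.spider.
final class HttpSettingsSketch {

    // Prepares (but does not yet open) an HTTP connection with the given settings.
    static HttpURLConnection open(URL url, int timeoutMillis, String userAgent)
            throws IOException {
        // openConnection() only creates the connection object;
        // no network traffic happens until connect() or a read is requested.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMillis); // limit on establishing the connection
        conn.setReadTimeout(timeoutMillis);    // limit on waiting for response data
        conn.setRequestProperty("User-Agent", userAgent); // identifies the bot to the server
        return conn;
    }
}
```

A caller might use it as `HttpSettingsSketch.open(new URL("http://www.example.com/"), 5000, "SketchBot/1.0")`, where the bot name is a placeholder.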