
    Oracle is a commercial database product that is widely used in business. For more information about Oracle, visit the following URL:

http://www.oracle.com

    Oracle also makes a free version of its database, Oracle Database Express Edition (often called Oracle Express or Oracle XE). This free version allows developers to try the Oracle database without purchasing an expensive license. For more information about Oracle Express, visit the following URL:

http://www.oracle.com/technology/products/database/xe/index.html

    The DDL script to create the tables on Oracle is shown in Listing F.5.

Listing F.5: Oracle DDL Script

-- Create SPIDER_WORKLOAD

CREATE TABLE SPIDER_WORKLOAD
(
WORKLOAD_ID INTEGER NOT NULL,
HOST INTEGER NOT NULL,
URL VARCHAR2(2083 BYTE) NOT NULL,
STATUS VARCHAR2(1 BYTE) NOT NULL,
DEPTH INTEGER NOT NULL,
URL_HASH INTEGER NOT NULL,
SOURCE_ID INTEGER NOT NULL
)
LOGGING 
NOCOMPRESS 
NOCACHE
NOPARALLEL
MONITORING;


CREATE INDEX IDX_STATUS ON SPIDER_WORKLOAD
(STATUS)
LOGGING
NOPARALLEL;


CREATE INDEX IDX_URL_HASH ON SPIDER_WORKLOAD
(URL_HASH)
LOGGING
NOPARALLEL;


CREATE UNIQUE INDEX PK_WORKLOAD_ID ON SPIDER_WORKLOAD
(WORKLOAD_ID)
LOGGING
NOPARALLEL;


ALTER TABLE SPIDER_WORKLOAD ADD (
CONSTRAINT PK_WORKLOAD_ID
PRIMARY KEY
(WORKLOAD_ID));

-- Create SPIDER_HOST

CREATE TABLE SPIDER_HOST
(
HOST_ID INTEGER NOT NULL,
HOST VARCHAR2(255) NOT NULL,
STATUS VARCHAR2(1 BYTE) NOT NULL,
URLS_DONE INTEGER NOT NULL,
URLS_ERROR INTEGER NOT NULL
)
LOGGING 
NOCOMPRESS 
NOCACHE
NOPARALLEL
MONITORING;


CREATE UNIQUE INDEX PK_HOST_ID ON SPIDER_HOST
(HOST_ID)
LOGGING
NOPARALLEL;


ALTER TABLE SPIDER_HOST ADD (
CONSTRAINT PK_HOST_ID
PRIMARY KEY
(HOST_ID));

-- Create Sequences

CREATE SEQUENCE spider_workload_seq;
CREATE SEQUENCE spider_host_seq;
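The indexes in Listing F.5 suggest how the workload table is meant to be queried. IDX_URL_HASH lets the spider locate a URL through a small integer hash instead of searching the long URL column. The following Java sketch shows one way such a lookup could be written with JDBC. It is not taken from the spider library: the class WorkloadLookup and its methods are hypothetical, and filling URL_HASH with String.hashCode() is an assumption. Different URLs can share a hash value, so the query also compares the URL itself. The sketch uses try-with-resources, so it needs Java 7 or later.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; NOT part of the Heaton Research spider library.
final class WorkloadLookup {

    // Uses the indexed URL_HASH column, then compares URL to rule out hash collisions.
    static final String FIND_BY_URL_SQL =
        "SELECT WORKLOAD_ID FROM SPIDER_WORKLOAD WHERE URL_HASH = ? AND URL = ?";

    // Assumption: URL_HASH holds String.hashCode(), which fits an Oracle INTEGER column.
    static int urlHash(String url) {
        return url.hashCode();
    }

    // Returns the WORKLOAD_ID for the URL, or -1 if the URL is not in the workload.
    static int findWorkloadId(Connection conn, String url) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_URL_SQL)) {
            ps.setInt(1, urlHash(url));
            ps.setString(2, url);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : -1;
            }
        }
    }

    public static void main(String[] args) {
        // No database connection is needed just to compute the hash value.
        System.out.println(urlHash("http://www.oracle.com"));
    }
}
```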

    To use Oracle, you will need the Oracle JDBC driver. Oracle provides several different drivers, and explaining all of them is beyond the scope of this book, so the samples here use the “thin” driver. This driver, along with the other Oracle drivers, is contained in a Jar file that you obtain when you set up the Oracle client. The name of this Jar file is as follows:

ojdbc14.jar

    The name of the Jar file may change as new versions are released. Whatever its name, you must add this file to the classpath whenever you run a spider session that uses Oracle. The driver class that you must specify in your configuration file is as follows:

oracle.jdbc.driver.OracleDriver
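Before you configure the spider, you can confirm that the Jar file is actually on the classpath. The sketch below uses a hypothetical class name, DriverCheck. It tries to load the driver class named above with Class.forName, the traditional way to load and register a JDBC driver, and reports whether the class was found. It does not connect to a database.

```java
// Hypothetical utility: reports whether a JDBC driver class is on the classpath.
final class DriverCheck {

    static final String ORACLE_DRIVER = "oracle.jdbc.driver.OracleDriver";

    // Class.forName loads the class; a JDBC driver registers itself with
    // DriverManager when loaded. Returns false instead of throwing if it is missing.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(ORACLE_DRIVER)) {
            System.out.println("Oracle JDBC driver found.");
        } else {
            System.out.println("Oracle JDBC driver not found; add the driver Jar to the classpath.");
        }
    }
}
```

After compiling the class with javac, you could run it with the Jar on the classpath. On Unix-like systems, use java -cp .:ojdbc14.jar DriverCheck. On Windows, use ; instead of : as the separator.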

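    Additionally, you must specify the database connection parameters in the database URL. For the thin driver, this URL has the form jdbc:oracle:thin:@host:port:SID, where the SID identifies the database instance. Oracle's default listener port is 1521, so use whichever port your listener is actually configured on. This URL will look similar to the following: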

jdbc:oracle:thin:@127.0.0.1:1532:database_name
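The URL is the prefix jdbc:oracle:thin:@ followed by the host, port and SID, separated by colons. The small Java sketch below uses hypothetical class and method names. It builds such a URL from its parts and splits one back apart, which can help when you validate a configuration value. It handles only the host:port:SID form shown here.

```java
// Hypothetical helper for Oracle thin-driver URLs of the form jdbc:oracle:thin:@host:port:SID.
final class OracleThinUrl {

    static final String PREFIX = "jdbc:oracle:thin:@";

    static String build(String host, int port, String sid) {
        return PREFIX + host + ":" + port + ":" + sid;
    }

    // Returns {host, port, sid}; rejects anything not in host:port:SID form.
    static String[] parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an Oracle thin URL: " + url);
        }
        String[] parts = url.substring(PREFIX.length()).split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected host:port:SID in " + url);
        }
        Integer.parseInt(parts[1]); // throws NumberFormatException if the port is not numeric
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(build("127.0.0.1", 1532, "database_name"));
    }
}
```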

    A sample configuration file is shown in Listing F.6.

Listing F.6: Sample Spider Configuration for Oracle

timeout:		60000
maxDepth:		-1
userAgent:
corePoolSize:	100 
maximumPoolSize:100
keepAliveTime:	60
dbURL: jdbc:oracle:thin:@127.0.0.1:1532:database_name
dbClass: oracle.jdbc.driver.OracleDriver
dbUID:username
dbPWD:password
workloadManager:com.heatonresearch.httprecipes.spider.workload.sql.oracle.OracleWorkloadManager
startup:		clear
filter:			com.heatonresearch.httprecipes.spider.filter.RobotsFilter
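Each line of Listing F.6 is a key, a colon and a value, with optional spaces or tabs in between. The spider presumably has its own loader for this file. The sketch below is only a hypothetical illustration of parsing such lines, and the class name SpiderConfigSketch is invented. The key detail is splitting on the first colon only, because the dbURL value itself contains colons. An empty value, as for userAgent, is kept as an empty string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for "key: value" lines; NOT the spider library's own loader.
final class SpiderConfigSketch {

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip blank or malformed lines
            }
            // Split on the first colon only, so values such as dbURL keep their colons;
            // trim() removes the surrounding spaces and tabs.
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "timeout:\t\t60000",
            "maximumPoolSize:100",
            "dbURL: jdbc:oracle:thin:@127.0.0.1:1532:database_name",
            "userAgent:");
        parse(sample).forEach((k, v) -> System.out.println(k + " = [" + v + "]"));
    }
}
```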

    You will also notice that the Oracle configuration uses a specialized workload manager. This workload manager is named:

com.heatonresearch.httprecipes.spider.workload.sql.oracle.OracleWorkloadManager

    This workload manager provides a few specialized SQL commands that Oracle requires.
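The source does not list these commands, but two well-known Oracle differences are likely candidates. First, releases before 12c have no auto-increment columns, so key values come from sequences such as the spider_workload_seq and spider_host_seq sequences created in Listing F.5. Second, Oracle does not support the LIMIT clause, so row counts are usually restricted with ROWNUM. The following Java sketch shows how an Oracle-specific class might supply such SQL. It is a hypothetical illustration, not the source of OracleWorkloadManager, and the names DialectSketch, SqlDialect and OracleDialect are invented.

```java
// Hypothetical sketch of database-specific SQL generation;
// NOT the actual OracleWorkloadManager implementation.
final class DialectSketch {

    // SQL fragments that differ between database products.
    interface SqlDialect {
        String nextIdSql(String sequenceName);
        String firstRowSql(String table, String whereClause);
    }

    static final class OracleDialect implements SqlDialect {
        @Override
        public String nextIdSql(String sequenceName) {
            // Read the next value of an Oracle sequence through the one-row DUAL table.
            return "SELECT " + sequenceName + ".NEXTVAL FROM DUAL";
        }

        @Override
        public String firstRowSql(String table, String whereClause) {
            // Oracle has no LIMIT clause; ROWNUM = 1 returns at most one row.
            return "SELECT * FROM " + table + " WHERE (" + whereClause + ") AND ROWNUM = 1";
        }
    }

    public static void main(String[] args) {
        SqlDialect oracle = new OracleDialect();
        System.out.println(oracle.nextIdSql("spider_workload_seq"));
        System.out.println(oracle.firstRowSql("SPIDER_WORKLOAD", "STATUS = ?"));
    }
}
```

Keeping the database-specific strings behind one interface lets the rest of a workload manager stay the same from one database to the next.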


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.