Setting Up Oracle
Oracle is a commercial database product that many companies use. For more information about Oracle, visit the Oracle website.
Oracle also makes a free version of its database, named Oracle Express. This free version allows developers to try the Oracle database without purchasing an expensive license. For more information about Oracle Express, visit the following URL:
http://www.oracle.com/technology/products/database/xe/index.html
The DDL script to create the tables on Oracle is shown in Listing F.5.
Listing F.5: Oracle DDL Script
-- Create SPIDER_WORKLOAD
CREATE TABLE SPIDER_WORKLOAD
(
  WORKLOAD_ID  INTEGER              NOT NULL,
  HOST         INTEGER              NOT NULL,
  URL          VARCHAR2(2083 BYTE)  NOT NULL,
  STATUS       VARCHAR2(1 BYTE)     NOT NULL,
  DEPTH        INTEGER              NOT NULL,
  URL_HASH     INTEGER              NOT NULL,
  SOURCE_ID    INTEGER              NOT NULL
)
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
MONITORING;

CREATE INDEX IDX_STATUS ON SPIDER_WORKLOAD
(STATUS)
LOGGING
NOPARALLEL;

CREATE INDEX IDX_URL_HASH ON SPIDER_WORKLOAD
(URL_HASH)
LOGGING
NOPARALLEL;

CREATE UNIQUE INDEX PK_WORKLOAD_ID ON SPIDER_WORKLOAD
(WORKLOAD_ID)
LOGGING
NOPARALLEL;

ALTER TABLE SPIDER_WORKLOAD ADD (
  CONSTRAINT PK_WORKLOAD_ID
  PRIMARY KEY (WORKLOAD_ID));

-- Create SPIDER_HOST
CREATE TABLE SPIDER_HOST
(
  HOST_ID     INTEGER           NOT NULL,
  HOST        VARCHAR2(255)     NOT NULL,
  STATUS      VARCHAR2(1 BYTE)  NOT NULL,
  URLS_DONE   INTEGER           NOT NULL,
  URLS_ERROR  INTEGER           NOT NULL
)
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
MONITORING;

CREATE UNIQUE INDEX PK_HOST_ID ON SPIDER_HOST
(HOST_ID)
LOGGING
NOPARALLEL;

ALTER TABLE SPIDER_HOST ADD (
  CONSTRAINT PK_HOST_ID
  PRIMARY KEY (HOST_ID));

-- Create Sequences
CREATE SEQUENCE spider_workload_seq;
CREATE SEQUENCE spider_host_seq;
To use Oracle, you will need the Oracle JDBC driver. Several different drivers are available for Oracle, and explaining all of them is beyond the scope of this book. The samples provided here use the “thin” driver. This driver, along with the other Oracle drivers, is contained in a JAR file that you obtain when you set up the Oracle client. The name of this JAR file is as follows:
ojdbc14.jar
The name of the JAR file may change as new versions are released. Whatever its name, you must add this file to the classpath when you run a spider session that requires Oracle. The driver class that you must specify in your configuration file is as follows:
oracle.jdbc.driver.OracleDriver
Additionally, you must specify the database connection parameters in the database URL. For the thin driver, the parts after the @ sign are the host, the listener port, and the database name. This URL will look similar to the following:
jdbc:oracle:thin:@127.0.0.1:1532:database_name
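The structure of the thin-driver URL can be sketched in Java as follows. This is a minimal illustration, not code from the spider: the class OracleThinUrl and its methods build and open are hypothetical names introduced here. The open method assumes that the Oracle driver JAR is on the classpath and that a database is reachable at the given address.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper (not part of the spider) for thin-driver URLs. */
class OracleThinUrl {

    /** Builds jdbc:oracle:thin:@host:port:database from its three parts. */
    static String build(String host, int port, String database) {
        if (host == null || host.isEmpty()
                || database == null || database.isEmpty()) {
            throw new IllegalArgumentException("host and database are required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + database;
    }

    /** Opens a connection; needs the Oracle driver JAR on the classpath. */
    static Connection open(String url, String user, String password)
            throws ClassNotFoundException, SQLException {
        // Load the driver class named in the text before asking for a connection.
        Class.forName("oracle.jdbc.driver.OracleDriver");
        return DriverManager.getConnection(url, user, password);
    }
}
```

Calling OracleThinUrl.build("127.0.0.1", 1532, "database_name") reproduces the URL shown above. Keeping the URL construction separate from the connection call makes it possible to check the string without a running database.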
A sample configuration file is shown in Listing F.6.
Listing F.6: Sample Spider Configuration for Oracle
timeout: 60000
maxDepth: -1
userAgent:
corePoolSize: 100
maximumPoolSize:100
keepAliveTime: 60
dbURL: jdbc:oracle:thin:@127.0.0.1:1532:database_name
dbClass: oracle.jdbc.driver.OracleDriver
dbUID:username
dbPWD:password
workloadManager:com.heatonresearch.httprecipes.spider.workload.sql.oracle.OracleWorkloadManager
startup: clear
filter: com.heatonresearch.httprecipes.spider.filter.RobotsFilter
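One detail of this file format is worth illustrating: the dbURL value itself contains colons, so each line has to be split at its first colon only. The following Java sketch shows one way to read such name and value pairs. It assumes one setting per line and is not the spider's own configuration loader; the class name SpiderConfigSketch and the method parse are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for "name: value" lines; not the spider's own parser. */
class SpiderConfigSketch {

    /** Parses one setting per line, keeping the settings in file order. */
    static Map<String, String> parse(String text) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in line: " + trimmed);
            }
            // Split at the FIRST colon only: values such as the JDBC URL
            // contain further colons that belong to the value.
            String name = trimmed.substring(0, colon).trim();
            String value = trimmed.substring(colon + 1).trim();
            options.put(name, value);
        }
        return options;
    }
}
```

With this approach, a line such as userAgent: yields an empty string for its value, and spacing after the colon does not matter, so dbUID:username and dbUID: username are read the same way.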
You will also notice that Oracle uses a specialized workload manager. This workload manager is named:
com.heatonresearch.httprecipes.spider.workload.sql.oracle.OracleWorkloadManager
This workload manager provides a few specialized SQL commands that Oracle requires.
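As an illustration of the kind of SQL that is specific to Oracle, the sequences created in Listing F.5 are read with the NEXTVAL pseudocolumn rather than an auto-increment column. The Java sketch below shows how a new identifier could be drawn from such a sequence. It is an assumption about how the sequences might be used, not the actual code of OracleWorkloadManager; the class OracleSequenceSketch and its methods are hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical sketch; not the OracleWorkloadManager implementation. */
class OracleSequenceSketch {

    /** Builds the Oracle query that fetches the next value of a sequence. */
    static String nextValSql(String sequenceName) {
        // Accept only simple unquoted identifiers, because the name is
        // concatenated into the SQL text.
        if (!sequenceName.matches("[A-Za-z][A-Za-z0-9_$#]*")) {
            throw new IllegalArgumentException(
                "not a simple identifier: " + sequenceName);
        }
        return "SELECT " + sequenceName + ".NEXTVAL FROM DUAL";
    }

    /** Runs the query on an open connection and returns the new value. */
    static long nextId(Connection connection, String sequenceName)
            throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(nextValSql(sequenceName))) {
            if (!rows.next()) {
                throw new SQLException("sequence returned no row: " + sequenceName);
            }
            return rows.getLong(1);
        }
    }
}
```

For example, nextId(connection, "spider_workload_seq") could supply the WORKLOAD_ID for a new row in SPIDER_WORKLOAD, and "spider_host_seq" the HOST_ID for a new row in SPIDER_HOST.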




