Using org.encog.neural.data.sql.SQLNeuralDataSet

Dan's picture

Hi,
Firstly, I'd like to say that I'm very impressed by your project, and that I've got a lot of great use out of it already.
I have encountered a small problem using the SQLNeuralDataSet. I have a large amount of training data that will not fit in memory, and I want to train on it using this class. However, when I try to run it, it runs out of memory. I went to the documentation and found this:

"A dataset based on a SQL query. This is not a memory based dataset, so it can handle very large datasets without a memory issue. This class makes use of JDBC to query the database. If you are running into "out of memory" issues with this class try setting a lower "fetch size". This can be done with: sqlDataSet.getStatement().setFetchSize(1000); "

However, adding "sqlDataSet.getStatement().setFetchSize(1000);" did not solve the problem. I am not sure whether this is down to my database and its settings or to Encog. I looked at the source code for SQLNeuralDataSet and found:

"this.results = SQLNeuralDataSet.this.statement.executeQuery();"

which I'm going to assume executes the entire query, and causes my problem. Am I using this incorrectly? Or is this the expected behaviour?

Thanks,

Dan

jeffheaton's picture

We've encountered that problem before. I added the fetch size, and that seemed to help the person who ran into the same sort of issue. However, that call goes straight to the JDBC driver, and from what I've read, drivers do not always implement it properly. I see two options.

1. Use a "UnionNeuralDataSet". Split your data across several SQL queries so that each one pulls back a smaller piece of the dataset. The union dataset connects several smaller datasets into one large one. For example, use one SQLNeuralDataSet for each of your queries and one UnionNeuralDataSet to link them together.

2. When I have large datasets that do not fit into memory, I typically use the BufferedNeuralDataSet. It reads from a "binary file" located on the disk, and it is very fast. There is a utility method provided that will convert a CSV file into a binary file for use with this dataset:

EncogUtility.convertCSV2Binary
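
To illustrate why a disk-backed binary file avoids the memory problem, here is a minimal sketch in plain Java. It is not Encog code and does not use Encog's file format: the class name BinaryRowStream, the raw "one 8-byte double after another" layout, and both methods are assumptions made up for this example. The point is only that the reader holds a single row in memory at a time, however large the file is.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BinaryRowStream {

    /** Writes every row as consecutive raw doubles (hypothetical layout, not Encog's). */
    public static void writeRows(Path file, double[][] rows) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (double[] row : rows) {
                for (double value : row) {
                    out.writeDouble(value);
                }
            }
        }
    }

    /** Streams the file row by row and returns the sum of all values. */
    public static double sumAll(Path file, int columns) throws IOException {
        double sum = 0.0;
        double[] row = new double[columns]; // single row buffer, reused for every row
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                try {
                    row[0] = in.readDouble(); // EOF here means no more rows
                } catch (EOFException endOfFile) {
                    break;
                }
                for (int i = 1; i < columns; i++) {
                    row[i] = in.readDouble();
                }
                for (double value : row) {
                    sum += value; // stand-in for "train on this row"
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rows", ".bin");
        writeRows(file, new double[][] {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}});
        System.out.println(sumAll(file, 3)); // prints 21.0
        Files.delete(file);
    }
}
```

In a real setup the summing loop would be replaced by whatever consumes each training row; the memory footprint stays at one row plus the stream buffer.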

The forest cover example makes use of this technique.
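
For option 1, the several smaller SQL queries could be generated rather than written by hand. The sketch below only builds the query strings; it does not touch Encog or a database. The class name PagedQueries, the table and column names, and the use of LIMIT/OFFSET are all assumptions for illustration (that syntax works on MySQL, PostgreSQL and SQLite, but not on every database). Each resulting string would be handed to its own SQLNeuralDataSet, and those would then be linked with the UnionNeuralDataSet.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedQueries {

    /**
     * Splits one large query into fixed-size pages.
     * The ORDER BY keeps the pages stable and non-overlapping between queries.
     */
    public static List<String> build(String baseQuery, String orderColumn,
                                     long totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (long offset = 0; offset < totalRows; offset += pageSize) {
            queries.add(baseQuery
                    + " ORDER BY " + orderColumn
                    + " LIMIT " + pageSize
                    + " OFFSET " + offset);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical table "training" with 250,000 rows, split into pages of 100,000.
        List<String> queries =
                build("SELECT in1, in2, out1 FROM training", "id", 250_000L, 100_000);
        for (String sql : queries) {
            System.out.println(sql); // three queries, with offsets 0, 100000 and 200000
        }
    }
}
```

The page size is a trade-off: smaller pages lower the peak memory per query but mean more round trips to the database. Large OFFSET values can also be slow on some databases, in which case paging on a key range (for example "WHERE id BETWEEN ...") may be a better fit.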

Jeff

Dan's picture

Great, thanks a lot, I'll try those two different methods.


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.