Import NeuralDataSet from CSV

Anton

First I want to say that your project Encog is really amazing. I have already read the GettingStarted PDF and looked at the Encog API and some examples from the encog-examples project in the SVN repository. Thank you for this great library!

I am trying to use the CSVNeuralDataSet class to import a CSV file that looks like this (2 inputs, 1 output; XOR training):

0.0,0.0,0
0.0,1.0,1
1.0,0.0,1
1.0,1.0,0

using this Java code:


File file = new File("buffer");
BufferedNeuralDataSet dataSet = new BufferedNeuralDataSet(file);
CSVNeuralDataSet csvDataSet = new CSVNeuralDataSet("blub.csv", 2, 1, false, CSVFormat.DECIMAL_POINT);
dataSet.load(csvDataSet);
long count = dataSet.getRecordCount();
System.out.println(count); // prints 4, which is correct
dataSet.beginLoad(2, 1);
dataSet.endLoad();
NeuralDataPair pair = null;
dataSet.getRecord(0, pair); // NullPointerException is thrown here

I get a NullPointerException on the last line. What is wrong? Why doesn't it work?

I am running OS X 10.6.1 and Java 1.6.

Anton

I found a code snippet in the function EncogUtility.convertCSV2Binary(), but it doesn't work either. The resulting data set has no elements, only null elements.


File csvFile = new File("blub.csv");
File binFile = new File("buffer");
int inputCount = 2;
int outputCount = 1;
binFile.delete();
CSVNeuralDataSet csv = new CSVNeuralDataSet(csvFile.toString(), inputCount, outputCount, false);
BufferedNeuralDataSet buffer = new BufferedNeuralDataSet(binFile);
buffer.beginLoad(50, 6);
for (final NeuralDataPair pair : csv) {
    buffer.add(pair);
}
buffer.endLoad();

jeffheaton

First of all, if the CSV file fits into memory, it is best to load it into a BasicNeuralDataSet, which runs fastest. The BufferedNeuralDataSet is your only choice if you have a really large dataset, but it is slow.

If you are using Encog 2.5, you can use a utility class that we just added called ImportExportUtility. It can be used to load a CSV file directly into memory. Or, if you want to use 2.4, the code to do this is:
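As a library-free illustration of the in-memory approach, here is a minimal sketch in plain Java. It does not use Encog at all: the class name InMemoryCsvSketch, the Row type, and the load method are hypothetical names made up for this example. It only shows the idea of reading every line once, splitting it into input and ideal values, and keeping the result in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InMemoryCsvSketch {

    /** Hypothetical row type: input values plus the expected (ideal) values. */
    static final class Row {
        final double[] input;
        final double[] ideal;

        Row(double[] input, double[] ideal) {
            this.input = input;
            this.ideal = ideal;
        }
    }

    /** Parses comma-separated lines into rows held entirely in memory. */
    static List<Row> load(Iterable<String> lines, int inputSize, int idealSize) {
        List<Row> rows = new ArrayList<Row>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split(",");
            if (cols.length != inputSize + idealSize) {
                throw new IllegalArgumentException("Line #" + lineNo + " has "
                        + cols.length + " columns, but " + (inputSize + idealSize)
                        + " were expected.");
            }
            double[] input = new double[inputSize];
            double[] ideal = new double[idealSize];
            for (int i = 0; i < inputSize; i++) {
                input[i] = Double.parseDouble(cols[i].trim());
            }
            for (int i = 0; i < idealSize; i++) {
                ideal[i] = Double.parseDouble(cols[inputSize + i].trim());
            }
            rows.add(new Row(input, ideal));
        }
        return rows;
    }

    public static void main(String[] args) {
        // The XOR rows from the question, supplied as strings instead of a file.
        List<String> xor = Arrays.asList("0.0,0.0,0", "0.0,1.0,1", "1.0,0.0,1", "1.0,1.0,0");
        for (Row row : load(xor, 2, 1)) {
            System.out.println(Arrays.toString(row.input) + " -> " + Arrays.toString(row.ideal));
        }
    }
}
```

Since everything lives in a list, later passes over the data never touch the disk, which is the reason the in-memory route is the faster one for small files such as the XOR set.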

public static void importCSV(final NeuralDataSet set,
        final int inputSize,
        final int idealSize,
        final InputStream istream,
        final boolean headers,
        final CSVFormat format) {

    int line = 0;

    final ReadCSV csv = new ReadCSV(istream, false, format);

    while (csv.next()) {
        line++;
        BasicNeuralData input = null, ideal = null;

        if (inputSize + idealSize != csv.getColumnCount()) {
            throw new EncogError("Line #" + line + " has "
                    + csv.getColumnCount()
                    + " columns, but dataset expects "
                    + (inputSize + idealSize) + " columns.");
        }

        if (inputSize > 0) {
            input = new BasicNeuralData(inputSize);
        }
        if (idealSize > 0) {
            ideal = new BasicNeuralData(idealSize);
        }

        final BasicNeuralDataPair pair = new BasicNeuralDataPair(input,
                ideal);
        int index = 0;

        for (int i = 0; i < inputSize; i++) {
            if (input != null)
                input.setData(i, csv.getDouble(index++));
        }
        for (int i = 0; i < idealSize; i++) {
            if (ideal != null)
                ideal.setData(i, csv.getDouble(index++));
        }

        set.add(pair);
    }
    csv.close();
}

If you actually want to use the buffered neural data set directly from disk, you might want to take a look at the forest cover example, which works this way. In the code above, the only thing I see wrong offhand is:

buffer.beginLoad(50, 6);

That needs to be the input and ideal counts, which for XOR would be:

buffer.beginLoad(2, 1);
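On the NullPointerException in the first snippet: one plausible cause, assuming getRecord copies the record into the pair it is given rather than allocating a new one, is that the pair passed in is null. The plain-Java sketch below illustrates that caller-allocates pattern. Pair and FlatDataSet are hypothetical stand-ins written for this example, not Encog classes.

```java
public class OutputParameterSketch {

    /** Hypothetical stand-in for a data pair that owns its own arrays. */
    static final class Pair {
        final double[] input;
        final double[] ideal;

        Pair(int inputSize, int idealSize) {
            this.input = new double[inputSize];
            this.ideal = new double[idealSize];
        }
    }

    /** Hypothetical stand-in for an indexable data set stored as one flat array. */
    static final class FlatDataSet {
        private final double[] data;
        private final int inputSize;
        private final int idealSize;

        FlatDataSet(double[] data, int inputSize, int idealSize) {
            this.data = data;
            this.inputSize = inputSize;
            this.idealSize = idealSize;
        }

        /** Copies record number index into the arrays of the caller's pair. */
        void getRecord(int index, Pair pair) {
            int offset = index * (inputSize + idealSize);
            // Dereferencing pair here is what fails when the caller passes null.
            System.arraycopy(data, offset, pair.input, 0, inputSize);
            System.arraycopy(data, offset + inputSize, pair.ideal, 0, idealSize);
        }
    }

    public static void main(String[] args) {
        FlatDataSet set = new FlatDataSet(
                new double[] {0, 0, 0,  0, 1, 1,  1, 0, 1,  1, 1, 0}, 2, 1);

        Pair pair = new Pair(2, 1); // allocate the pair before asking for a record
        set.getRecord(1, pair);
        System.out.println(pair.input[0] + "," + pair.input[1] + " -> " + pair.ideal[0]);

        try {
            set.getRecord(0, null); // same mistake as in the question
        } catch (NullPointerException e) {
            System.out.println("null pair -> NullPointerException");
        }
    }
}
```

If that assumption holds for the Encog version in use, the fix in the original snippet would be to build the pair first, for example the way the importCSV code does with new BasicNeuralDataPair(new BasicNeuralData(2), new BasicNeuralData(1)), and pass that to getRecord.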

Anton

Thanks!
I only found version 2.4.0 of Encog in the SVN. Is 2.5.0 still in beta?

I used my own implementation for CSV reading and writing, based on the free library SuperCSV. For my purposes it is more flexible (validation, CSV format, decimal format).

jeffheaton

2.5 is in beta, actually alpha, as we are still extending it. But it is stable; we generally try not to let the "mainline" get too far out of shape.

You can always get the very latest versions from here:

http://build.heatonresearch.com/cruisecontrol/

ronny

In this version of importCSV(), the "headers" parameter is ignored.

This change should fix it:

- final ReadCSV csv = new ReadCSV(istream, false, format);
+ final ReadCSV csv = new ReadCSV(istream, headers, format);
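To show why the flag matters, here is a small plain-Java sketch. It assumes, as the diff above implies, that the second ReadCSV argument decides whether the first line is treated as column names. HeaderFlagSketch and parse are hypothetical names for this example and no Encog code is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeaderFlagSketch {

    /** Parses numeric CSV lines, skipping the first line when headers is true. */
    static List<double[]> parse(List<String> lines, boolean headers) {
        List<double[]> rows = new ArrayList<double[]>();
        boolean first = true;
        for (String line : lines) {
            if (first) {
                first = false;
                if (headers) {
                    continue; // the first line holds column names, not data
                }
            }
            String[] cols = line.split(",");
            double[] row = new double[cols.length];
            for (int i = 0; i < cols.length; i++) {
                row[i] = Double.parseDouble(cols[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("in1,in2,out", "0.0,1.0,1", "1.0,1.0,0");

        // Honouring the flag: the header line is skipped and two data rows remain.
        System.out.println(parse(lines, true).size() + " data rows");

        // Ignoring the flag: the header line is parsed as numbers and fails.
        try {
            parse(lines, false);
        } catch (NumberFormatException e) {
            System.out.println("header parsed as data: " + e.getMessage());
        }
    }
}
```

With the flag hard-coded to false, a file that starts with column names would hit the second case no matter what the caller passed for headers.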

