How to troubleshoot my SOM

Andre

Hi,

I have a training set of 10 different instances. Each instance has 245 feature values.

I know that the feature set is of good quality, because a Support Vector Machine trained on this data gave perfect classification results.

However, when I train a SOM with the following code, it never seems to determine the correct winner:


this.input = new double[10][245]; // 10 instances x 245 features
..... filling the array .....

// SOM with 245 input neurons and 10 output neurons
SOMPattern somPattern = new SOMPattern();
somPattern.setInputNeurons(245);
somPattern.setOutputNeurons(10);
this.network = somPattern.generate();
this.network.reset(); // randomize the weights

// Unsupervised training: no ideal (target) data, hence null
NeuralDataSet trainingSet = new BasicNeuralDataSet(this.input, null);
CompetitiveTraining train = new CompetitiveTraining(this.network, 0.25, trainingSet,
        new NeighborhoodGaussian(new GaussianFunction(0, 1, 2)));

train.setForceWinner(true);

for (int epoch = 0; epoch < 100; epoch++) {
    train.iteration();
    logger.info("Epoch #" + epoch + " Error:" + Format.formatPercent(train.getError()));
}

this.network = train.getNetwork();
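For readers wondering what a single training iteration does, here is a minimal plain-Java sketch of classic competitive (Kohonen) training. It is not Encog's `CompetitiveTraining` code: the class name `SomSketch`, the random initialization in [-1, 1), the Euclidean-distance winner, and the Gaussian neighborhood computed over the distance between neuron indices on a 1-D line are all assumptions chosen for illustration. `learningRate` and `width` play roughly the role of the 0.25 rate and the Gaussian neighborhood in the code above.

```java
import java.util.Random;

/**
 * Minimal self-organizing map sketch (NOT Encog's implementation).
 * Assumptions: weights start random in [-1, 1), the winner is the neuron with
 * the smallest Euclidean distance, and neighbors are weighted by a Gaussian
 * over |i - winner| with the output neurons laid out on a 1-D line.
 */
class SomSketch {
    final double[][] weights; // [outputNeuron][inputFeature]

    SomSketch(int inputs, int outputs, long seed) {
        Random rnd = new Random(seed);
        weights = new double[outputs][inputs];
        for (double[] w : weights) {
            for (int j = 0; j < inputs; j++) {
                w[j] = rnd.nextDouble() * 2 - 1;
            }
        }
    }

    /** Index of the output neuron whose weight vector is closest to x. */
    int winner(double[] x) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < weights.length; i++) {
            double d = 0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - weights[i][j];
                d += diff * diff; // squared distance is enough for argmin
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    /** One pass over the data; width controls the Gaussian neighborhood. */
    void iteration(double[][] data, double learningRate, double width) {
        for (double[] x : data) {
            int bmu = winner(x);
            for (int i = 0; i < weights.length; i++) {
                double dist = i - bmu;
                double h = Math.exp(-(dist * dist) / (2 * width * width));
                // Pull each neuron toward x, most strongly the winner itself
                for (int j = 0; j < x.length; j++) {
                    weights[i][j] += learningRate * h * (x[j] - weights[i][j]);
                }
            }
        }
    }
}
```

For example, `new SomSketch(245, 10, 1L)` followed by repeated `iteration(input, 0.25, 2.0)` calls loosely mirrors the setup in the question. Many SOM schedules also shrink the learning rate and neighborhood width over time; this sketch leaves that to the caller.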

During the iterations I can see that the lowest error rate is:
Epoch #11 Error:327,434577%

I've tried anywhere from 50 to 5000 iterations.

However, even when I classify one of the training instances with the following code,

NeuralData neuralInput = new BasicNeuralData(features);
final int output = this.network.winner(neuralInput);

it never returns the correct winner. It seems that my SOM is unable to learn the input data correctly.

So, where do I start troubleshooting? Is the code correct? Is the error rate way too high?

BTW: If I build a BasicNetwork with a 245-245-1 layout instead, the error rates range between 1300% and 120%, and that network also never produces the correct output value.

André

jeffheaton

Here are some things to try. First of all, the "error" reported here is not a true error in the supervised-training sense. Even so, it should usually drop lower than that if the network is to be useful.

I would try this: train it as well as you can, both with force winner enabled and disabled. Then submit the training set to it and see how the winning neurons cluster. Ideally, no small group of output neurons should dominate; how evenly the wins are spread shows the degree to which the network is learning.
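One way to run this check is to feed every training row through the trained network and count how often each output neuron wins. A minimal sketch, assuming the winner lookup is passed in as a function: `WinCounter` and its methods are hypothetical names, and `winnerOf` stands in for a call such as `network.winner(new BasicNeuralData(row))` from the question. Because a SOM is unsupervised, the index of the neuron a given instance lands on is arbitrary; what matters is whether different instances win different neurons.

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;

/**
 * Hypothetical helper: tallies winning output neurons over a data set.
 * winnerOf maps one input row to the index of its winning neuron.
 */
final class WinCounter {
    private WinCounter() {}

    /** counts[k] = number of rows for which neuron k was the winner. */
    static int[] histogram(double[][] data, int outputNeurons,
                           ToIntFunction<double[]> winnerOf) {
        int[] counts = new int[outputNeurons];
        for (double[] row : data) {
            counts[winnerOf.applyAsInt(row)]++;
        }
        return counts;
    }

    /** Number of output neurons that won at least once. */
    static int distinctWinners(int[] counts) {
        return (int) Arrays.stream(counts).filter(c -> c > 0).count();
    }
}
```

With the 10 training instances and 10 output neurons from the question, `distinctWinners` returning 10 means every instance has its own neuron, while a value of 1 or 2 means only a handful of neurons are winning.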

If only a handful of neurons are winning, then the data is somehow not such that the network can sort it into as many output neurons as you have specified.


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.