Image Recognition Tips?

sesshomurai's picture

Hi,
I'm new to Encog and NNs, so pardon my extreme noobiness.
I liked the new image recognition examples. They worked nicely.

However, I am trying to recognize parts of a larger image (B&W) by letting the user train on small sections (thumbnails) selected with a rubber band.

Then I try to detect, also from the user's rubber-band selection. I am getting uncertain results and have some questions.

1. Is there a way to get the confidence value of a detected guess (I'm using the example code as a basis)?
2. When I train once on 3 or more thumbnails, the detection results change and become erroneous compared to training on just 2.
Is it thus better to have separate networks specializing in recognizing 1 type of pattern vs. one network trained on many patterns?
3. My thumbnails will vary in size, and this seems to throw occasional exceptions in the downsampling classes regarding pixel size or color information. Is it possible to allow for variable-sized training image sets and detection regions? Or does this technique work best with fixed sizes all around?
4. What are the best settings to achieve the most accurate, high-resolution detection scheme? I notice a variety of variables in the example, but I'm not sure how they affect my results. I don't care about processing time, etc.

Thanks for any tips. Much appreciated.

Darren

jeffheaton's picture

Glad the image examples are helping.

1. As for confidence, I usually use the value that the neural network outputs. For example, if you were recognizing 50 images, you would have 50 output neurons. Confidence can be measured either by how high the winning neuron's output is, or by the gap between the winning neuron and the other neurons.

2. Multiple networks may work better at recognizing different patterns. Some of the most complex neural networks for image detection that I've read about actually use quite a few neural networks, each trained to recognize "features" in an image. A final neural network is then fed the output from the feature-detecting networks, and it decides what sort of image it is.

3. Not sure about the varying sizes. I will have to try this with the image data set and see if anything needs to be fixed to provide better support.

4. Not sure which options you mean. If you want to ask about one in particular I can expand on it.
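
To make point 1 concrete, here is a minimal sketch of both confidence measures, plus a threshold that rejects weak guesses. It assumes you have already copied the network's output into a plain double[]. The class and method names (OutputConfidence, winner, margin, classify) are made up for illustration and are not part of Encog.

```java
/** Hypothetical helper for illustration; not part of Encog. */
class OutputConfidence {

    /** Index of the largest output value, i.e. the winning neuron. */
    static int winner(double[] output) {
        int best = 0;
        for (int i = 1; i < output.length; i++) {
            if (output[i] > output[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Gap between the winning output and the runner-up (or the raw value if there is only one output). */
    static double margin(double[] output) {
        int w = winner(output);
        if (output.length < 2) {
            return output[w];
        }
        double runnerUp = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < output.length; i++) {
            if (i != w && output[i] > runnerUp) {
                runnerUp = output[i];
            }
        }
        return output[w] - runnerUp;
    }

    /** Returns the winning index, or -1 when the guess is too weak or too close to call. */
    static int classify(double[] output, double minValue, double minMargin) {
        int w = winner(output);
        if (output[w] < minValue || margin(output) < minMargin) {
            return -1;
        }
        return w;
    }
}
```

The two thresholds (minValue and minMargin) are assumptions you would have to tune on your own images; there is no universally right setting.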

sesshomurai's picture

1. So let's say I have a large overhead image of some grassy fields and there are 4 junk cars (all alike) scattered throughout the image. I train a small rectangle on a "car", having only 1 output neuron -> "car".
Now, no matter what I give it to detect, it thinks "car", as expected. But I was hoping to get a value from, say, 0.01 to 0.90 depending on whether I give it a thumbnail of grass or a car (that it was trained on). In other words, a measure of how similar the input is to what the winning neuron was trained on, since the input actually does vary.

This way, I can set a similarity threshold and discard inputs that do not cross it (and they won't have trained neurons themselves).

2. Very interesting!

3. Yeah, there are numerous array-out-of-bounds exceptions when trying to train and compare sets of varying-sized thumbnails. I will post some more exact traces.

4. For example, how does varying StrategyCycles, Minutes and StrategyError affect my accuracy?

Thanks for responding, and for a really complete product.

Darren

jeffheaton's picture

1. Is a car currently the only thing you are training it with? If so you might need to train it with examples other than a car, otherwise it has nothing to contrast.

4. Minutes is how long to train for; the longer you train, the lower the error will generally go. StrategyError causes training to stop when the error reaches a given level, no matter how long it takes. StrategyCycles specifies how many training cycles the network is allowed to stagnate before it resets. Sometimes a network (starting from random values) is not trainable, and we have to give up at a certain point and start over with a new random weight matrix.
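
As a rough illustration of how those three settings interact, here is a sketch of a training loop with a time budget, a target error, and a reset on stagnation. The Trainer interface is a hypothetical stand-in rather than an Encog type, and the choice to stop on whichever limit is hit first (time or error) is an assumption of this sketch, not a statement of how the example does it.

```java
/** Hypothetical stand-in for a real trainer; not an Encog interface. */
interface Trainer {
    double iterate();   // run one training cycle and return the current error
    void reset();       // start over with new random weights
}

class TrainingLoop {

    /**
     * Trains until the error drops to targetError or maxMillis elapses.
     * If the error fails to improve by more than minImprovement for
     * stagnantCycles cycles in a row, the trainer is reset.
     * Returns the error from the last cycle that ran.
     */
    static double train(Trainer trainer, long maxMillis, double targetError,
                        int stagnantCycles, double minImprovement) {
        long deadline = System.currentTimeMillis() + maxMillis;
        double best = Double.MAX_VALUE;   // best error seen since the last reset
        int stagnant = 0;                 // cycles in a row without real improvement
        double error;
        do {
            error = trainer.iterate();
            if (error <= targetError) {
                break;                    // "StrategyError"-style stop
            }
            if (best - error > minImprovement) {
                best = error;
                stagnant = 0;
            } else if (++stagnant >= stagnantCycles) {
                trainer.reset();          // "StrategyCycles"-style reset
                best = Double.MAX_VALUE;
                stagnant = 0;
            }
        } while (System.currentTimeMillis() < deadline);   // "Minutes"-style budget
        return error;
    }
}
```

In this sketch a lower targetError or a longer time budget only gives the network more chances to fit the training set; neither guarantees better results on thumbnails it has not seen.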

sesshomurai's picture

1. I trained with grass and car eventually. Yet sometimes it thought a random grass region was a car. It also seemed to produce more errors the more I trained for grass or cars. In other words, I seemed to get the best results by not overtraining. But we're only talking about fewer than a dozen training runs, so I'm not sure about my results yet.

2. I tried longer times and also got some unexpected results, and didn't see a noticeable improvement. However, I'm not yet conducting formal experiments. Just spot testing.

By the way, there is a bug in the downsample class where it uses image.width for both the width and the height in the downsample (I forgot the exact line). This produces NPEs and erroneous results as well.
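
For reference, here is a minimal sketch of a downsampler that keeps width and height separate and maps a thumbnail of any size onto a fixed-size input vector. It is not Encog's downsample class; the Downsampler name, the pixels[y][x] grayscale layout (values 0 to 255), and the simple block averaging are all assumptions made for illustration.

```java
/** Hypothetical block-averaging downsampler; not Encog's implementation. */
class Downsampler {

    /**
     * Reduces a grayscale image, stored as pixels[y][x] with values 0..255,
     * to outW * outH values in the range 0..1, in row-major order.
     */
    static double[] downsample(int[][] pixels, int outW, int outH) {
        if (pixels.length == 0 || pixels[0].length == 0) {
            throw new IllegalArgumentException("image must not be empty");
        }
        int height = pixels.length;      // number of rows
        int width = pixels[0].length;    // number of columns, tracked separately from height
        double[] result = new double[outW * outH];
        for (int oy = 0; oy < outH; oy++) {
            int y0 = oy * height / outH;
            int y1 = Math.max(y0 + 1, (oy + 1) * height / outH);
            for (int ox = 0; ox < outW; ox++) {
                int x0 = ox * width / outW;
                int x1 = Math.max(x0 + 1, (ox + 1) * width / outW);
                double sum = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        sum += pixels[y][x];
                    }
                }
                // Average over the source block, then scale to 0..1.
                result[oy * outW + ox] = sum / ((y1 - y0) * (x1 - x0)) / 255.0;
            }
        }
        return result;
    }
}
```

Because the output length depends only on outW and outH, thumbnails of different sizes all produce input vectors of the same length, which is what a network with a fixed number of input neurons needs.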


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.