Problems in SOM Training in a Devnagari (Nepali) OCR

sujan's picture

I'm trying to develop a Devnagari (Nepali) OCR application. Line detection, word detection, and character segmentation are done and working. The neural network for this application is a SOM. I trained it with sample data like the following:

प:11111100011000111111000010000100001
म:11111010010100111111010010000100001
न:11111000011111110001100010000100001
स:11111010011111110111110010110100001
व:11111011111101110011111110000100001
ल:11111110011111110101110010100100001
क:11111001001111110101111010010000100
त:11111000011111110001110010110100101
त्र:11111111110001101111110010000100001
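
To sanity-check training data like this, it can help to print each pattern as a grid. The sketch below assumes each 35-character string is a 5-wide by 7-tall downsampled bitmap stored row by row (35 = 5 × 7); the helper names `to_grid` and `render` are made up for illustration and are not part of Encog.

```python
def to_grid(bits, width=5, height=7):
    """Split a flat bit string into `height` rows of `width` characters.

    Assumes row-major order, which is an assumption about how the
    sample data above was written out.
    """
    if len(bits) != width * height:
        raise ValueError(f"expected {width * height} bits, got {len(bits)}")
    return [bits[r * width:(r + 1) * width] for r in range(height)]


def render(bits, width=5, height=7):
    """Return a printable picture: '#' for a set cell, '.' for an empty one."""
    rows = to_grid(bits, width, height)
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in rows)


if __name__ == "__main__":
    # The first sample pattern from the list above.
    print(render("11111100011000111111000010000100001"))
```

If the printed shapes do not resemble the characters they are labelled with, the problem is in segmentation or downsampling rather than in the SOM itself.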

The problem is that the characters are not identified correctly at the recognition stage. I looked through your OCR example and adapted it for Devnagari character recognition.

Here are some screenshots you can look at:

http://sujandhakal.com.np/view/a-nepali-ocr-:-problems/2-12-42.html

What might the problem be, and how can I fix it? Please reply soon.

SeemaSingh's picture

I've thought about writing an OCR app for Marathi, which also uses Devnagari, but have not tried it. Done correctly, the horizontal line could be quite useful for picking up lines of text.

My guess is that you might be downsampling too far. Devnagari is more complex than the Latin alphabet that this example was written for, so you might be losing too much detail in the downsample.
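
One rough way to check this is to count how many cells differ between each pair of training patterns. The sketch below does that with a plain Hamming distance over the nine sample strings from the first post; `hamming` and `closest_pairs` are illustrative names, not Encog functions. Pairs that differ in only a few of the 35 cells are likely to be confused once real scans add noise.

```python
from itertools import combinations

# Sample patterns copied from the first post (35 bits each).
SAMPLES = {
    "प": "11111100011000111111000010000100001",
    "म": "11111010010100111111010010000100001",
    "न": "11111000011111110001100010000100001",
    "स": "11111010011111110111110010110100001",
    "व": "11111011111101110011111110000100001",
    "ल": "11111110011111110101110010100100001",
    "क": "11111001001111110101111010010000100",
    "त": "11111000011111110001110010110100101",
    "त्र": "11111111110001101111110010000100001",
}


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_pairs(samples, limit=5):
    """Return the `limit` most similar pairs as (distance, name_a, name_b)."""
    pairs = [
        (hamming(samples[a], samples[b]), a, b)
        for a, b in combinations(samples, 2)
    ]
    return sorted(pairs)[:limit]


if __name__ == "__main__":
    for dist, a, b in closest_pairs(SAMPLES):
        print(f"{a} vs {b}: {dist} of 35 cells differ")
```

For example, स and ल differ in only 3 of the 35 cells in this data, which leaves very little margin for a noisy scan.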

sujan's picture

OK, let me try changing the downsampling width and height. One question: can we use a feedforward network for OCR? I guess not, but what do you suggest? My work is to compare a feedforward backpropagation NN and a SOM (Kohonen) in an OCR application, so please reply.
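
For experimenting with different grid sizes, here is a minimal sketch of a downsampler with a configurable width and height. It assumes the segmented character is a list of rows of 0/1 pixels and marks an output cell as set if any pixel in the matching source region is set. The function name `downsample` and that "any pixel" rule are assumptions made for illustration, not the code from the Encog OCR example.

```python
def downsample(image, out_w, out_h):
    """Reduce a binary image (list of rows of 0/1) to an out_w x out_h grid.

    An output cell is 1 if any source pixel in its region is set.
    """
    in_h = len(image)
    in_w = len(image[0])
    grid = []
    for r in range(out_h):
        # Row range of the source image covered by output row r.
        y0 = r * in_h // out_h
        y1 = max((r + 1) * in_h // out_h, y0 + 1)
        row = []
        for c in range(out_w):
            # Column range of the source image covered by output column c.
            x0 = c * in_w // out_w
            x1 = max((c + 1) * in_w // out_w, x0 + 1)
            filled = any(
                image[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
            row.append(1 if filled else 0)
        grid.append(row)
    return grid


if __name__ == "__main__":
    image = [
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    for size in (1, 2, 4):
        print(size, downsample(image, size, size))
```

Running the same character through several grid sizes and comparing the results shows at which size distinct strokes start to merge into the same cell.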


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.