As a programmer of neural networks, you must know which problems are adaptable to neural networks. You must also be aware of which problems are not particularly well suited to them. As with most computer technologies and techniques, the most important lesson is often when to use the technology and when not to. Neural networks are no different.
A significant goal of this book is not only to show you how to construct neural networks, but also when to use neural networks. An effective neural network programmer knows what neural network structure, if any, is most applicable to a given problem. First, the problems that are not conducive to a neural network solution will be examined.
Programs that are easily written out as a flowchart are an example of programs that are not well suited to neural networks. If your program consists of well-defined steps, conventional programming techniques will suffice.
Another criterion to consider is whether the logic of your program is likely to change. The ability to learn is one of the primary features of a neural network. If the algorithm that solves your problem is an unchanging business rule, there is no reason to use a neural network. Indeed, it might be detrimental to your program if the neural network attempts to find a better solution and begins to diverge from the expected output.
Finally, neural networks are often not suitable for problems where you must know exactly how the solution was derived. A neural network can become very useful for solving the problem it was trained for, but it cannot explain its reasoning. The neural network knows because it was trained to know. It cannot explain the series of steps by which it derived the answer.
Although there are many problems for which neural networks are not suited, there are also many problems they are quite useful for solving. In addition, neural networks can often solve problems with fewer lines of code than a traditional programming algorithm. It is important to understand what these problems are.
Neural networks are particularly useful for solving problems that cannot be expressed as a series of steps, such as recognizing patterns, classifying into groups, series prediction and data mining.
Pattern recognition is perhaps the most common use for neural networks. The neural network is presented a pattern. This could be an image, a sound, or any other sort of data. The neural network then attempts to determine if the input data matches a pattern that the neural network has memorized. Chapter 3 will show a simple neural network that recognizes input patterns.
Classification is a process that is closely related to pattern recognition. A neural network trained for classification is designed to take input samples and classify them into groups. These groups may be fuzzy, without clearly defined boundaries, or they may have quite rigid boundaries. Chapter 7, “Applying Pattern Recognition,” introduces an example program capable of Optical Character Recognition (OCR). This program takes handwriting samples and classifies them into the correct letter (e.g., the letter “A” or “B”).
The individual neurons that make up a neural network are interconnected through the synapses. These connections allow the neurons to signal each other as information is processed. Not all connections are equal. Each connection is assigned a connection weight. If there is no connection between two neurons, then their connection weight is zero. These weights are what determine the output of the neural network. Therefore, it can be said that the connection weights form the memory of the neural network.
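The idea that connection weights determine a network's output, and that a zero weight means no connection, can be sketched with a single threshold neuron. This is a minimal illustration, not code from the book; the weights and the threshold value are made-up examples.

```python
# Minimal sketch: connection weights determine a neuron's output.
# The weights and threshold below are illustrative values only.

def neuron_output(inputs, weights, threshold=0.5):
    """Weighted sum of the inputs; the neuron fires (returns 1) when the
    sum exceeds the threshold. A weight of zero means no connection."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Two input neurons, the second with no connection (weight 0.0):
print(neuron_output([1, 1], [0.7, 0.0]))  # the zero weight contributes nothing
```

Changing the weights changes the output for the same inputs, which is the sense in which the weights form the network's memory.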
Training is the process by which these connection weights are assigned. Most training algorithms begin by assigning random numbers to the weight matrix. Then the performance of the neural network is evaluated. Next, the weights are adjusted based on how well the network performed. This process is repeated until the error is within an acceptable limit. There are many ways to train neural networks. Training methods generally fall into the categories of supervised, unsupervised, and various hybrid approaches.
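The generic cycle just described can be sketched in a few lines: assign random weights, evaluate the error, adjust the weights, and repeat until the error is acceptable. The random-search adjustment used here is purely illustrative (real training algorithms such as backpropagation are covered later); the AND-function sample data and all numeric values are made up.

```python
import random

# Sketch of the generic training cycle: random weights, evaluate,
# adjust, repeat. Random-search adjustment is illustrative only.

samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND function

def output(weights, inputs, threshold=1.5):
    # Simple threshold neuron: fires when the weighted sum exceeds the threshold.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def error(weights):
    # How far the network's outputs are from the anticipated outputs.
    return sum((output(weights, x) - y) ** 2 for x, y in samples)

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(2)]   # step 1: random weights
for _ in range(100_000):
    if error(weights) == 0:                           # acceptable limit reached
        break
    candidate = [w + random.uniform(-0.5, 0.5) for w in weights]
    if error(candidate) <= error(weights):            # keep non-worsening moves
        weights = candidate

print(error(weights))
```

The structure of the loop, rather than the particular adjustment rule, is the point: every training method in this book follows some version of this evaluate-and-adjust cycle.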
Supervised training is accomplished by giving the neural network a set of sample data along with the anticipated outputs from each of these samples. Supervised training is the most common form of neural network training. As supervised training proceeds, the neural network is taken through several iterations, or epochs, until the actual output of the neural network matches the anticipated output, with a reasonably small error. Each epoch is one pass through the training samples.
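A supervised epoch, one pass through the sample data with the weights nudged toward the anticipated outputs, can be sketched with the classic perceptron rule. This is an assumed, simplified illustration: the OR-function samples, learning rate, and single-neuron structure are not from the book.

```python
# Hedged sketch of supervised training: each epoch is one pass through the
# sample data; weights are nudged toward the anticipated outputs using the
# classic perceptron rule. All values here are illustrative.

samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR function

weights = [0.0, 0.0]
bias = 0.0
rate = 0.1

for epoch in range(100):                  # each epoch: one pass over the samples
    total_error = 0
    for inputs, anticipated in samples:
        actual = 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0
        err = anticipated - actual        # supervised: anticipated output is known
        total_error += abs(err)
        weights = [w + rate * err * x for w, x in zip(weights, inputs)]
        bias += rate * err
    if total_error == 0:                  # reasonably small error reached
        break

print(epoch)                              # number of epochs that were needed
```

Note that the anticipated output is used directly inside the update, which is precisely what distinguishes supervised training from the other methods below.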
Unsupervised training is similar to supervised training except that no anticipated outputs are provided. Unsupervised training usually occurs when the neural network is to classify the inputs into several groups. The training progresses through many epochs, just as in supervised training. As training progresses the classification groups are “discovered” by the neural network. Unsupervised training is covered in Chapter 7, “Applying Pattern Recognition”.
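The "discovery" of classification groups can be sketched with a simple competitive-learning rule: no anticipated outputs are given, and each group center simply moves toward the inputs it wins. The one-dimensional data, starting centers, and learning rate below are invented for illustration.

```python
# Hedged sketch of unsupervised training: no anticipated outputs; the
# groups are discovered from the data. Values here are illustrative.

inputs = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]   # two natural clusters
centers = [0.0, 1.0]                         # illustrative starting centers
rate = 0.5

for epoch in range(20):                      # epochs, just as in supervised training
    for x in inputs:
        # The winning group is the center closest to the input.
        winner = min(range(len(centers)), key=lambda i: abs(centers[i] - x))
        # Move only the winner toward the input it claimed.
        centers[winner] += rate * (x - centers[winner])

print([round(c, 2) for c in centers])        # centers settle near the two clusters
```

Nowhere in the loop is a correct answer supplied; the two groups emerge purely from the structure of the input data.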
There are several hybrid methods that combine several aspects of supervised and unsupervised training. One such method is called reinforcement training. In this method the neural network is provided with sample data that does not contain anticipated outputs, as is done with unsupervised training. However, for each output, the neural network is told whether the output was right or wrong given the input.
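Reinforcement-style feedback can be sketched as follows: the learner is never shown the anticipated outputs; after each response, the environment says only whether it was right or wrong, and weight changes are kept or discarded accordingly. Everything below, including the hill-climbing adjustment, is an invented illustration of that idea.

```python
import random

# Hedged sketch of reinforcement training: the learner never sees the
# anticipated outputs; it is only told "right" or "wrong" per response.

samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR function;
# the anticipated outputs are visible only to the judge, not to the learner.

def respond(weights, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > 0.5 else 0

def judged_right(inputs, response):
    # The environment's only feedback: was this response right?
    return response == dict(samples)[inputs]

def score(weights):
    # Total number of responses the environment judges "right".
    return sum(judged_right(x, respond(weights, x)) for x, _ in samples)

random.seed(1)
weights = [0.0, 0.0]
for _ in range(10_000):
    if score(weights) == len(samples):
        break
    trial = [w + random.uniform(-0.3, 0.3) for w in weights]
    if score(trial) >= score(weights):    # keep changes judged at least as good
        weights = trial

print(score(weights))
```

Unlike the supervised sketch, the update never uses the anticipated value itself, only the binary right/wrong judgment.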
It is very important to understand how to properly train a neural network. This book explores several methods of neural network training, including backpropagation, simulated annealing, and genetic algorithms. Chapters 4 through 7 are dedicated to the training of neural networks. Once the neural network is trained, it must be validated to see if it is ready for use.
This final evaluation determines whether additional training is required. To correctly validate a neural network, validation data must be set aside that is completely separate from the training data.
As an example, consider a classification network that must group elements into three different classification groups. You are provided with 10,000 sample elements. For this sample data the group that each element should be classified into is known. For such a system you would divide the sample data into two groups of 5,000 elements. The first group would form the training set. Once the network was properly trained the second group of 5,000 elements would be used to validate the neural network.
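The split described above can be sketched directly. The sample values here are fabricated placeholders; a shuffle is included on the assumption that the original data might be ordered, which would otherwise bias the two halves.

```python
import random

# Sketch of the split described above: 10,000 labeled samples divided
# into a 5,000-element training set and a 5,000-element validation set.
# The sample data is made up; the shuffle guards against ordering bias.

random.seed(42)
samples = [(i, i % 3) for i in range(10_000)]   # (element, known group 0-2)

random.shuffle(samples)                          # avoid any ordering in the data
training_set = samples[:5_000]                   # used only to train
validation_set = samples[5_000:]                 # used only to validate

print(len(training_set), len(validation_set))
```

The essential property is that the two sets share no elements, so the validation error is measured on data the network has never seen.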
It is very important that a separate group always be maintained for validation. Training a neural network with a given sample set and then using that same set to estimate the error the network will achieve on new, arbitrary data will surely lead to misleading results. The error achieved on the training set will almost always be substantially lower than the error on a new set of sample data. The integrity of the validation data must always be maintained.
This brings up an important question. What happens if the neural network you have just finished training performs poorly on the validation set? If this is the case, you must examine what this means. It could mean that the initial random weights were poor; rerunning the training with new initial weights could correct this. While an improper set of initial random weights could be the cause, a more likely possibility is that the training data was not properly chosen.
If the network performs poorly on the validation set, this most likely means that data was present in the validation set that was not represented in the training data. This situation should be addressed by trying a different, more random way of separating the data into training and validation sets. If this fails, you must combine the training and validation sets into one large training set and then acquire new data to serve as the validation data.
For some situations it may be impossible to gather additional data to use as either training or validation data. If this is the case, you are left with no choice but to combine all or part of the validation set with the training set. While this approach forgoes the security of a good validation, if additional data cannot be acquired it may be your only alternative.