Solving Problems with Neural Networks
A significant goal of this book is to show you how to construct neural networks and to teach you when to use them. As a programmer of neural networks, you must understand which problems are well suited for neural network solutions and which are not. An effective neural network programmer also knows which neural network structure, if any, is most applicable to a given problem. This section begins with the problems that are not conducive to a neural network solution.
Problems Not Suited to a Neural Network Solution
Problems whose solutions are easily written out as flowcharts are poor candidates for neural networks. If your program consists of well-defined steps, normal programming techniques will suffice.
Another criterion to consider is whether the logic of your program is likely to change. One of the primary features of neural networks is their ability to learn. If the algorithm used to solve your problem is an unchanging business rule, there is no reason to use a neural network. In fact, it might be detrimental to your application if the neural network attempts to find a better solution, diverges from the desired process, and produces unexpected results.
Finally, neural networks are often not suitable for problems in which you must know exactly how the solution was derived. A neural network can be very useful for solving the problem for which it was trained, but it cannot explain its reasoning. The neural network knows something because it was trained to know it; it cannot list the series of steps it followed to derive the answer.
Problems Suited to a Neural Network
Although there are many problems for which neural networks are not well suited, there are also many problems for which a neural network solution is quite useful. In addition, neural networks can often solve problems with fewer lines of code than a traditional programming algorithm. It is important to understand which problems call for a neural network approach.
Neural networks are particularly useful for solving problems that cannot be expressed as a series of steps, such as recognizing patterns, classification, series prediction, and data mining.
Pattern recognition is perhaps the most common use for neural networks. For this type of problem, the neural network is presented with a pattern. This could be an image, a sound, or any other data. The neural network then attempts to determine if the input data matches a pattern that it has been trained to recognize. Chapter 3, Using a Hopfield Neural Network, provides an example of a simple neural network that recognizes input patterns.
Classification is a process that is closely related to pattern recognition. A neural network trained for classification is designed to take input samples and classify them into groups. These groups may be fuzzy, lacking clearly defined boundaries. Alternatively, these groups may have quite rigid boundaries. Chapter 12, OCR and the Self-Organizing Map, introduces an example program capable of optical character recognition (OCR). This program takes handwriting samples and classifies them by letter (e.g., the letter “A” or “B”).
Training Neural Networks
The individual neurons that make up a neural network are interconnected through their synapses. These connections allow the neurons to signal each other as information is processed. Not all connections are equal. Each connection is assigned a connection weight. If there is no connection between two neurons, then their connection weight is zero. These weights are what determine the output of the neural network; therefore, it can be said that the connection weights form the memory of the neural network.
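As a rough illustration of how connection weights determine output, the following minimal sketch computes the output of a single neuron. The function name neuron_output and the simple threshold activation are assumptions made for this example; the text above does not specify how a neuron turns its weighted inputs into an output.

```python
def neuron_output(inputs, weights, threshold=0.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0.

    The threshold activation is an assumption for illustration only.
    """
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


# A weight of zero behaves like a missing connection: the second input
# has no influence on the result.
weights = [0.5, 0.0, -0.5]
print(neuron_output([1, 1, 0], weights))  # prints 1
print(neuron_output([1, 0, 0], weights))  # prints 1 (second input is ignored)
```

Changing the weights changes the output for the same inputs, which is why the weights can be thought of as the memory of the network.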
Training is the process by which these connection weights are assigned. Most training algorithms begin by assigning random numbers to a weight matrix. The neural network is then evaluated to measure how well it performs, and the weights are adjusted based on that performance. This process is repeated until the error is within an acceptable limit. There are many ways to train neural networks. Neural network training methods generally fall into the categories of supervised, unsupervised, and various hybrid approaches.
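The general cycle described here (random starting weights, evaluation, adjustment, repetition) can be sketched in a few lines. This is only an illustration under stated assumptions: the network is a single linear neuron, and the adjustment step is a simple random perturbation that is kept only when it lowers the error. That hill-climbing step was chosen for brevity and is not one of the training methods covered in this book. The names error and train, and all parameter values, are hypothetical.

```python
import random


def error(weights, samples):
    """Mean squared error of a single linear neuron over (inputs, target) pairs."""
    total = 0.0
    for inputs, target in samples:
        output = sum(x * w for x, w in zip(inputs, weights))
        total += (output - target) ** 2
    return total / len(samples)


def train(samples, n_weights, acceptable_error=0.01, max_iterations=10000, seed=0):
    rng = random.Random(seed)
    # Begin by assigning random numbers to the weights.
    weights = [rng.uniform(-1, 1) for _ in range(n_weights)]
    best = error(weights, samples)
    for _ in range(max_iterations):
        # Stop once the error is within the acceptable limit.
        if best <= acceptable_error:
            break
        # Adjust the weights slightly; keep the change only if it helps.
        candidate = [w + rng.gauss(0, 0.1) for w in weights]
        candidate_error = error(candidate, samples)
        if candidate_error < best:
            weights, best = candidate, candidate_error
    return weights, best


# Hypothetical samples whose targets follow 2*x1 - x2.
samples = [([1, 0], 2), ([0, 1], -1), ([1, 1], 1)]
weights, final_error = train(samples, n_weights=2)
print(weights, final_error)
```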
Supervised training is accomplished by giving the neural network a set of sample data along with the anticipated outputs from each of these samples. Supervised training is the most common form of neural network training. As supervised training proceeds, the neural network is taken through a number of iterations, or epochs, until the output of the neural network matches the anticipated output, with a reasonably small rate of error. Each epoch is one pass through the training samples.
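To make the idea of epochs concrete, here is a minimal sketch of supervised training for a single threshold neuron that learns the logical AND function. It uses the classic perceptron learning rule, chosen here only because it is short; it is not one of the training methods this book goes on to cover. The weights start at zero rather than at random values so that the run is repeatable. The names predict and train_supervised, the learning rate, and the epoch limit are all assumptions for this example.

```python
def predict(weights, bias, inputs):
    """Threshold neuron: output 1 when the weighted sum is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_supervised(samples, learning_rate=0.1, max_epochs=100):
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        # One epoch is one pass through all of the training samples.
        for inputs, expected in samples:
            delta = expected - predict(weights, bias, inputs)
            if delta != 0:
                errors += 1
                # Nudge each weight toward the anticipated output.
                weights = [w + learning_rate * delta * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * delta
        if errors == 0:
            # Every sample now matches its anticipated output.
            return weights, bias, epoch
    return weights, bias, max_epochs


# Each sample pairs the inputs with the anticipated output (logical AND).
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, epochs_used = train_supervised(and_samples)
print(epochs_used, [predict(weights, bias, x) for x, _ in and_samples])
```

Because the anticipated output is known for every sample, the training loop can measure exactly how far off the network is and stop as soon as a full epoch passes without error.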
Unsupervised training is similar to supervised training, except that no anticipated outputs are provided. Unsupervised training usually occurs when the neural network is being used to classify inputs into several groups. The training involves many epochs, just as in supervised training. As the training progresses, the classification groups are “discovered” by the neural network. Unsupervised training is covered in Chapter 11, Using a Self-Organizing Map.
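The following sketch shows how groups can be "discovered" without anticipated outputs. It uses a simplified winner-take-all (competitive learning) scheme on one-dimensional data; this is a stand-in for illustration and is not the self-organizing map presented later in the book. The function names, the sample values, the starting centers, and the learning rate are all assumptions.

```python
def classify(x, centers):
    """Return the index of the group whose center is closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def train_unsupervised(samples, initial_centers, learning_rate=0.5, epochs=20):
    centers = list(initial_centers)
    for _ in range(epochs):
        for x in samples:
            # The closest center "wins" and is pulled toward the sample.
            winner = classify(x, centers)
            centers[winner] += learning_rate * (x - centers[winner])
    return centers


# No anticipated outputs are supplied, only the raw samples.
samples = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
centers = train_unsupervised(samples, initial_centers=[0.0, 5.0])
print(centers)
print(classify(1.1, centers), classify(9.1, centers))
```

After training, one center sits near the low values and the other near the high values, so new inputs fall into whichever group they most resemble.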
There are several hybrid methods that combine aspects of both supervised and unsupervised training. One such method is called reinforcement training. In this method, a neural network is provided with sample data that does not contain anticipated outputs, as is done with unsupervised training. However, for each output, the neural network is told whether the output was right or wrong given the input.
It is very important to understand how to properly train a neural network. This book explores several methods of neural network training, including backpropagation, simulated annealing, and genetic algorithms. Chapters 4 through 7 are dedicated to the training of neural networks. Once the neural network is trained, it must be validated to see if it is ready for use.
Validating Neural Networks
The final step, validating a neural network, is very important because it allows you to determine if additional training is required. To correctly validate a neural network, validation data must be set aside that is completely separate from the training data.
As an example, consider a classification network that must group elements into three different classification groups. You are provided with 10,000 sample elements. For this sample data, the group that each element should be classified into is known. For such a system, you would randomly divide the sample data into two groups of 5,000 elements each. The first group would form the training set. Once the network was properly trained, the second group of 5,000 elements would be used to validate the neural network.
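The random division described above might be sketched as follows. The function name split_samples, its parameters, and the made-up sample data are assumptions for illustration only.

```python
import random


def split_samples(samples, training_fraction=0.5, seed=None):
    """Randomly divide samples into a training set and a validation set."""
    rng = random.Random(seed)
    shuffled = list(samples)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# 10,000 hypothetical elements, each tagged with one of three known groups.
samples = [(i, i % 3) for i in range(10000)]
training_set, validation_set = split_samples(samples, seed=42)
print(len(training_set), len(validation_set))  # prints 5000 5000
```

Passing a different seed produces a different random division, which is useful if a particular split turns out to be unrepresentative.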
It is very important that a separate group of data always be maintained for validation. Training a neural network with a given sample set and then using that same set to estimate the network's error on new, arbitrary data will give a misleadingly optimistic result. The error achieved using the training set will almost always be substantially lower than the error on a new set of sample data. The integrity of the validation data must always be maintained.
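A small example makes the gap between training error and validation error visible. The sketch below measures the error rate of a model on both sets. To keep it short, the "model" is a deliberately naive lookup table that memorizes its training samples; it is a stand-in for a trained neural network, not a neural network itself. All names and data here are hypothetical.

```python
def error_rate(predict, samples):
    """Fraction of samples for which the prediction differs from the known answer."""
    wrong = sum(1 for inputs, expected in samples if predict(inputs) != expected)
    return wrong / len(samples)


def make_memorizer(training_set, default=0):
    """Stand-in model: recalls training answers exactly, guesses `default` otherwise."""
    table = {inputs: expected for inputs, expected in training_set}
    return lambda inputs: table.get(inputs, default)


training_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1)]
validation_set = [((1, 1), 1), ((2, 2), 0)]

model = make_memorizer(training_set)
print(error_rate(model, training_set))    # prints 0.0
print(error_rate(model, validation_set))  # prints 0.5
```

Judged only on its training data, this model looks perfect. Only the separate validation set reveals how it behaves on samples it has never seen.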
This brings up an important question. What happens if the neural network that you have just finished training performs poorly on the validation data set? If this is the case, then you must examine possible causes. It could mean that the initial random weights were not appropriate. Rerunning the training with new initial weights could correct this. While an improper set of initial random weights could be the cause, a more likely possibility is that the training data was not properly chosen.
If the network performs poorly on the validation set, it is likely that the validation set contains data unlike anything in the training data. To rectify this, first try a different random division of the data into training and validation sets. If this fails, you must combine the training and validation sets into one large training set. New data must then be acquired to serve as the validation data.
In some situations it may be impossible to gather additional data to use as either training or validation data. If this is the case, then you are left with no other choice but to combine all or part of the validation set with the training set. While this approach will forgo the security of a good validation, if additional data cannot be acquired this may be your only alternative.
