For one of my PhD courses at Nova Southeastern University (NSU), it was necessary to reproduce the research of the following paper:
I. Ahmad, A. Abdullah, and A. Alghamdi, “Application of artificial neural network in detection of probing attacks,” in IEEE Symposium on Industrial Electronics Applications, 2009. ISIEA 2009., vol. 2, Oct 2009, pp. 557–562.
This paper demonstrated how to use a neural network to build a basic intrusion detection system (IDS) for the KDD99 dataset. Reproducing research is important in an academic setting. It means that you were able to obtain the same results as the original researchers, using the same techniques. I do this often when I write books or implement parts of Encog. It allows me to convince myself that I have implemented an algorithm correctly, and as the researchers intended. I don’t always agree with what the original researchers did. If I change something when I implement it in Encog, I am now in the area of “original research,” and my changes must be labeled as such.
Some researchers are more helpful than others for replication of research. Additionally,
neural networks are stochastic (they use random numbers). Basing recommendations off of
a small number of runs is usually a bad idea, when dealing with a stochastic system.
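To illustrate why a handful of runs can mislead, here is a minimal Python sketch. The function `toy_training_run` is a hypothetical stand-in that only simulates the randomness of a training run; it does not train a network, and the RMSE values it produces are made up for illustration.

```python
import random
import statistics


def toy_training_run(seed):
    """Hypothetical stand-in for one training run: returns a final RMSE.

    A real run would train a network from random initial weights; here the
    randomness is simulated so the example stays self-contained.
    """
    rng = random.Random(seed)
    return abs(rng.gauss(0.05, 0.01))


def summarize(rmses):
    """Return (low, mean, high) for a list of RMSE values."""
    return min(rmses), statistics.mean(rmses), max(rmses)


# Same "architecture", different random seeds.
few = [toy_training_run(seed) for seed in range(3)]
many = [toy_training_run(seed) for seed in range(30)]

print("3 runs  (low, mean, high):", summarize(few))
print("30 runs (low, mean, high):", summarize(many))
```

With only three runs the high-low range can look quite different from the range over thirty runs, even though nothing about the model changed.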
Their small number of runs led the above researchers to conclude that two hidden layers were optimal for their dataset. Unless you are dealing with deep learning, this is almost always not the case. The universal approximation theorem shows that a single hidden layer, given enough neurons, is already sufficient to approximate any continuous function with the old-school sort of perceptron neural network used in this paper, so additional hidden layers are not needed for representational power.
Additionally, the vanishing gradient problem prevents the RPROP training that the researchers used from fitting well with larger numbers of hidden layers. The researchers tried up to 4 hidden layers.
For my own research replication I used the same dataset, with many training runs, to make sure that their results were within my high-low range. To show that a single layer does better, I used ANOVA and Tukey’s HSD to confirm that the differences among the neural network architectures were indeed statistically significant. My box-and-whisker plot shows that training runs with a single layer more consistently converged to a better mean RMSE.
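The statistical analysis for this replication was done in R. As a rough illustration of the ANOVA step only, here is a minimal Python sketch that computes the one-way ANOVA F statistic by hand. The RMSE values and the dictionary `rmse_by_hidden_layers` are made up for the example and are not results from my report or from the original paper.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)       # number of groups (architectures)
    n = len(all_values)   # total number of runs

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual runs around their own group mean.
    ss_within = sum(
        (v - statistics.mean(g)) ** 2 for g in groups for v in g
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative (made-up) final RMSE values, keyed by number of hidden layers.
rmse_by_hidden_layers = {
    1: [0.041, 0.043, 0.040, 0.042, 0.044],
    2: [0.048, 0.052, 0.047, 0.055, 0.050],
    3: [0.060, 0.071, 0.058, 0.066, 0.075],
}

f_stat, df_b, df_w = one_way_anova_f(list(rmse_by_hidden_layers.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value suggests that at least one architecture differs from the others, but it does not say which one. That is the job of a post-hoc test such as Tukey’s HSD. If SciPy is available, `scipy.stats.f_oneway` gives the F statistic together with a p-value, and recent SciPy versions provide `scipy.stats.tukey_hsd` for the pairwise comparisons.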
I am attaching both my paper and code in case it is useful. This is a decent tutorial on using the latest Encog code to normalize and fit to a data set.
- Source code (Source Code Link), which includes:
  - Python data prep script
  - R code used to produce graphics and stat analysis
  - Java code to run the training
- My report for download (PDF): Paper Link
- My report on ResearchGate: Link
The code is under LGPL, so feel free to reuse.