For one of my PhD courses at Nova Southeastern University (NSU), I had to
reproduce the research in the following paper:
I. Ahmad, A. Abdullah, and A. Alghamdi, “Application of artificial neural network in
detection of probing attacks,” in IEEE Symposium on Industrial Electronics Applications,
2009. ISIEA 2009., vol. 2, Oct 2009, pp. 557–562.
This paper demonstrated how to use a neural network to build a basic intrusion detection
system (IDS) for the KDD99 dataset. Reproducing research is important in an academic
setting. Reproducing a result means obtaining the same results as the original
researchers, using the same techniques. I do this often when I write books or implement
parts of Encog. It lets me convince myself that I have implemented an algorithm
correctly, and as the researchers intended. I don’t always agree with what the original
researchers did. If I change something when I implement it in Encog, I am now in the area
of “original research,” and my changes must be labeled as such.
Some researchers make their work easier to replicate than others. Additionally,
neural networks are stochastic (they use random numbers). Basing recommendations on
a small number of runs is usually a bad idea when dealing with a stochastic system.
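To illustrate why a handful of runs can mislead, here is a minimal Python sketch that trains the same tiny network many times, changing only the random seed, and reports the spread of the final error. Everything in it is an assumption made for illustration: the XOR toy problem, the 2-neuron hidden layer, plain backpropagation (not the RPROP used in the paper), and the names `train_once` and `summarize`. None of it comes from the paper or from the attached code.

```python
import math
import random
import statistics

# XOR: a toy problem (an assumption for this sketch, not the KDD99 data).
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_once(seed, hidden=2, epochs=2000, rate=0.5):
    """Train a 2-input, single-hidden-layer sigmoid network with plain
    online backpropagation and return its final mean squared error."""
    rng = random.Random(seed)  # the seed is the only thing that varies between runs
    # Each hidden neuron has two input weights plus a bias.
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # The output neuron has one weight per hidden neuron plus a bias.
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        y = sigmoid(sum(w_o[j] * h[j] for j in range(hidden)) + w_o[hidden])
        return h, y

    for _ in range(epochs):
        for x, target in XOR:
            h, y = forward(x)
            delta_o = (y - target) * y * (1.0 - y)
            for j in range(hidden):
                # Hidden delta uses the output weight before it is updated.
                delta_h = delta_o * w_o[j] * h[j] * (1.0 - h[j])
                w_o[j] -= rate * delta_o * h[j]
                w_h[j][0] -= rate * delta_h * x[0]
                w_h[j][1] -= rate * delta_h * x[1]
                w_h[j][2] -= rate * delta_h
            w_o[hidden] -= rate * delta_o

    return sum((forward(x)[1] - t) ** 2 for x, t in XOR) / len(XOR)


def summarize(errors):
    """Return the high-low range, mean, and standard deviation of the errors."""
    return {
        "low": min(errors),
        "high": max(errors),
        "mean": statistics.mean(errors),
        "stdev": statistics.stdev(errors),
    }


errors = [train_once(seed) for seed in range(20)]
summary = summarize(errors)
for name, value in summary.items():
    print(f"{name:>5}: {value:.4f}")
```

Picking any two or three seeds from this list and comparing them to the full summary shows how far a small sample can sit from the overall picture.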
A small number of runs led the authors of the paper to conclude that two hidden layers
were optimal for their dataset. Unless you are dealing with deep learning, this is almost
never the case. The universal approximation theorem shows that a single hidden layer is
sufficient for the old-school sort of perceptron neural network used in this paper, so
additional layers should not be needed. Additionally, the vanishing gradient problem
prevents the RPROP training that the researchers used from fitting well with larger
numbers of hidden layers. The researchers tried up to 4 hidden layers.
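For reference, the universal approximation theorem is usually stated along the following lines. This is the common single-hidden-layer form for a sigmoidal activation $\sigma$. The symbols $K$, $N$, $\alpha_i$, $w_i$, and $b_i$ are notation introduced here, not taken from the paper.

```latex
\text{For every continuous } f : K \to \mathbb{R} \text{ on a compact set } K \subset \mathbb{R}^n
\text{ and every } \varepsilon > 0, \text{ there exist } N \in \mathbb{N},\
\alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n \text{ such that}
\[
  \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left( w_i^{\top} x + b_i \right) \right| < \varepsilon
  \quad \text{for all } x \in K .
\]
```

The sum is exactly a network with one hidden layer of $N$ neurons and a linear output. The theorem only guarantees that such a network exists. It does not say how large $N$ must be or whether training will find the weights.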
For my own replication I used the same dataset, with many training runs, to make
sure that their results fell within my high-low range. To show that a single hidden layer
does better, I used ANOVA and Tukey’s HSD to confirm that the differences among the
neural network architectures were statistically significant. My box-and-whisker plot
shows that training runs with a single layer more consistently converged to a better
result.
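As a rough sketch of the ANOVA step, the following Python computes the one-way ANOVA F statistic by hand. It is in Python rather than the R used for the attached analysis, and it is not the attached code. The function name `one_way_anova_f` and the error values are made up for illustration. Turning F into a p-value needs the F distribution, and Tukey’s HSD needs the studentized range distribution, so both are left to a statistics package.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`,
    where each group is a list of results from one architecture."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual runs around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical final errors per architecture. These are made-up numbers for
# illustration only; they are NOT results from my paper or the original one.
runs = {
    "1 hidden layer": [0.021, 0.019, 0.022, 0.020, 0.018],
    "2 hidden layers": [0.024, 0.027, 0.022, 0.029, 0.025],
    "3 hidden layers": [0.031, 0.026, 0.035, 0.028, 0.033],
}

f_stat, df_b, df_w = one_way_anova_f(list(runs.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the architectures differ by more than the run-to-run noise within each architecture. ANOVA alone does not say which pairs differ, which is what a post-hoc test such as Tukey’s HSD is for.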
I am attaching both my paper and code in case they are useful. This is a decent tutorial
on using the latest Encog code to normalize a data set and fit a model to it.
- Source code: Source Code Link. It includes:
  - Python data prep script
  - R code used to produce graphics and stat analysis
  - Java code to run the training
- My report for download (PDF): Paper Link
- My report on ResearchGate: Link
The code is under LGPL, so feel free to reuse.