fxmozart's picture

Hi Jeff,

I have a problem; maybe you can suggest how to solve it.

I have one client that makes a request to a "master" Encog server (the start of Encog remoting).

The master Encog server does the training and sends back a network. Fine.

But I want to split the training across x computers (running as x Tasks, on x CPUs, etc.).

So the result I get is, for example:

10 mini trainings (each covering one week).

The final network should cover 10 weeks of data, but instead I receive 10 networks that each cover only 1 week.

Should I just average the values?
How should I put these 1-week networks (one each for week 1, 2, 3, ... x) back into one network covering all 10 weeks?

Or should I do what I did in the sample I posted a while ago, where I average the networks to get a final network?

That would solve the Encog remoting I'm working on.

Thanks.

jeffheaton's picture

Just checked out the other thread. Sounds very interesting.

Not sure I totally follow this. But I believe you essentially want to merge two neural networks together, so that the result has the training of both?

Basically, because you split the training data by weeks and then trained each neural network separately, you now need to bring it all back together?

MarcF's picture

Sounds like the aim is to further 'parallelise' the training step, i.e. if one computer takes 10 weeks to train a network, how would the task have to be broken down so that 10 computers could do the same job in 1 week?

Mx

SeemaSingh's picture

Actually merging the learning from two neural networks is not trivial. It is not impossible either. Here are some thoughts.

1) Maybe keep the neural networks separate, and average the results back somehow. Maybe even do some sort of weighted average, where the neural networks with older data exponentially or logarithmically become less significant.

2) Merge the results with some sort of average of the weights between the two networks.

This will only work to a limited degree. The learning of one neuron is not independent of the others. At a minimum you have two transfer layers, (input->hidden1) and (hidden1->output). What the second transfer layer knows is very much dependent on what the first transfer layer gives it. The further apart the two networks are, the less successful an average will be.

Earlier versions of Encog did use averaging to allow multi-threading, basically to get the training done by each of the cores back in sync. Each core had one part of the training data. The more training iterations that went by between syncs, the less effective the merge was.

If you were to try to merge two neural networks that started from different random weight sets, after they had been iterated on for many hours, I doubt that the result would be useful.

As far as how to merge two neural networks "mathematically", I could not find any current research. If I were to do it, I would start by calculating a Hessian (second-order derivatives) on each of the neural networks against both their local and global training data. I would use this to determine which connections (and even neurons) were useful. Using these two together, I would both prune and combine (somehow) and try to create some sort of composite network. It would take some experimentation.

3) The only current research on merging that I could find is crossover from genetic algorithms.
Basically, in a GA, two neural networks "mate" and produce an offspring. There are two main ways this is done. The classical approach (used by Encog's GA trainer) basically lines the weights up as an array of doubles, takes several "splices" from each parent and patches them together into a child. The child knows some of what each parent does.

But if the two parents are too different, the child is going to be useless. We really need to add "speciation" to the classic GA algorithm, which means that only similar networks mate. Otherwise it's like cutting and pasting the DNA of an elephant and a housecat, and implanting it in a surrogate female. If the "baby" even "survives" to term, it probably won't live long or be well suited.

To use the GA approach you would take your neural networks and add them all to a population, allow them to mate for a number of generations and see what shakes out. Not sure this will work so well, because your population will be small, and very different.
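To make the "splice" idea concrete, here is a minimal sketch in plain Python. It is not Encog's GA trainer; the names splice_crossover and random_cut_points are made up for illustration, and it assumes both parents have the same structure, so their weights flatten to lists of equal length.

```python
import random

def splice_crossover(parent_a, parent_b, cut_points):
    """Build a child by alternating segments of two equal-length weight lists."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of weights")
    parents = (parent_a, parent_b)
    child = []
    source = 0  # index of the parent currently being copied from
    start = 0
    for cut in sorted(cut_points) + [len(parent_a)]:
        child.extend(parents[source][start:cut])
        source = 1 - source  # switch to the other parent at every cut
        start = cut
    return child

def random_cut_points(length, count, rng):
    """Pick `count` distinct cut positions strictly inside the weight list."""
    return sorted(rng.sample(range(1, length), count))

mother = [1, 2, 3, 4, 5, 6]
father = [10, 20, 30, 40, 50, 60]
print(splice_crossover(mother, father, [2, 4]))  # [1, 2, 30, 40, 5, 6]
```

Nothing in the splice checks whether the copied segments still make sense next to each other, which is why very different parents tend to give a useless child.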

Just a few ideas. Basically, you are probably going to need some communication between the nodes so that they never get too far apart.
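One way to picture "communication between the nodes" is periodic weight averaging: every node starts each round from the same shared weights, trains for a few steps on its own slice of the data, and the results are averaged before the next round. The sketch below is plain Python with a tiny linear model y = w0 + w1*x standing in for a neural network. All names (average_weights, local_step, train_with_sync) are hypothetical, and this is not how any Encog version implements it.

```python
def average_weights(weight_sets):
    """Element-wise mean of several equal-length weight lists."""
    count = len(weight_sets)
    return [sum(column) / count for column in zip(*weight_sets)]

def local_step(weights, shard, rate):
    """One gradient-descent step for y = w0 + w1*x on a single node's shard."""
    g0 = g1 = 0.0
    for x, y in shard:
        err = (weights[0] + weights[1] * x) - y
        g0 += err
        g1 += err * x
    n = len(shard)
    return [weights[0] - rate * g0 / n, weights[1] - rate * g1 / n]

def train_with_sync(shards, start, rounds, steps_between_syncs, rate=0.05):
    """All nodes restart each round from the shared weights, then get averaged."""
    shared = list(start)
    for _ in range(rounds):
        node_weights = []
        for shard in shards:  # each iteration stands in for one remote node
            w = list(shared)
            for _ in range(steps_between_syncs):
                w = local_step(w, shard, rate)
            node_weights.append(w)
        shared = average_weights(node_weights)
    return shared

# Two nodes, each holding half of the samples of y = 2x + 1.
shards = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
print(train_with_sync(shards, [0.0, 0.0], rounds=2000, steps_between_syncs=1))
```

The steps_between_syncs parameter is the knob discussed above: the larger it is, the further the nodes can drift apart before they are pulled back together.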

fxmozart's picture

Just checked out the other thread. Sounds very interesting.

Not sure I totally follow this. But I believe you essentially want to merge two neural networks together, so that the result has the training of both?

Basically, because you split the training data by weeks and then trained each neural network separately, you now need to bring it all back together?

-----------------

Yes, exactly.

So right now I have built a first IEncogFrameworkRemote (I'll put it on Git when it looks like something).

3 "tools":

1 client console.
1 IEncog (interface)
2 Server app

The client requests training on a data set (let's say 10 years of YHOO daily data, to keep it simple).

The request is sent to the server(s) via the interface.

Server n receives 1 month of data, with 5 days of inputs and 1 output, and starts doing its work.

When they are all finished, they send back their 1-month networks (protobuf looks like a good fit for this, though I haven't used it yet).

So now I have 12 networks, each trained on a particular month.

I could use a dictionary (like the little console app I submitted a while back) that takes all the networks and averages their outputs?
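A rough sketch of that output averaging, in plain Python rather than Encog. It assumes each trained network can be wrapped as a callable that returns a single number, and that the list is ordered from oldest month to newest; recency_weights, ensemble_predict and the decay parameter are hypothetical names. With decay=1.0 it is a plain average, and with decay below 1.0 the older networks count for exponentially less, as suggested earlier in the thread.

```python
def recency_weights(count, decay):
    """Normalised weights for `count` models ordered oldest to newest."""
    raw = [decay ** (count - 1 - i) for i in range(count)]
    total = sum(raw)
    return [r / total for r in raw]

def ensemble_predict(models, inputs, decay=1.0):
    """Weighted average of every model's scalar output for one input vector."""
    weights = recency_weights(len(models), decay)
    return sum(w * model(inputs) for w, model in zip(weights, models))

# Stand-ins for two trained networks: each just returns a fixed prediction.
older = lambda inputs: 1.0
newer = lambda inputs: 3.0

print(ensemble_predict([older, newer], [0.2, 0.4]))             # 2.0
print(ensemble_predict([older, newer], [0.2, 0.4], decay=0.5))  # pulled toward the newer model
```

Note that this keeps all 12 networks around at prediction time; it combines their answers, not their weights.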

Regarding what Seema said: I have the 12 networks, so I create a new GA, give it the 12 networks and let it work?

Is there no "math" way to just stitch the 12 networks back into one without using the GA? The GA will make a completely new network, and it has to run locally. That kind of defeats the parallelism.

-----

So, if the nodes "talk" to each other, maybe at the end of each run each Task (a network being trained) waits for the results of all the others (Task.WaitAll) and aggregates the results in. I guess that also changes the networks.
-------
I thought I could somehow do something like var Finalnetwork = Server.GetNetworks().Stitch();

That would take the List calculatedNetworks (in the server's memory) and somehow stitch them together so that the result is exactly the same as if I had run the training without parallelizing the code at all.

The programming looks OK; the math looks harder :/

fxmozart's picture

Seema :

I see you have added Sequences to Encog.

Can't I use sequences to stitch the trained networks back together?

Geoffroy's picture

Interesting problem... Another way to solve it would be to have a single neural network trained in parallel by your slaves. Each slave has its own split of the training set and after each epoch it sends its error value and gradients to the master node which collects them all to update the weights. The master sends the new weights to the slaves and they can start the next epoch.
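A minimal sketch of one such epoch, in plain Python. It uses a linear model y = w0 + w1*x in place of a neural network so that the gradient fits in a few lines; shard_gradient and master_epoch are hypothetical names, and the loop over shards stands in for calls to remote machines.

```python
def shard_gradient(weights, shard):
    """Summed squared-error gradient and sample count for one worker's shard."""
    g = [0.0, 0.0]
    for x, y in shard:
        err = (weights[0] + weights[1] * x) - y
        g[0] += err
        g[1] += err * x
    return g, len(shard)

def master_epoch(weights, shards, rate):
    """Collect every worker's gradient, then apply one shared weight update."""
    total = [0.0] * len(weights)
    samples = 0
    for shard in shards:  # in a real deployment each call runs on its own machine
        g, n = shard_gradient(weights, shard)
        total = [t + gi for t, gi in zip(total, g)]
        samples += n
    return [w - rate * t / samples for w, t in zip(weights, total)]

shards = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
everything = [sample for shard in shards for sample in shard]

split = master_epoch([0.0, 0.0], shards, rate=0.1)
single = master_epoch([0.0, 0.0], [everything], rate=0.1)
print(split, single)  # the two updates agree up to floating-point rounding
```

Because the per-shard gradients are simply summed, the update matches what a single machine would compute on the whole training set with batch training. Only the gradient work is spread out, and the nodes have to exchange data once per epoch.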

More details here:
http://www.dcs.bbk.ac.uk/~gmagoulas/Distributed_computing_neural.pdf

I'm not sure how the GA solution suggested above would work in practice. If your initial population is your 12 pre-trained networks, you still have to run the GA on them for many generations before they converge to a single solution. And at each generation you would have to evaluate the fitness of the new offspring. Which training set would you then use to evaluate the fitness? If you use the entire year it might become quite expensive, and then you might just as well do the entire training with the GA. The advantage of a GA, however, is that it is very amenable to parallel processing, and you'll find a lot of well-documented network topologies and cross-fertilisation methods to do it.

If you stick with the approach where you combine the outputs from your ensemble/committee of 12 specialised networks, there are also many techniques for doing so (see http://www.scholarpedia.org/article/Ensemble_learning). But since your data is split chronologically and each network is specialised in a single month, I'd be interested to know how good the results are. You might need to tune the relative weights of each network's output to get better predictions. Perhaps it is possible to split your training set in such a way that each subset is more uniformly spread over the entire year.
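As an illustration of that last idea, here is a small plain-Python sketch of a round-robin split; interleaved_split is a hypothetical name. It assumes the samples are already built as complete (inputs, output) pairs in date order, so that dealing them out does not break up the 5-day input windows.

```python
def interleaved_split(samples, parts):
    """Deal samples out round-robin so each part spans the whole time range."""
    return [samples[i::parts] for i in range(parts)]

days = list(range(10))  # stand-in for 10 training samples in date order
print(interleaved_split(days, 3))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Each server then sees samples from every part of the year instead of a single month.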

G.


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.