
How to get sub-second MarketData tick times

Hey all,

This post is actually a fork of this thread. I'm playing around with Encog in Clojure, so I took a look at the way encog-java loads tick data into its system. There is a YahooFinanceLoader that pulls CSV data from a URL, but it assumes that prices only have a daily granularity. Now, the encog-java system seems to have the concept of granularity going down to the second (see here), but all of its market loaders and lists of ticks seem to stop at a daily granularity. See the LoadedMarketData source, which uses a daily-biased MarketDataType. Obviously, that's not enough if we want to calculate on a second or sub-second interval. Ultimately, the YahooFinanceLoader will give us a list of LoadedMarketData, which assumes daily price ticks.

What I need to know is whether I can give the Encog neural net a list of tick data that has second or sub-second intervals. I have a EURUSD_Ticks_Apr2012.csv file that I'll use for training, but I need to know how to get it into the system.
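Independently of Encog's loaders, plain Java has no trouble holding sub-second timestamps; the limit is the TimeUnit-based granularity in the data set classes. Below is a minimal sketch of reading the tick file into plain Java objects, assuming each line holds `time,bid,ask,bvolume,avolume` (the columns listed later in this thread) and a hypothetical timestamp format `yyyy-MM-dd HH:mm:ss.SSS`. `Tick`, `TickCsv`, and the format pattern are illustrative names, not part of Encog; the pattern would need to match whatever the file actually contains.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// One quote from the tick file (illustrative class, not an Encog type).
final class Tick {
    final LocalDateTime time;
    final double bid, ask, bidVolume, askVolume;

    Tick(LocalDateTime time, double bid, double ask, double bidVolume, double askVolume) {
        this.time = time;
        this.bid = bid;
        this.ask = ask;
        this.bidVolume = bidVolume;
        this.askVolume = askVolume;
    }
}

final class TickCsv {
    // Assumed timestamp layout with millisecond precision, e.g. "2012-04-02 00:00:01.250".
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Parses one "time,bid,ask,bvolume,avolume" line.
    static Tick parseLine(String line) {
        String[] f = line.split(",");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 columns: " + line);
        }
        return new Tick(
                LocalDateTime.parse(f[0].trim(), FORMAT),
                Double.parseDouble(f[1].trim()),
                Double.parseDouble(f[2].trim()),
                Double.parseDouble(f[3].trim()),
                Double.parseDouble(f[4].trim()));
    }

    // Parses every non-empty line; any header row must be removed by the caller first.
    static List<Tick> parseAll(List<String> lines) {
        List<Tick> ticks = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                ticks.add(parseLine(line));
            }
        }
        return ticks;
    }

    // Milliseconds between two ticks; sub-second gaps are preserved.
    static long millisBetween(Tick earlier, Tick later) {
        return Duration.between(earlier.time, later.time).toMillis();
    }
}
```

The whole file could then be read with `Files.readAllLines(...)` on the CSV path and passed to `TickCsv.parseAll`. Nothing here touches Encog; it only shows that the millisecond spacing survives parsing.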

Any insights are appreciated.

Thanks
Tim

jeffheaton:

At this point, one second is the lowest granularity supported by the market data set.

You could also use the temporal data set, which the market data set is built upon. It is not really based on any time unit; it only assumes that the data comes in a specific sequence.

twashing:

Hey Jeff,

Thanks for looking at this. I noticed there is a sequenceGrandularity field in TemporalMLDataSet.java. That field is an org.encog.util.time.TimeUnit, a Java enum whose smallest increment is seconds.

A) That sequenceGranularity seems to be deeply woven into TemporalMLDataSet's concept of time. Do you mean the BasicNeuralDataSet?

B) I'd like to add all the CSV data, capturing the datetime with each tick. Capturing all the inter-second ticks, I think, will give me a sense of the momentum of the time series. But what I really want to understand is whether and how Encog considers the time intervals when making its next prediction.

Thanks
Tim

jeffheaton:

You are welcome.

A good example would be the sunspots example. The sunspot data is really by year, but the example uses TemporalMLDataSet with arbitrary units. There is no date logic being used, just sequence numbers (which just so happen to be years in this case).

Another option is TemporalWindowArray, which just deals with a sequence of inputs. It does not matter what the time unit is, so long as the spacing is uniform.

The way Encog does this is by time-boxing. For example, if the input values were:

ABCDEFGHIJKLMNOPQ

and the input window were 4 and the output window 1, this data would be broken down into:

ABCD -> E
BCDE -> F
CDEF -> G

and so on. To use a sliding window, which is what TemporalMLDataSet and MarketMLDataSet are based on, the ticks need to be uniformly spaced in time. The only reason the granularity is specified is so that data downloaded from Yahoo or read from CSV can be used.
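The input-window/output-window split described above can be sketched in plain Java as follows. This is a generic illustration of the technique, not Encog's TemporalMLDataSet code; `SlidingWindow.windows` is a hypothetical helper, and the result only means something when consecutive values are evenly spaced in time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SlidingWindow {
    // Splits a series into {input, ideal} pairs: each run of inputSize values
    // is paired with the outputSize values that immediately follow it.
    static List<double[][]> windows(double[] series, int inputSize, int outputSize) {
        List<double[][]> pairs = new ArrayList<>();
        for (int start = 0; start + inputSize + outputSize <= series.length; start++) {
            double[] input = Arrays.copyOfRange(series, start, start + inputSize);
            double[] ideal = Arrays.copyOfRange(series, start + inputSize, start + inputSize + outputSize);
            pairs.add(new double[][] {input, ideal});
        }
        return pairs;
    }
}
```

With the 17 values A through Q encoded as 0 to 16, `windows(series, 4, 1)` yields 13 pairs, the first three being ABCD -> E, BCDE -> F and CDEF -> G.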

A sliding window might not work so well for sub-second ticks, because what I think you are dealing with is very irregularly spaced tick quotes coming in between minute or second summary bars. I've thought some about how to represent this sort of data but have not yet implemented anything.

One idea that I've been meaning to play with (which is totally outside of the sliding window model) is to feed the ML method (neural network, SVM, etc.) a window of value pairs. The first value is the direction and magnitude of the move between the last two ticks, in pips, capped at some arbitrary min/max up/down. The second is the amount of time (maybe milliseconds) between those two ticks. That would be the pair, and then you window that over the last, say, 20 pairs. I have not experimented with that; it is just a starting point for something I was going to experiment with and refine.
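A minimal sketch of that pair encoding, under some labeled assumptions: a EURUSD pip is taken as 0.0001, prices and millisecond timestamps arrive as parallel arrays, and the time deltas are left unscaled (a real network would normally want them normalized). `TickPairEncoder` and its parameters are hypothetical names. With 20 pairs the network gets a fixed 40 inputs, however irregular the tick spacing is.

```java
final class TickPairEncoder {
    // Assumed pip size for EURUSD quotes with four-decimal pips.
    static final double PIP = 0.0001;

    // Encodes the last `pairs` tick-to-tick moves as
    // [capped pip change, elapsed millis, capped pip change, elapsed millis, ...].
    static double[] encode(double[] prices, long[] timesMillis, int pairs, double capPips) {
        if (prices.length != timesMillis.length) {
            throw new IllegalArgumentException("prices and times must have the same length");
        }
        if (prices.length < pairs + 1) {
            throw new IllegalArgumentException("need at least pairs + 1 ticks");
        }
        double[] input = new double[2 * pairs];
        int first = prices.length - pairs; // first tick whose move is included
        for (int k = 0; k < pairs; k++) {
            int i = first + k;
            double pips = (prices[i] - prices[i - 1]) / PIP;
            // Direction and magnitude, clamped to [-capPips, capPips].
            input[2 * k] = Math.max(-capPips, Math.min(capPips, pips));
            // Time between this tick and the previous one.
            input[2 * k + 1] = timesMillis[i] - timesMillis[i - 1];
        }
        return input;
    }
}
```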

The trick with neural networks, SVMs, and the others is that they have a fixed input count. That is not to say that they can't take irregularly spaced data; you just have to come up with encoding schemes, like the somewhat simple one I just mentioned.

twashing:

Sorry for the delay in responding. For some reason, I didn't get a notice that this thread was updated.

So for my data, the time intervals between ticks will definitely not be uniform. It sounds like the sunspots approach won't work, as it requires uniform time intervals, even if the interval length is not specified.

A) I, too, am thinking about ways to represent ticks in a neural net that tracks time series. I'm going to experiment with some data points that I'll try putting into the input layer as input neurons (in addition to bid, ask, volume, etc.).

i) time change as an input vector, used as an input neuron
ii) running average (5, 10, or 15 ticks, etc.) instead of a sliding time window, used as an input neuron
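The two extra inputs from items i) and ii) could be computed per tick roughly like this. It is only a sketch: `TickFeatures` is a hypothetical helper, timestamps are assumed to be epoch milliseconds, and the running average is over the last n ticks (a tick-count window, not a time window), using a shorter window for the first few ticks.

```java
final class TickFeatures {
    // Item i): milliseconds since the previous tick; the first tick has no predecessor, so it gets 0.
    static long[] timeDeltas(long[] timesMillis) {
        long[] deltas = new long[timesMillis.length];
        for (int i = 1; i < timesMillis.length; i++) {
            deltas[i] = timesMillis[i] - timesMillis[i - 1];
        }
        return deltas;
    }

    // Item ii): simple moving average over the last n ticks, kept as a running sum.
    static double[] runningAverage(double[] values, int n) {
        double[] averages = new double[values.length];
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            if (i >= n) {
                sum -= values[i - n]; // drop the tick that left the window
            }
            averages[i] = sum / Math.min(i + 1, n);
        }
        return averages;
    }
}
```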

I'll definitely let you know how this goes, as I'd love to do what I can to advance Encog. For the learning mechanism, though, I have two other challenges to overcome:

B) Predicting two data points, bid and ask, not just one. In that CSV file, the input values are: time, bid, ask, bvolume, avolume. I want the output layer neurons to predict both bid and ask, not just one. My idea for solving this is to put a second weighted connection between the hidden and input layers. Usually there's one connection between each neuron in the hidden and input layers. That connection can be thought of as a synapse, and the hidden neuron weights each of its inputs based on how well they influence the final price prediction. I'm guessing that having two connections (and associated weights) between a hidden and an input neuron will give me separate and correct weighted predictions for both bid and ask prices. But of course I need to try it out.
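For comparison, the conventional way a feedforward network predicts two values is a second output neuron rather than a second connection between each input and hidden neuron. Both outputs share the same hidden layer, and each output neuron has its own weights from every hidden neuron, so bid and ask each get their own weighted combination. The sketch below is a plain-Java forward pass (tanh hidden layer, linear outputs) with hypothetical names and untrained weights; it is neither Encog code nor the approach proposed in the post.

```java
final class TwoOutputNet {
    final double[][] hiddenWeights; // [hidden neuron][input]
    final double[] hiddenBias;
    final double[][] outputWeights; // [output neuron][hidden]: one row for bid, one for ask
    final double[] outputBias;

    TwoOutputNet(double[][] hiddenWeights, double[] hiddenBias,
                 double[][] outputWeights, double[] outputBias) {
        this.hiddenWeights = hiddenWeights;
        this.hiddenBias = hiddenBias;
        this.outputWeights = outputWeights;
        this.outputBias = outputBias;
    }

    // Returns one value per output neuron, e.g. {predicted bid, predicted ask}.
    double[] predict(double[] input) {
        double[] hidden = new double[hiddenWeights.length];
        for (int h = 0; h < hidden.length; h++) {
            double sum = hiddenBias[h];
            for (int i = 0; i < input.length; i++) {
                sum += hiddenWeights[h][i] * input[i];
            }
            hidden[h] = Math.tanh(sum); // shared hidden activation
        }
        double[] out = new double[outputWeights.length];
        for (int o = 0; o < out.length; o++) {
            double sum = outputBias[o];
            for (int h = 0; h < hidden.length; h++) {
                sum += outputWeights[o][h] * hidden[h]; // each output has its own weights
            }
            out[o] = sum; // linear output for a price
        }
        return out;
    }
}
```

With five inputs (time, bid, ask, bvolume, avolume) this would use rows of length 5 in `hiddenWeights` and two rows in `outputWeights`. Training adjusts both output rows from their own errors while the hidden weights receive the combined error signal.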

C) Both of my suggestions above fall outside of the Encog codebase, so I'll have to implement my own feedforward neural net (neurons, layers, etc.). The real challenge, though, will be to correctly implement a Resilient Propagation algorithm (on the advice of the codebase) for this neural net. Let me know if you have any ideas on approach, i.e., whether to use the RPROP+ or iRPROP+ algorithm, etc.

Any insights on any of these three points are welcome.

Thanks
Tim

twashing:

This is with regard to actually implementing the iRPROP+ algorithm (C, from my last post). I have a few more detailed questions.

  • Where does the gradient value come from? If I'm correct, each hidden neuron will have i) a weight for each of its input neurons, and ii) one bias. Is the gradient the weight?
  • Where do the currentError and lastError come from? Is each neuron's currentError the bias?
  • If change is 0, will the weightChange also be 0?
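As I understand iRPROP+ (as described by Igel and Hüsken), the gradient is not the weight: it is the partial derivative of the training error with respect to that weight (each bias gets one too), accumulated over the training set by backpropagation. currentError and lastError are the overall network error of the current and previous epoch, not a neuron's bias, and `change` is the sign of gradient times lastGradient. When change is 0 the weight still moves by the current step size; only a zero gradient gives no step. Below is a minimal per-weight sketch using the usual dE/dw sign convention (implementations that store the negative gradient flip the signs) and the common Rprop defaults 1.2, 0.5, 50 and 1e-6; the class and field names are illustrative, not Encog's source.

```java
import java.util.Arrays;

final class IRpropPlus {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0, DELTA_MIN = 1e-6;

    final double[] weights;
    final double[] lastGradient;     // dE/dw from the previous epoch (0 after a sign flip)
    final double[] stepSize;         // per-weight update value
    final double[] lastWeightChange; // needed to backtrack a step

    IRpropPlus(double[] initialWeights, double initialStep) {
        weights = initialWeights.clone();
        lastGradient = new double[weights.length];
        stepSize = new double[weights.length];
        Arrays.fill(stepSize, initialStep);
        lastWeightChange = new double[weights.length];
    }

    // gradient[i] = dE/dw_i summed over the epoch; errors are whole-network errors.
    void update(double[] gradient, double currentError, double lastError) {
        for (int i = 0; i < weights.length; i++) {
            double change = Math.signum(gradient[i] * lastGradient[i]);
            double weightChange;
            if (change > 0) {
                // Same direction as last epoch: grow the step and move downhill.
                stepSize[i] = Math.min(stepSize[i] * ETA_PLUS, DELTA_MAX);
                weightChange = -Math.signum(gradient[i]) * stepSize[i];
                lastGradient[i] = gradient[i];
            } else if (change < 0) {
                // Overshot a minimum: shrink the step, and undo the last move
                // only if the overall error got worse (the "i" in iRPROP+).
                stepSize[i] = Math.max(stepSize[i] * ETA_MINUS, DELTA_MIN);
                weightChange = currentError > lastError ? -lastWeightChange[i] : 0.0;
                lastGradient[i] = 0.0; // forces the change == 0 branch next epoch
            } else {
                // change == 0: first epoch, right after a sign flip, or a zero gradient.
                // The weight still moves unless the gradient itself is 0.
                weightChange = -Math.signum(gradient[i]) * stepSize[i];
                lastGradient[i] = gradient[i];
            }
            weights[i] += weightChange;
            lastWeightChange[i] = weightChange;
        }
    }
}
```

Plain RPROP+ differs only in the sign-flip branch: it always undoes the previous step there, without comparing currentError to lastError.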

Thanks again
Tim
