I am having trouble wrapping my head around certain aspects of the Keras implementation of RNNs. This is a description of my problem:
The observations each have 200 features and a varying number of time steps, from 1 to 730 days.
I have labels for each day for each of these observations, so it's like a sequence-to-sequence time-series regression problem. The data is daily customer activity and the label is customer spend (regression).
I want to make predictions for the next X days, basically for any arbitrary-length sequence into the future.
Here are my questions:
Preparing training data:
I know the input shape for a Keras RNN is (nb_samples, timesteps, input_dim), which in my case will be (# of customers, something between 1 and 730, 200).
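One way to get variable-length sequences into a single (nb_samples, timesteps, input_dim) array is to right-pad each sequence with zeros up to the maximum length. Here is a minimal numpy sketch (the helper name `right_pad` and the toy shapes are mine; `keras.preprocessing.sequence.pad_sequences` provides similar functionality):

```python
import numpy as np

def right_pad(sequences, max_len, n_features):
    """Right-pad each (t_i, n_features) sequence with zeros to (max_len, n_features)."""
    batch = np.zeros((len(sequences), max_len, n_features), dtype=np.float32)
    for i, seq in enumerate(sequences):
        batch[i, :len(seq), :] = seq
    return batch

# Three toy customers with 2, 5, and 3 days of activity and 4 features each
seqs = [np.ones((2, 4)), np.ones((5, 4)), np.ones((3, 4))]
X = right_pad(seqs, max_len=5, n_features=4)
print(X.shape)  # (3, 5, 4)
```

For the real data described above, max_len would be 730 and n_features would be 200.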
Model Architecture:
Making Predictions:
If your issue is an implementation question, please ask your question on StackOverflow or join the Keras Slack channel and ask there instead of filing a GitHub issue.
@unrealwill something got bungled in your numbering. Can you reformat? I'd actually like to share your answer with everyone I'm working with.
@unrealwill when you say do greedy predictions, what if I don't have any features for the next time step — how do I make predictions? Does my model also have to output predictions for the features at the next timestep? This is not a character-RNN problem where the output can be directly fed back as input at the next time step...
I've reformatted my answer, thx.
If you don't have the features for the next time step, you should probably build a model for them.
The generic way is character-RNN style but with a mixture-of-Gaussians model (VAE style), which you can usually start approximating with a single Gaussian. Or you can hypothesize that the features are independent of the current sequence, and sample from them directly when you need some.
Alternatively, you can learn multiple models (one for each x), each of which learns to predict the label (probably the cumulative sales between t and t+x) at t + x directly.
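The greedy rollout described above can be sketched as follows. Both `predict_features` and `predict_label` are hypothetical stand-ins (in practice they would be trained Keras models; here a naive persistence model and a placeholder regression keep the sketch self-contained):

```python
import numpy as np

def predict_features(history):
    # Stand-in feature model: naive persistence (tomorrow looks like today).
    return history[-1]

def predict_label(features):
    # Stand-in label model: placeholder regression.
    return float(features.sum())

def greedy_rollout(history, n_days):
    """Roll the feature model forward n_days, predicting a label at each step."""
    history = list(history)
    labels = []
    for _ in range(n_days):
        feats = predict_features(history)
        labels.append(predict_label(feats))
        history.append(feats)  # feed the predicted features back in
    return labels

hist = [np.array([1.0, 2.0]), np.array([2.0, 3.0])]
print(greedy_rollout(hist, 3))  # [5.0, 5.0, 5.0]
```

With a trained feature model, each predicted feature vector is appended to the history and used as input for the next step, exactly as a character-RNN feeds its sampled output back in.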
@unrealwill so just to confirm, I should NOT mask my input values (which you suggested I should right-pad), correct? What is the practical difference between masking the input vs. the output? If I mask the input, my understanding is that the mask will propagate all the way downstream...? What am I missing?
I used to mask the output with sample weights in temporal mode before masks were implemented in Keras, and I didn't change my habit. There are various ways to implement masks (masks are quite tricky). I checked the Keras code regarding masks yesterday (see https://github.com/fchollet/keras/issues/5392), and masks indeed propagate downstream to be applied at the output level. So you can use a Keras mask on the input values and it will be propagated to the output, where the loss of the masked values will be ignored.
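The sample-weight approach amounts to building a (nb_samples, timesteps) weight matrix that is 0 on padded steps and 1 elsewhere, so padded positions contribute nothing to the per-timestep loss. A plain numpy illustration (the helper names are mine; in Keras you would compile with sample_weight_mode="temporal" and pass such a matrix as sample_weight to fit):

```python
import numpy as np

def temporal_sample_weights(lengths, max_len):
    """Weight matrix: 1 for real timesteps, 0 for right-padded ones."""
    w = np.zeros((len(lengths), max_len), dtype=np.float32)
    for i, n in enumerate(lengths):
        w[i, :n] = 1.0
    return w

def masked_mse(y_true, y_pred, weights):
    """Squared error zeroed on padded steps, averaged over real steps only."""
    se = (y_true - y_pred) ** 2
    return float((se * weights).sum() / weights.sum())

lengths = [2, 3]  # real sequence lengths before padding to max_len=3
w = temporal_sample_weights(lengths, max_len=3)
y_true = np.array([[1.0, 2.0, 0.0], [1.0, 1.0, 1.0]])
y_pred = np.array([[1.0, 2.0, 9.0], [1.0, 1.0, 2.0]])  # large error on padded step is ignored
print(masked_mse(y_true, y_pred, w))  # 0.2
```

The large error at the padded position of the first sequence is multiplied by a zero weight, so only the real timesteps drive the loss.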
@unrealwill I have time-series data and the same problem with the sequence length. I train with a smaller part of the data, of shape (10000, 20, 1), and the whole dataset has shape (20000, 20, 1). Should I add 10000 rows of zeros at the top of my training data? My aim is usually to train with less data but use the whole dataset for prediction.