Time Series Prediction with LSTM and Keras for Multiple Steps Ahead

In this post I will share an experiment on time series prediction with LSTM and Keras. An LSTM neural network is used to predict multiple steps ahead on daily stock price data. The experiment is based on the paper [1]. The authors of the paper examine the independent value prediction approach, in which a separate model is built for each prediction step. This approach avoids the error accumulation problem that arises in multi-stage (iterated) prediction, where each forecast is fed back as input for the next step.
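To make the distinction concrete, below is a minimal sketch contrasting the two strategies; the predict_one_step interface is hypothetical and for illustration only.

def recursive_forecast(model, history, steps):
    # Multi-stage (iterated) prediction: each forecast is fed back as
    # input for the next one, so errors accumulate across the steps.
    preds = []
    for _ in range(steps):
        p = model.predict_one_step(history)   # hypothetical interface
        preds.append(p)
        history = history[1:] + [p]           # the prediction re-enters the input
    return preds

def independent_forecast(models, history):
    # Independent value prediction: model k forecasts step k directly
    # from the observed history, so no error is fed back.
    return [m.predict_one_step(history) for m in models]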

LSTM Implementation

Following this approach I decided to use a Long Short-Term Memory (LSTM) network for daily stock price prediction. LSTM is a type of recurrent neural network used in deep learning, and LSTMs have been used to advance the state of the art on many difficult problems [2].

For this time series prediction I set the number of steps to predict ahead to 3 and built 3 LSTM models with Keras in Python. For each model I used a different variable (fit0, fit1, fit2) to avoid any “memory leakage” between models.
The model initialization code is the same for all 3 models except for the changed parameters (the number of neurons in the LSTM layer), as sketched below.
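A minimal sketch of this structure, with a hypothetical build_and_train helper standing in for the construction code shown later in this post:

steps_to_predict = 3
for i in range(steps_to_predict):
    # one independent model per prediction step, each kept in its own
    # variable so no weights or state are shared between steps
    if i == 0:
        fit0 = build_and_train(i)   # hypothetical helper
    elif i == 1:
        fit1 = build_and_train(i)
    elif i == 2:
        fit2 = build_and_train(i)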
The architecture of the system is shown in the figure below.

Multiple step prediction with separate neural networks

Here we have 3 LSTM models that receive the same X input data but different target Y data. The target data are shifted by the number of steps: if a model is forecasting the stock price for day 2, then Y is shifted by 2 elements.
This happens in the following line when i=1:

yt_ = yt.shift(-i - 1)
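For example, on a small pandas Series a negative shift pulls future values up to align with the current inputs; the trailing rows become NaN and must be dropped before training:

import pandas as pd

yt = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
print(yt.shift(-2))
# 0    12.0  <- the target two days ahead (i = 1)
# 1    13.0
# 2    14.0
# 3     NaN  <- no future value exists for the last rows
# 4     NaN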

The data are daily stock prices obtained from the Internet.
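As an illustration, the prices can be loaded and scaled along these lines (a minimal sketch; the file name and column are placeholders for the actual data source). The scaler_y object is the one used later to invert the predictions back to price units.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# placeholder file and column names
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)
yt = data['Close']

# scale to [0, 1]; LSTMs train more reliably on normalized values
scaler_y = MinMaxScaler(feature_range=(0, 1))
y_scaled = scaler_y.fit_transform(yt.values.reshape(-1, 1))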

The number of units was obtained by running several variations and chosen based on MSE, as follows:

   
    if i == 0:
        units = 20
        batch_size = 1
    elif i == 1:
        units = 15
        batch_size = 1
    elif i == 2:
        units = 80
        batch_size = 1
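These values came from manual trial and error; the same selection could be automated with a small grid search over the unit count. A minimal sketch, assuming a hypothetical build_model(units) helper that wraps the construction code shown below:

# try several unit counts and keep the one with the lowest MSE
# (ideally measured on a separate validation set, not the test set)
best_units, best_mse = None, float('inf')
for units in [15, 20, 40, 80]:
    model = build_model(units)   # hypothetical helper
    model.fit(x_train, y_train, batch_size=1, nb_epoch=25, shuffle=False)
    mse = model.evaluate(x_test, y_test, batch_size=1)
    if mse < best_mse:
        best_units, best_mse = units, mse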

If you want to run more than 3 steps/models, you will need to add branches to the unit-selection code above. Additionally, you will need to add model initialization code like that shown below.

Each LSTM network was constructed as follows:


import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

# note: inner_activation, output_dim and nb_epoch are Keras 1.x argument
# names (recurrent_activation, units and epochs in Keras 2)
if i == 0:
    fit0 = Sequential()
    fit0.add(LSTM(units, activation='tanh', inner_activation='hard_sigmoid',
                  input_shape=(len(cols), 1)))
    fit0.add(Dropout(0.2))
    fit0.add(Dense(output_dim=1, activation='linear'))
    fit0.compile(loss='mean_squared_error', optimizer='adam')

    fit0.fit(x_train, y_train, batch_size=batch_size, nb_epoch=25, shuffle=False)
    train_mse[i] = fit0.evaluate(x_train, y_train, batch_size=batch_size)
    test_mse[i] = fit0.evaluate(x_test, y_test, batch_size=batch_size)
    pred = fit0.predict(x_test)
    pred = scaler_y.inverse_transform(np.array(pred).reshape((len(pred), 1)))
    # below is just for i == 0: keep the whole test prediction for plotting
    for j in range(len(pred)):
        prediction_data[j] = pred[j]

For each model the code saves the last forecasted value.
Additionally, at step i=0 the predicted data are saved for comparison with the actual data:

prediction_data = np.asarray(prediction_data)
prediction_data = prediction_data.ravel()

# shift back by one step (move every element one position to the right)
for j in range(len(prediction_data) - 1):
    prediction_data[len(prediction_data) - j - 1] = prediction_data[len(prediction_data) - 1 - j - 1]

# combine prediction data from the first model and the last predicted
# value from each model
prediction_data = np.append(prediction_data, forecast)
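To see what this does, consider a small numeric example. The loop moves every element one position to the right, so the one-step-ahead predictions of the first model line up with the dates of the actual values they refer to, and the append then attaches the final forecast from each of the 3 models:

import numpy as np

prediction_data = np.array([1.0, 2.0, 3.0, 4.0])
n = len(prediction_data)
for j in range(n - 1):
    prediction_data[n - j - 1] = prediction_data[n - 1 - j - 1]
print(prediction_data)   # [1. 1. 2. 3.] : every value moved right by one

forecast = np.array([5.0, 6.0, 7.0])   # last forecast from each of the 3 models
print(np.append(prediction_data, forecast))   # [1. 1. 2. 3. 5. 6. 7.]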

The full Python source code for this time series prediction with LSTM is shown here.

The data can be found here.

Experiment Results

The LSTM neural networks ran with the following performance (train and test MSE, one value per model):

train_mse
[0.01846262458218137, 0.009637593373373323, 0.0018845983509225203]
test_mse
[0.01648362025879952, 0.026161141224167357, 0.01774421124347165]

Below is a graph of the actual data vs. the predicted test data, including the last 3 predicted stock prices (one from each model).

Multiple step prediction – actual data vs predictions

A prediction accuracy of 98% was calculated over the last 3 stock prices (one from each model).
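As an illustration of how such a figure can be computed: one common choice is 100% minus the mean absolute percentage error (MAPE) over the 3 points. The values below are placeholders, not the actual experiment data:

import numpy as np

# placeholder values for the last 3 actual and predicted prices
actual = np.array([100.0, 102.0, 101.0])
predicted = np.array([99.0, 104.0, 100.0])

mape = np.mean(np.abs((actual - predicted) / actual)) * 100
accuracy = 100 - mape
print(round(accuracy, 1))   # 98.7 for these placeholder values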

The experiment confirmed that using separate models (one model for each step) for multistep-ahead time series prediction has advantages. With this method we can adjust the parameters of the LSTM for each step independently. For example, the number of neurons for i=2 was modified to decrease the prediction error for that step, and it did not affect the predictions for the other steps. This is one of the machine learning techniques for stock prediction described in [1].

References
1. Multistep-ahead Time Series Prediction
2. LSTM: A Search Space Odyssey
3. Deep Time Series Forecasting with Python: An Intuitive Introduction to Deep Learning for Applied Time Series Modeling



11 thoughts on “Time Series Prediction with LSTM and Keras for Multiple Steps Ahead”

  1. Hi!
    I made an LSTM model and fed it ABS(SIN(X)) data saved in a CSV file. Before training the LSTM with that data, I replaced the last 100 numbers of the dataset with random numbers to see how the LSTM would predict that sudden pattern change. I split the data into train and test sets at a ratio of 0.1 (the first 10% of the data is the train set and the remaining 90% is the test set), so the model couldn’t see the sudden switch from my function to the random noise that appears in the last 100 numbers of the dataset.
    But the prediction shows exactly the same random data as in the data file, i.e. with that sudden switch to random data! I should see only the function that was in the train set, but I saw both my function and the random data.
    How could this be possible? What am I doing wrong?
    Here is my program: https://pastebin.com/MTSy4r7m
    Here is my CSV4.CSV datafile listing: https://pastebin.com/q87qEaiG

    • Hi Maxim,
      I looked at your code and data file, and if I am not mistaken, it looks like you are feeding the network the same data for the input x and the target y. You are simply assigning data_input to the expected output, so you are training the network to reproduce its input, and you get back exactly what you enter.
      I think you need to have 2 columns of data:
      one column is x, like 0.1, 0.2, ……
      and the 2nd column (the target data, y) is calculated from x in the first column as ABS(SIN(X)).
      You can then add some random data to column y to see how the network performs.
      Hope this helps.

      • Hi, owygs156!
        Thank you for your response!
        I have this piece of code inside my program:
        expected_output = data_input[1:]

        I intended to shift the input dataset by one, so my expected data should be the next value of the dataset. For example, for the dataset 1, 2, 3, 4, 5, etc., the expected outputs should be 2, 3, 4, 5, 6, i.e. the expected outputs are just the values of my dataset shifted one step ahead.
        Is that right?

  2. Hi Maxim,
    yes, this is correct, sorry I missed that. This should work. Regarding your original question: the predictions are not really the same as the inputted random data. If you use 5 neurons for the LSTM, the predictions will differ for a stateful network.
    Thanks.

  3. Hi, could you explain a little bit about the process mentioned here? “The number of units was obtained by running several variations and chosen based on MSE, as follows:”

    Was this a basic grid search within a range of neuron counts?

  4. Hi,
    I have a question about the shifting of the target data. In the code you do:
    yt_ = yt.shift(i - 1)

    This will result in the following shifts:
    yt_ = yt.shift(-1), yt_ = yt.shift(0), and yt_ = yt.shift(1) if steps_to_predict = 3

    But this is where I get lost. In the text you write: “for day 2 then Y is shifted by 2 elements. This happens in the following line when i=1:”
    But when i=1, I don’t see any shift at all.
    Thanks

  5. Hi Leif,
    sorry for the confusion here, it actually should be
    yt_ = yt.shift(-i - 1)
    So for i=0 it will be yt.shift(-1) as before, for i=1 it will be yt.shift(-2), and for i=2 it will be yt.shift(-3).

    Regarding the note “for day 2 …”: I mean here that day 2 corresponds to i=1 and not i=2, since we count i starting from 0.

    Thanks for catching this.

    • Hi,

      Yes, I thought so, so I made the change.

      And thank you for sharing the code! I’ve modified it a little bit, so that I’m running it on live data, and I’ve also added an extra graph at the end, using plotly offline.

      On daily charts it looks very good, and on intraday charts it does the job, but not as well as on daily charts.

      >>>>>>>>>>>>>>>> Added code
      import plotly.offline as py
      import plotly.graph_objs as go

      predictDates = data.tail(len(x_test_all)).index

      actual_chart = go.Scatter(x=predictDates, y=x_test_all, name='Actual')
      predict_chart = go.Scatter(x=predictDates, y=prediction_data, name='Predictions')

      layout = go.Layout(xaxis=dict(type='category'), width=1020, height=800, title='LSTM Forecast For ' + str(symbol) + ' - Predict Steps = ' + str(steps_to_predict) + '; Data to Use = ' + str(data_to_use))
      fig = dict(data=[predict_chart, actual_chart], layout=layout)

      py.iplot(fig, image='png', filename='LSTM-forecast-' + str(symbol) + '-' + str(steps_to_predict) + '-steps-to-predict_' + str(data_to_use))
      <<<<<<<<<<<<<<<<<<<<<<<<

      Thanks

  6. Hi, owygs156,

    can you please elaborate more on these lines:

    # shift back by one step
    for j in range(len(prediction_data) - 1):
        prediction_data[len(prediction_data) - j - 1] = prediction_data[len(prediction_data) - 1 - j - 1]

    # combine prediction data from first model and last predicted data from each model
    prediction_data = np.append(prediction_data, forecast)

    What is the rationale behind the shift back and the combining? It looks like adding apples to peaches. I can’t understand this. Please help.
