forecasting time series data Archives - Machine Learning Applications

Machine Learning Stock Market Prediction with LSTM Keras

April 3, 2018April 5, 2018 by owygs156

In the previous posts [1,2] I created script for machine learning stock market price on next day prediction. But it was pointed by readers that in stock market prediction, it is more important to know the trend: will the stock go up or down. So I updated the script to predict difference between today and yesterday prices. If it is negative the stock price will go down, if positive it will go up. Below will be described implemented modifications.

Data inputting

Data from previous days are entered as features through additional columns. The number of columns can be changed through parameter N in the beginning of script. So for example for day 20 the input will contain data for day 21 as target and data for days 20, 19,18,17… 20-N
Also added differencing before scaling. Differencing helped to improve performance of network. It also makes easy to get changes from previous day. In the end the differenced data inverted back.

Below is the stock data prices after applying differencing (subtructing previous day stock data price from current day)

Predicting future changes

I used moving_test_window_preds function. Inside of this function the script within loop is adding new prediction to “moving window” array and removing first element from it. This is based on example from blog post on forecasting time series with LSTM[4].
So the script is predicting future day data based on the previous known data in the “moving window”, updating known data and starting again. The performance evaluated by comparing predicted data with test (not used before) data.

LSTM Configuration

The LSTM network is constructed as following:

model = Sequential ()
input_shape =(len(cols), 1) ))
model.add (LSTM ( 400,  activation = 'relu', inner_activation = 'hard_sigmoid' , bias_regularizer=L1L2(l1=0.01, l2=0.01),  input_shape =(len(cols), 1), return_sequences = False ))
model.add(Dropout(0.3))
from keras import optimizers
model.add (Dense (output_dim =1, activation = 'linear', activity_regularizer=regularizers.l1(0.01)))
adam=optimizers.Adam(lr=0.01, beta_1=0.89, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=True)
model.compile (loss ="mean_squared_error" , optimizer = "adam") 
history=model.fit (x_train, y_train, batch_size =1, nb_epoch =1400, shuffle = False, validation_split=0.15)

Weight regularization together with differencing helped to decrease overfitting.

Results

Performance of NN is 88% : 8 correct data out of 9. Below are the data charts that comparing predicted data (9 first days from 22 total days) with actual test data. Below you can find output chart and full python source code.

import numpy as np
import pandas as pd
from sklearn import preprocessing

import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

from keras.regularizers import L1L2

fname="C:\\Users\\Leo\\Desktop\\A\\WS\\stock data analysis 2017\\GM.csv"
data_csv = pd.read_csv (fname)

#how many data we will use 
# (should not be more than dataset length )
data_to_use= 150

# number of training data
# should be less than data_to_use
train_end =120


total_data=len(data_csv)

#most recent data is in the end 
#so need offset
start=total_data - data_to_use


yt = data_csv.iloc [start:total_data ,4]    #Close price
yt_ = yt.shift (-1)   

print (yt_)

data = pd.concat ([yt, yt_], axis =1)
data. columns = ['yt', 'yt_']


N=18    
cols =['yt']
for i in range (N):
  
    data['yt'+str(i)] = list(yt.shift(i+1))
    cols.append ('yt'+str(i))
    
data = data.dropna()
data_original = data
data=data.diff()
data = data.dropna()
    
    
# target variable - closed price
# after shifting
y = data ['yt_']
x = data [cols]

   
scaler_x = preprocessing.MinMaxScaler ( feature_range =( -1, 1))
x = np. array (x).reshape ((len( x) ,len(cols)))
x = scaler_x.fit_transform (x)

scaler_y = preprocessing. MinMaxScaler ( feature_range =( -1, 1))
y = np.array (y).reshape ((len( y), 1))
y = scaler_y.fit_transform (y)

    
x_train = x [0: train_end,]
x_test = x[ train_end +1:len(x),]    
y_train = y [0: train_end] 
y_test = y[ train_end +1:len(y)]  

x_train = x_train.reshape (x_train. shape + (1,)) 
x_test = x_test.reshape (x_test. shape + (1,))

from keras.models import Sequential
from keras.layers.core import Dense
from keras.layers.recurrent import LSTM
from keras.layers import  Dropout
from keras import optimizers

from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)

from keras import regularizers


model = Sequential ()
model.add (LSTM ( 400,  activation = 'relu', inner_activation = 'hard_sigmoid' , bias_regularizer=L1L2(l1=0.01, l2=0.01),  input_shape =(len(cols), 1), return_sequences = False ))
model.add(Dropout(0.3))
model.add (Dense (output_dim =1, activation = 'linear', activity_regularizer=regularizers.l1(0.01)))
adam=optimizers.Adam(lr=0.01, beta_1=0.89, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=True)
model.compile (loss ="mean_squared_error" , optimizer = "adam") 
history=model.fit (x_train, y_train, batch_size =1, nb_epoch =1400, shuffle = False, validation_split=0.15)


y_train_back=scaler_y.inverse_transform (np. array (y_train). reshape ((len( y_train), 1)))
plt.figure(1)
plt.plot (y_train_back)


fmt = '%.1f'
tick = mtick.FormatStrFormatter(fmt)
ax = plt.axes()
ax.yaxis.set_major_formatter(tick)
print (model.summary())
print(history.history.keys())

plt.figure(2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
fmt = '%.1f'
tick = mtick.FormatStrFormatter(fmt)
ax = plt.axes()
ax.yaxis.set_major_formatter(tick)

score_train = model.evaluate (x_train, y_train, batch_size =1)
score_test = model.evaluate (x_test, y_test, batch_size =1)
print (" in train MSE = ", round( score_train ,4)) 
print (" in test MSE = ", score_test )

pred1 = model.predict (x_test) 
pred1 = scaler_y.inverse_transform (np. array (pred1). reshape ((len( pred1), 1)))
 
prediction_data = pred1[-1]     
model.summary()
print ("Inputs: {}".format(model.input_shape))
print ("Outputs: {}".format(model.output_shape))
print ("Actual input: {}".format(x_test.shape))
print ("Actual output: {}".format(y_test.shape))

print ("prediction data:")
print (prediction_data)

y_test = scaler_y.inverse_transform (np. array (y_test). reshape ((len( y_test), 1)))
print ("y_test:")
print (y_test)

act_data = np.array([row[0] for row in y_test])

fmt = '%.1f'
tick = mtick.FormatStrFormatter(fmt)
ax = plt.axes()
ax.yaxis.set_major_formatter(tick)

plt.figure(3)
plt.plot( y_test, label="actual")
plt.plot(pred1, label="predictions")

print ("act_data:")
print (act_data)

print ("pred1:")
print (pred1)

plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
          fancybox=True, shadow=True, ncol=2)


fmt = '$%.1f'
tick = mtick.FormatStrFormatter(fmt)
ax = plt.axes()
ax.yaxis.set_major_formatter(tick)

def moving_test_window_preds(n_future_preds):

    ''' n_future_preds - Represents the number of future predictions we want to make
                         This coincides with the number of windows that we will move forward
                         on the test data
    '''
    preds_moving = []                                    # Store the prediction made on each test window
    moving_test_window = [x_test[0,:].tolist()]          # First test window
    moving_test_window = np.array(moving_test_window)    
   
    for i in range(n_future_preds):
      
      
        preds_one_step = model.predict(moving_test_window) 
        preds_moving.append(preds_one_step[0,0]) 
                       
        preds_one_step = preds_one_step.reshape(1,1,1) 
        moving_test_window = np.concatenate((moving_test_window[:,1:,:], preds_one_step), axis=1) # new moving test window, where the first element from the window has been removed and the prediction  has been appended to the end
        

    print ("pred moving before scaling:")
    print (preds_moving)
                                         
    preds_moving = scaler_y.inverse_transform((np.array(preds_moving)).reshape(-1, 1))
    
    print ("pred moving after scaling:")
    print (preds_moving)
    return preds_moving
    
print ("do moving test predictions for next 22 days:")    
preds_moving = moving_test_window_preds(22)


count_correct=0
error =0
for i in range (len(y_test)):
    error=error + ((y_test[i]-preds_moving[i])**2) / y_test[i]

 
    if y_test[i] >=0 and preds_moving[i] >=0 :
        count_correct=count_correct+1
    if y_test[i] < 0 and preds_moving[i] < 0 :
        count_correct=count_correct+1

accuracy_in_change =  count_correct / (len(y_test) )

plt.figure(4)
plt.title("Forecast vs Actual, (data is differenced)")          
plt.plot(preds_moving, label="predictions")
plt.plot(y_test, label="actual")
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
          fancybox=True, shadow=True, ncol=2)


print ("accuracy_in_change:")
print (accuracy_in_change)

ind=data_original.index.values[0] + data_original.shape[0] -len(y_test)-1
prev_starting_price = data_original.loc[ind,"yt_"]
preds_moving_before_diff =  [0 for x in range(len(preds_moving))]

for i in range (len(preds_moving)):
    if (i==0):
        preds_moving_before_diff[i]=prev_starting_price + preds_moving[i]
    else:
        preds_moving_before_diff[i]=preds_moving_before_diff[i-1]+preds_moving[i]


y_test_before_diff = [0 for x in range(len(y_test))]

for i in range (len(y_test)):
    if (i==0):
        y_test_before_diff[i]=prev_starting_price + y_test[i]
    else:
        y_test_before_diff[i]=y_test_before_diff[i-1]+y_test[i]


plt.figure(5)
plt.title("Forecast vs Actual (non differenced data)")
plt.plot(preds_moving_before_diff, label="predictions")
plt.plot(y_test_before_diff, label="actual")
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
          fancybox=True, shadow=True, ncol=2)
plt.show()

References
1. Time Series Prediction with LSTM and Keras for Multiple Steps Ahead
2. Machine Learning Stock Prediction with LSTM and Keras
3. Data File
4. Using LSTMs to forecast time-series

Time Series Prediction with LSTM and Keras for Multiple Steps Ahead

February 27, 2018April 17, 2018 by owygs156

In this post I will share experiment with Time Series Prediction with LSTM and Keras. LSTM neural network is used in this experiment for multiple steps ahead for stock prices data. The experiment is based on the paper [1]. The authors of the paper examine independent value prediction approach. With this approach a separate model is built for each prediction step. This approach helps to avoid error accumulation problem that we have when we use multi-stage step prediction.

LSTM Implementation

Following this approach I decided to use Long Short-Term Memory network or LSTM network for daily data stock price prediction. LSTM is a type of recurrent neural network used in deep learning. LSTMs have been used to advance the state-of the-art for many difficult problems. [2]

For this time series prediction I selected the number of steps to predict ahead = 3 and built 3 LSTM models with Keras in python. For each model I used different variable (fit0, fit1, fit2) to avoid any “memory leakage” between models.
The model initialization code is the same for all 3 models except changing parameters (number of neurons in LSTM layer)
The architecture of the system is shown on the fig below.

Multiple step prediction with separate neural networks

Here we have 3 LSTM models that are getting same X input data but different target Y data. The target data is shifted by number of steps. If model is forecasting the data stock price for day 2 then Y is shifted by 2 elements.
This happens in the following line when i=1:

yt_ = yt.shift (-i - 1  )

The data were obtained from stock prices from Internet.

The number of unit was obtained by running several variations and chosen based on MSE as following:

   
    if i==0:
        units=20
        batch_size=1
    if i==1:
        units=15
        batch_size=1
    if i==2:
         units=80
         batch_size=1

If you want run more than 3 steps / models you will need to add parameters to the above code. Additionally you will need add model initialization code shown below.

Each LSTM network was constructed as following:


 if i == 0 :
          fit0 = Sequential ()
          fit0.add (LSTM (  units , activation = 'tanh', inner_activation = 'hard_sigmoid' , input_shape =(len(cols), 1) ))
          fit0.add(Dropout(0.2))
          fit0.add (Dense (output_dim =1, activation = 'linear'))
          fit0.compile (loss ="mean_squared_error" , optimizer = "adam")  
   
          fit0.fit (x_train, y_train, batch_size =batch_size, nb_epoch =25, shuffle = False)
          train_mse[i] = fit0.evaluate (x_train, y_train, batch_size =batch_size)
          test_mse[i] = fit0.evaluate (x_test, y_test, batch_size =batch_size)
          pred = fit0.predict (x_test) 
          pred = scaler_y.inverse_transform (np. array (pred). reshape ((len( pred), 1)))
             # below is just fo i == 0
          for j in range (len(pred)) :
                   prediction_data[j] = pred[j]

For each model the code is saving last forecasted number.
Additionally at step i=0 predicted data is saved for comparison with actual data:

prediction_data = np.asarray(prediction_data)
prediction_data = prediction_data.ravel()

# shift back by one step
for j in range (len(prediction_data) - 1 ):
    prediction_data[len(prediction_data) - j - 1  ] =  prediction_data[len(prediction_data) - 1 - j - 1]

# combine prediction data from first model and last predicted data from each model
prediction_data = np.append(prediction_data, forecast)

The full python source code for time series prediction with LSTM in python is shown here

Data can be found here

Experiment Results

The LSTM neural network was running with the following performance:

train_mse
[0.01846262458218137, 0.009637593373373323, 0.0018845983509225203]
test_mse
[0.01648362025879952, 0.026161141224167357, 0.01774421124347165]

Below is the graph of actual data vs data testing data, including last 3 stock data prices from each model.

Multiple step prediction actual data vs predictions — Multiple step prediction – actual data vs predictions

Accuracy of prediction 98% calculated for last 3 data stock prices (one from each model).

The experiment confirmed that using models (one model for each step) in multistep-ahead time series prediction has advantages. With this method we can adjust parameters of needed LSTM for each step. For example, number of neurons for i=2 was modified to decrease prediction error for this step. And it did not affect predictions for other steps. This is one of machine learning techniques for stock prediction that is described in [1]

References
1. Multistep-ahead Time Series Prediction
2. LSTM: A Search Space Odyssey
3. Deep Time Series Forecasting with Python: An Intuitive Introduction to Deep Learning for Applied Time Series Modeling

Machine Learning Stock Prediction with LSTM and Keras

January 19, 2018March 2, 2018 by owygs156

In this post I will share experiments on machine learning stock prediction with LSTM and Keras with one step ahead. I tried to do first multiple steps ahead with few techniques described in the papers on the web. But I discovered that I need fully understand and test the simplest model – prediction for one step ahead. So I put here the example of prediction of stock price data time series for one next step.

Preparing Data for Neural Network Prediction

Our first task is feed the data into LSTM. Our data is stock price data time series that were downloaded from the web.
Our interest is closed price for the next day so target variable will be closed price but shifted to left (back) by one step.

Figure below is showing how do we prepare X, Y data.
Please note that I put X Y horizontally for convenience but still calling X, Y as columns.
Also on this figure only one variable (closed price) is showed for X but in the code it is actually more than one variable (it is closed price, open price, volume and high price).

Now we have the data that we can feed for LSTM neural network prediction. Before doing this we need also do the following things:

1. Decide how many data we want to use. For example if we have data for 10 years but want use only last 200 rows of data we would need specify start point because the most recent data will be in the end.

#how many data we will use 
# (should not be more than dataset length )
data_to_use= 100

# number of training data
# should be less than data_to_use
train_end =70


total_data=len(data_csv)

#most recent data is in the end 
#so need offset
start=total_data - data_to_use

2. Rescale (normalize) data as below. Here feature_range is tuple (min, max), default=(0, 1) is desired range of transformed data. [1]

scaler_x = preprocessing.MinMaxScaler ( feature_range =( -1, 1))
x = np. array (x).reshape ((len( x) ,len(cols)))
x = scaler_x.fit_transform (x)


scaler_y = preprocessing. MinMaxScaler ( feature_range =( -1, 1))
y = np.array (y).reshape ((len( y), 1))
y = scaler_y.fit_transform (y)

3. Divide data into training and testing set.

x_train = x [0: train_end,]
x_test = x[ train_end +1:len(x),]    
y_train = y [0: train_end] 
y_test = y[ train_end +1:len(y)]

Building and Running LSTM

Now we need to define our LSTM neural network layers , parameters and run neural network to see how it works.

fit1 = Sequential ()
fit1.add (LSTM (  1000 , activation = 'tanh', inner_activation = 'hard_sigmoid' , input_shape =(len(cols), 1) ))
fit1.add(Dropout(0.2))
fit1.add (Dense (output_dim =1, activation = 'linear'))

fit1.compile (loss ="mean_squared_error" , optimizer = "adam")   
fit1.fit (x_train, y_train, batch_size =16, nb_epoch =25, shuffle = False)

The first run was not good – prediction was way below actual.

However by changing the number of neurons in LSTM neural network prediction code, it was possible to improve MSE and get predictions much closer to actual data. Below are screenshots of plots of prediction and actual data for LSTM with 5, 60 and 1000 neurons. Note that the plot is showing only testing data. Training data is not shown on the plots. Also you can find below the data for MSE and number of neurons. As the number neorons in LSTM layer is changing there were 2 minimums (one around 100 neurons and another around 1000 neurons)

Results of prediction on LSTM with 5 neurons

Forecasting one step ahead LSTM 60 — Results of prediction on LSTM with 60 neurons

Forecasting one step ahead — Results of prediction on LSTM with 1000 neurons

LSTM 5
in train MSE = 0.0475
in test MSE = 0.2714831660102521

LSTM 60
in train MSE = 0.0127
in test MSE = 0.02227417068206705

LSTM 100
in train MSE = 0.0126
in test MSE = 0.018672733913263073

LSTM 200
in train MSE = 0.0133
in test MSE = 0.020082660237026824

LSTM 900
in train MSE = 0.0103
in test MSE = 0.015546535929778267

LSTM 1000
in train MSE = 0.0104
in test MSE = 0.015037054958075455

LSTM 1100
in train MSE = 0.0113
in test MSE = 0.016363980411369994

Conclusion

The final LSTM was running with testing MSE 0.015 and accuracy 97%. This was obtained by changing number of neurons in LSTM. This example will serve as baseline for further possible improvements on machine learning stock prediction with LSTM. If you have any tips or anything else to add, please leave a comment the comment box. The full source code for LSTM neural network prediction can be found here

References
1. sklearn.preprocessing.MinMaxScaler
2. Keras documentation – RNN Layers
3. How to Reshape Input Data for Long Short-Term Memory Networks in Keras
4. Keras FAQ: Frequently Asked Keras Questions
5. Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras
6.Deep Time Series Forecasting with Python: An Intuitive Introduction to Deep Learning for Applied Time Series Modeling
7. Machine Learning Stock Prediction with LSTM and Keras – Python Source Code

Prediction Data Stock Prices with Prophet

December 26, 2017December 29, 2017 by owygs156

In the previous post I showed how to use the Prophet for time series analysis with python. I used Prophet for data stock price prediction. But it was used only for one stock and only for next 10 days.

In this post we will select more data and will test how accurate can be prediction data stock prices with Prophet.

We will select 5 stocks and will do prediction stock prices based on their historical data. You will get chance to look at the report how error is distributed across different stocks or number of days in forecast. The summary report will show that we can easy get accuracy as high as 96.8% for stock price prediction with Prophet for 20 days forecast.

Data and Parameters

The five stocks that we will select are the stocks in the price range between $20 – $50. The daily historical data are taken from the web.

For time horizon we will use 20 days. That means that we will save last 20 prices for testing and will not use for forecasting.

Experiment

For this experiment we use python script with the main loop that is iterating for each stock. Inside of the loop, Prophet is doing forecast and then error is calculated and saved for each day in forecast:

   model = Prophet() #instantiate Prophet
   model.fit(data);
   future_stock_data = model.make_future_dataframe(periods=steps_ahead, freq = 'd')
   forecast_data = model.predict(future_stock_data)

   step_count=0
   # save actual data 
   for index, row in data_test.iterrows():
    
     results[ind][step_count][0] = row['y']
     results[ind][step_count][4] = row['ds']
     step_count=step_count + 1
   
   # save predicted data and calculate error
   count_index = 0   
   for index, row in forecast_data.iterrows():
     if count_index >= len(data)  :
       
        step_count= count_index - len(data)
        results[ind][step_count][1] = row['yhat']
        results[ind][step_count][2] = results[ind][step_count][0] -  results[ind][step_count][1]
        results[ind][step_count][3] = 100 * results[ind][step_count][2] / results[ind][step_count][0]
      
     count_index=count_index + 1

Later on (as shown in the above python source code) we count error as difference between actual closed price and predicted. Also error is calculated in % for each day for each stock using the formula:

Error(%) = 100*(Actual-Predicted)/Actual

Results
The detailed result report can be found here. In this report you can see how error is distributed across different days. You can note that there is no significant increase in error with 20 days of forecast and the error always has the same sign for all 20 days.

Below is the summary of error and accuracy for our selected 5 stocks. Also added the column the year of starting point of data range that was used for forecast. It turn out that all 5 stocks have different historical data range. The shortest data range was starting in the middle of 2016.

prediction data stock prices with Prophet summary results — prediction data stock prices with Prophet summary of results

Overall results for accuracy are not great. Only one stock got good accuracy 96.8%.
Accuracy was varying for different stocks. To investigate variation I plot graph of accuracy and beginning year of historical data. The plot is shown below. Looks like there is a correlation between data range used for forecast and accuracy. This makes sense – as the data in the further past may be do more harm than good.

Looking at the plot below we see that the shortest historical range (just about for one year) showed the best accuracy.

prediction data stock prices with Prophet - accuracy vs used data range — prediction data stock prices with Prophet – accuracy vs used data range

Conclusion
We did not get good results (except one stock) in our experiments but we got the knowledge about possible range and distribution of error over the different stocks and time horizon (up to 20 days). Also it looks promising to try do forecast with different historical data range to check how it will affect performance. It would be interesting to see if the best accuracy that we got for one stock can be achieved for other stocks too.

I hope you enjoyed this post about using Prophet for prediction data stock prices. If you have any tips or anything else to add, please leave a comment in the reply box below.

Here is the script for stock data forecasting with python using Prophet.

import pandas as pd
from fbprophet import Prophet

steps_ahead = 20
fname_path="C:\\Users\\stock data folder"

fnames=['fn1.csv','fn2.csv', 'fn3.csv', 'fn4.csv', 'fn5.csv']
# output fields: actual, predicted, error, error in %, date
fields_number = 6

results= [[[0 for i in range(len(fnames))] for j in range(steps_ahead)] for k in range(fields_number)]

for ind in range(5):
 
   fname=fname_path + "\\" + fnames[ind]
  
   data = pd.read_csv (fname)

   #keep only date and close
   #delete Open, High,	Low  , Adj CLose, Volume
   data.drop(data.columns[[1, 2, 3,5,6]], axis=1)
   data.columns = ['ds', 'y', "", "", "", "", ""]
  
   data_test = data[-steps_ahead:]
   print (data_test)
   data = data[:-steps_ahead]
   print (data)
     
   model = Prophet() #instantiate Prophet
   model.fit(data);
   future_stock_data = model.make_future_dataframe(periods=steps_ahead, freq = 'd')
   forecast_data = model.predict(future_stock_data)
   print (forecast_data[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(12))

   step_count=0
   for index, row in data_test.iterrows():
    
     results[ind][step_count][0] = row['y']
     results[ind][step_count][4] = row['ds']
     step_count=step_count + 1
   
  
   count_index = 0   
   for index, row in forecast_data.iterrows():
     if count_index >= len(data)  :
       
        step_count= count_index - len(data)
        results[ind][step_count][1] = row['yhat']
        results[ind][step_count][2] = results[ind][step_count][0] -  results[ind][step_count][1]
        results[ind][step_count][3] = 100 * results[ind][step_count][2] / results[ind][step_count][0]
      
     count_index=count_index + 1

for z in range (5):        
  for i in range (steps_ahead):
    temp=""
    for j in range (5):
          temp=temp + " " + str(results[z][i][j])
    print (temp)
   
   
  print (z)

Time Series Analysis with Python and Prophet

December 17, 2017December 24, 2017 by owygs156

Recently Facebook released Prophet – open source software tool for forecasting time series data.
Facebook team have implemented in Prophet two trend models that can cover many applications: a saturating growth model, and a piecewise linear model. [4]

With growth model Prophet can be used for prediction growth/decay – for example for modeling growth of population or website traffic. By default, Prophet uses a linear model for its forecast.

In this post we review main features of Prophet and will test it on prediction of stock data prices for next 10 business days. We will use python for time series programming with Prophet.

How to Use Prophet for Time Series Analysis with Python

Here is the minimal python source code to create forecast with Prophet.

import pandas as pd
from fbprophet import Prophet

The input columns should have headers as ‘ds’ and ‘y’. In the code below we run forecast for 180 days ahead for stock data, ds is our date stamp data column, and y is actual time series column

fname="C:\\Users\\data analysis\\gm.csv"
data = pd.read_csv (fname)

data.columns = ['ds', 'y']
model = Prophet() 
model.fit(data); #fit the model
future_stock_data = model.make_future_dataframe(periods=180, freq = 'd')
forecast_data = model.predict(future_stock_data)
print ("Forecast data")
model.plot(forecast_data)

The result of the above code will be graph shown below.

time series analysis python using Prophet

Time series model consists of three main model components: trend, seasonality, and holidays.
We can print components using the following line

model.plot_components(forecast_data)

time series analysis python - components — time series analysis python – components

Can Prophet be Used for Data Stock Price Prediction?

It is interesting to know if Prophet can be use for financial market price forecasting. The practical question is how accurate forecasting with Prophet for stock data prices? Data analytics for stock market is known as hard problem as many different events influence stock prices every day. To check how it works I made forecast for 10 days ahead using actual stock data and then compared Prophet forecast with actual data.

The error was calculated for each day for data stock price (closed price) and is shown below

Based on obtained results the average error is 7.9% ( this is 92% accuracy). In other words the accuracy is sort of low for stock prices prediction. But note that all errors have the same sign. May be Prophet can be used better for prediction stock market direction? This was already mentioned in another blog [2].
And also looks like the error is not changing much with the the time horizon, at least within first 10 steps. Forecasts for days 3,4,5 even have smaller error than for the day 1 and 2.

Further looking at more data would be interesting and will follow soon. Bellow is the full python source code.


import pandas as pd
from fbprophet import Prophet


fname="C:\\Users\\GM.csv"
data = pd.read_csv (fname)
data.columns = ['ds', 'y']
model = Prophet() #instantiate Prophet
model.fit(data); #fit the model with your dataframe

future_stock_data = model.make_future_dataframe(periods=10, freq = 'd')
forecast_data = model.predict(future_stock_data)

print ("Forecast data")
print (forecast_data[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(12))
	
model.plot(forecast_data)
model.plot_components(forecast_data)

References
1. Stock market forecasting with prophet
2. Can we predict stock prices with Prophet?
3. Saturating Forecasts
4. Forecasting at Scale

Data inputting

Predicting future changes

LSTM Configuration

Results

Share this:

LSTM Implementation

Experiment Results

Share this:

Preparing Data for Neural Network Prediction

Building and Running LSTM

Conclusion

Share this:

Data and Parameters

Experiment

Share this:

How to Use Prophet for Time Series Analysis with Python

Can Prophet be Used for Data Stock Price Prediction?

Share this: