Case Study: Netflix Stock Prices Prediction¶

1 Case Study: Netflix Stock Prices Prediction
2 Introduction
3 Libraries and Data Importation
4 Data Exploration
5 Data Pre-processing
6 Model Creation
7 Model Evaluation
8 Conclusion
9 Case Study: Facebook Stock Price Prediction
10 Data Importation
11 Data Pre-processing
12 Model Creation
13 Model Visualization
14 Model Evaluation
15 Conclusion

Introduction¶

Algorithmic trading has become increasingly popular over the past few years. It executes orders using automated, pre-programmed trading instructions, in effect letting a program do the work for the trader. This project attempts to predict the stock prices of Netflix and Facebook using machine learning algorithms.

Problem

Part 1: Predict the price of Netflix stock over the next thirty days.
Part 2: Predict the price of Facebook stock on a specific day.

Dataset

Historical data of Netflix
Historical data of Facebook

Source: Yahoo Finance

Libraries and Data Importation¶

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.style.use('bmh')

# Load the data
df = pd.read_csv('project_data/NFLX.csv')
df.head()

Data Exploration¶

# Get the number of trading days (rows) and columns
df.shape

(252, 7)

# Visualize the close price data
plt.figure(figsize=(16,8))
plt.title('NETFLIX')
plt.xlabel('Days')
plt.ylabel('Close Price (USD)')
plt.plot(df['Close'])
plt.show()

# Keep only the close price; copy so that new columns can be added safely later
df = df[['Close']].copy()
df.head()

Data Pre-processing¶

# Create a variable to predict 'x' days out in the future
future_days = 30

# Create a new column (target) shifted 'x' units/days up
df['Prediction'] = df[['Close']].shift(-future_days)
df.tail()
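To make the effect of the negative shift concrete, here is a minimal sketch on a toy frame. The values and the name `horizon` are illustrative assumptions, not data from NFLX.csv; `horizon` plays the role of `future_days`.

```python
import pandas as pd

# Toy closing prices, purely illustrative (not taken from NFLX.csv)
toy = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0]})
horizon = 2  # hypothetical stand-in for future_days

# shift(-horizon) moves every value `horizon` rows up,
# so row i ends up holding the close from row i + horizon
toy['Prediction'] = toy['Close'].shift(-horizon)

# The last `horizon` rows have no future value and become NaN
n_missing = int(toy['Prediction'].isna().sum())
print(toy)
print('rows without a target:', n_missing)
```

This is why the last `future_days` rows are dropped before training: they have a feature but no target.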

# Create the feature dataset(X) and convert it to a numpy array and remove the 'x' rows/days
X = np.array(df.drop(['Prediction'], axis = 1))[:-future_days]
X;

# Create a target data set (Y) and convert it to a numpy array 
# and get all of the target values except the last 'X' rows/days
y = np.array(df['Prediction'])[:-future_days]
y;

# Split the data into 75% training and 25% testing
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
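By default `train_test_split` shuffles the rows, so the test rows are interleaved in time with the training rows. If a time-ordered hold-out is preferred, one option is to split without shuffling. The sketch below is an assumption about how that could look, not what this notebook does; `chronological_split` and the toy arrays are hypothetical names introduced for illustration.

```python
import numpy as np

def chronological_split(X, y, test_size=0.25):
    """Hold out the last `test_size` fraction of rows without shuffling."""
    n_test = int(round(len(X) * test_size))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Toy data, purely illustrative: 8 ordered observations
X_toy = np.arange(8, dtype=float).reshape(-1, 1)
y_toy = np.arange(8, dtype=float) + 100.0

x_tr, x_te, y_tr, y_te = chronological_split(X_toy, y_toy)
print(len(x_tr), 'training rows,', len(x_te), 'test rows')
```

Passing `shuffle=False` to `train_test_split` should give a comparable ordered split.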

# Get the last 'x' rows of the feature dataset 
x_future = df.drop(['Prediction'], axis = 1)[:-future_days]
x_future = x_future.tail(future_days)
x_future = np.array(x_future)
x_future;

Model Creation¶

# Fitting linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lr = LinearRegression().fit(x_train, y_train)

# Fitting SVR (linear kernel) to the dataset
from sklearn.svm import SVR
svrl = SVR(kernel = 'linear')
svrl.fit(x_train, y_train)

# Fitting SVR (polynomial kernel) to the dataset
svrp = SVR(kernel = 'poly')
svrp.fit(x_train, y_train)

# Fitting Decision Tree to the dataset
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor()
tree = tree.fit(x_train, y_train)

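# Fitting Random Forest Regression to the dataset
# criterion is left at its default (mean squared error); the 'mse' spelling
# is no longer accepted by recent scikit-learn releases
from sklearn.ensemble import RandomForestRegressor
forest = RandomForestRegressor(n_estimators = 20, random_state = 0)
forest.fit(x_train, y_train)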

# Fitting XGBoost Regression to the dataset
from xgboost import XGBRegressor
xgb = XGBRegressor()
xgb.fit(x_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

# Show the model linear regression prediction
lr_prediction = lr.predict(x_future)
print('lr_prediction: ',lr_prediction)

# Show the model SVR linear regression prediction
svrl_prediction = svrl.predict(x_future)
print('svr1_prediction: ', svrl_prediction)

# Show the model SVR Poly prediction
svrp_prediction = svrp.predict(x_future)
print('svrp_prediction: ', svrp_prediction)

# Show the model tree prediction
tree_prediction = tree.predict(x_future)
print('tree_prediction: ', tree_prediction)

# Show the model Random Forest prediction
forest_prediction = forest.predict(x_future)
print('forest_prediction: ', forest_prediction)

# Show the XGBoost prediction
xgb_prediction = xgb.predict(x_future)
print('xgb_prediction: ', xgb_prediction)

print()

lr_prediction:  [383.60905272 407.45931151 422.89775557 435.0063595  446.39945659
 431.52971737 444.85834185 441.50095651 430.11706837 434.96050463
 433.39187127 430.08036833 413.98142713 421.37503304 428.67687325
 424.47554284 436.2905987  433.10750555 441.8954242  443.97772516
 443.07874378 447.6378134  439.65716639 445.57384869 448.94959873
 460.17755754 458.70066237 457.28801338 454.19665658 443.72087714]
svr1_prediction:  [385.53937099 408.52008749 423.39567055 435.06281415 446.04053713
 431.71292483 444.55560897 441.32062854 430.35177876 435.01863107
 433.50718758 430.31641674 414.8044172  421.92846406 428.96409085
 424.91593422 436.30023204 433.23318938 441.70071456 443.70709816
 442.8408922  447.23374543 439.54406005 445.24502958 448.49770517
 459.31631055 457.89326062 456.53211455 453.55346368 443.45961441]
svrp_prediction:  [376.34805205 403.61999459 423.29403109 439.88861991 456.46902433
 435.01701225 454.17041122 449.22358024 433.062319   439.82379981
 437.61552503 433.01172679 411.73230965 421.2804906  431.08416249
 425.39750669 441.71018596 437.21710075 449.80049142 452.86484832
 451.53796875 458.32885512 446.54215983 455.2354173  460.31143657
 477.81181912 475.45507524 473.21648639 468.37076629 452.48513678]
tree_prediction:  [429.320007 414.769989 419.890015 413.440002 419.730011 425.920013
 427.309998 421.970001 414.329987 419.600006 419.48999  434.049988
 414.769989 425.559998 418.070007 425.5      436.130005 447.769989
 449.869995 453.720001 468.040009 466.26001  421.970001 465.910004
 443.399994 447.23999  455.040009 485.640015 476.890015 453.720001]
forest_prediction:  [430.070007   404.11049045 421.87150915 418.68050205 440.9425081
 424.5000029  431.1699986  432.5420012  420.02648955 420.52850325
 424.5639906  430.4799898  416.90049295 423.8560032  421.0790022
 424.66150215 427.14050285 436.19998925 442.3069991  448.4380004
 459.6235048  455.0875063  430.7830016  446.610001   445.98399825
 451.109996   459.6000064  470.3100085  476.6035129  450.47450095]
xgb_prediction:  [429.92593 413.70047 422.9177  416.59482 422.2149  425.72382 430.69208
 424.7191  418.3787  419.94144 419.94144 430.15622 413.70047 423.7935
 420.79367 423.28537 433.30252 442.20663 449.66965 453.82437 465.28488
 464.38287 424.7191  463.25876 444.77448 447.58707 454.96893 483.44696
 476.75873 453.82437]

Model Evaluation¶

Linear Regression Prediction¶

# Visualize the data
Predictions = lr_prediction

# Copy the slice so that adding a column does not raise SettingWithCopyWarning
valid = df[X.shape[0]:].copy()
valid['Prediction'] = Predictions
plt.figure(figsize = (16,8))
plt.title('Model')
plt.xlabel('Days')
plt.ylabel('Close Price USD')
plt.plot(df['Close'])
plt.plot(valid[['Close', 'Prediction']])
plt.legend(['Orig', 'Val', 'Pred'])
plt.show()

SVR Linear Prediction¶

# Visualize the data
Predictions = svrl_prediction

# Copy the slice so that adding a column does not raise SettingWithCopyWarning
valid = df[X.shape[0]:].copy()
valid['Prediction'] = Predictions
plt.figure(figsize = (16,8))
plt.title('Model')
plt.xlabel('Days')
plt.ylabel('Close Price USD')
plt.plot(df['Close'])
plt.plot(valid[['Close', 'Prediction']])
plt.legend(['Orig', 'Val', 'Pred'])
plt.show()

SVR Poly Prediction¶

# Visualize the data
Predictions = svrp_prediction

# Copy the slice so that adding a column does not raise SettingWithCopyWarning
valid = df[X.shape[0]:].copy()
valid['Prediction'] = Predictions
plt.figure(figsize = (16,8))
plt.title('Model')
plt.xlabel('Days')
plt.ylabel('Close Price USD')
plt.plot(df['Close'])
plt.plot(valid[['Close', 'Prediction']])
plt.legend(['Orig', 'Val', 'Pred'])
plt.show()

Decision Tree Prediction¶

# Visualize the data
Predictions = tree_prediction

# Copy the slice so that adding a column does not raise SettingWithCopyWarning
valid = df[X.shape[0]:].copy()
valid['Prediction'] = Predictions
plt.figure(figsize = (16,8))
plt.title('Model')
plt.xlabel('Days')
plt.ylabel('Close Price USD')
plt.plot(df['Close'])
plt.plot(valid[['Close', 'Prediction']])
plt.legend(['Orig', 'Val', 'Pred'])
plt.show()

Random Forest Prediction¶

# Visualize the data
Predictions = forest_prediction

# Copy the slice so that adding a column does not raise SettingWithCopyWarning
valid = df[X.shape[0]:].copy()
valid['Prediction'] = Predictions
plt.figure(figsize = (16,8))
plt.title('Model')
plt.xlabel('Days')
plt.ylabel('Close Price USD')
plt.plot(df['Close'])
plt.plot(valid[['Close', 'Prediction']])
plt.legend(['Orig', 'Val', 'Pred'])
plt.show()

XGB Prediction¶

# Visualize the data
Predictions = xgb_prediction

# Copy the slice so that adding a column does not raise SettingWithCopyWarning
valid = df[X.shape[0]:].copy()
valid['Prediction'] = Predictions
plt.figure(figsize = (16,8))
plt.title('Model')
plt.xlabel('Days')
plt.ylabel('Close Price USD')
plt.plot(df['Close'])
plt.plot(valid[['Close', 'Prediction']])
plt.legend(['Orig', 'Val', 'Pred'])
plt.show()

Conclusion¶

It can be observed that the Linear Regression and Support Vector Regression predictions are far from the actual prices. The Decision Tree, Random Forest, and XGBoost models miss the first and last days of the prediction window, but in the middle days their predictions come close to the actual prices. The results could be improved further by tuning the hyperparameters and by trying different models, such as reinforcement learning and artificial neural networks.
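The comparison in this conclusion is visual. One way to put numbers on it is to compute an error metric between each model's 30 predictions and the actual closes. The following is a minimal sketch under that assumption; `error_summary` is a hypothetical helper and the toy values are illustrative, not results from this notebook.

```python
import numpy as np

def error_summary(actual, predicted):
    """Return (MAE, RMSE) between two equally long sequences of prices."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - actual
    mae = float(np.mean(np.abs(residual)))         # mean absolute error
    rmse = float(np.sqrt(np.mean(residual ** 2)))  # root mean squared error
    return mae, rmse

# Toy values, purely illustrative
toy_actual = [100.0, 102.0, 104.0]
toy_pred = [101.0, 100.0, 107.0]
mae, rmse = error_summary(toy_actual, toy_pred)
print(f'MAE = {mae:.2f}, RMSE = {rmse:.2f}')
```

In the notebook this could be called as, for example, `error_summary(valid['Close'], lr_prediction)`, once per model, so that the models can be ranked by a single number.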

Case Study: Facebook Stock Price Prediction¶

Data Importation¶

# Load the data
df = pd.read_csv('project_data/FB.csv')
df

# Get the number of rows and columns
df.shape

(21, 7)

Data Pre-processing¶

# Get and print the last row of data
actual_price = df.tail(1)
actual_price

# Prepare the data for training the models
# get all the data except for the last row
df = df.head(len(df)-1)

#print the new dataset
df

# Create empty lists to store the independent and dependent data
days = list()
adj_close_prices = list()

# Get the date and adjusted close price
df_days = df.loc[:, 'Date']
df_adj_close = df.loc[:, 'Adj Close']

# Create the independent dataset (day of the month taken from the date string)
for day in df_days:
    days.append( [int(day.split('-')[2])] )
    
# Create the dependent dataset
for adj_close_price in df_adj_close:
    adj_close_prices.append( float(adj_close_price) )

#print the days and the adj close prices
print(days)
print(adj_close_prices)

[[1], [2], [3], [4], [5], [8], [9], [10], [11], [12], [15], [16], [17], [18], [19], [22], [23], [24], [25], [26]]
[231.91000400000001, 232.72000099999997, 230.16000400000001, 226.289993, 230.770004, 231.399994, 238.669998, 236.729996, 224.42999300000002, 228.580002, 232.5, 235.649994, 235.52999900000003, 235.940002, 238.789993, 239.22000099999997, 242.24000499999997, 234.020004, 235.67999300000002, 216.080002]
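The loop above takes the third field of each `YYYY-MM-DD` string as the feature. A minimal sketch of the same extraction using the standard library's date parser, shown on three dates from the FB table; the variable names are assumptions introduced for illustration.

```python
from datetime import date

# Three dates as they appear in the 'Date' column of the FB data
date_strings = ['2020-06-01', '2020-06-26', '2020-06-29']

# date.fromisoformat validates the string before the day is read,
# unlike a plain split('-'), which accepts any text with two hyphens
days_of_month = [[date.fromisoformat(s).day] for s in date_strings]
print(days_of_month)  # [[1], [26], [29]]
```

Because the feature is only the day of the month, it is unambiguous only while the data stays within a single month, as it does here.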

Model Creation¶

# Create models
from sklearn.svm import SVR

# Create and train an SVR model using a linear kernel
lin_svr = SVR(kernel = 'linear', C= 1000.0)
lin_svr.fit(days, adj_close_prices)

# Create and train an SVR model using a polynomial kernel (degree 2)
poly_svr = SVR(kernel = 'poly', degree = 2, C= 1000.0)
poly_svr.fit(days, adj_close_prices)

# Create and train an SVR model using an RBF kernel
rbf_svr = SVR(kernel = 'rbf', gamma = 0.15, C= 1000.0)
rbf_svr.fit(days, adj_close_prices)

# Fitting Decision Tree to the dataset
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor()
tree = tree.fit(days, adj_close_prices)

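# Fitting Random Forest Regression to the dataset
# criterion is left at its default (mean squared error); the 'mse' spelling
# is no longer accepted by recent scikit-learn releases
from sklearn.ensemble import RandomForestRegressor
forest = RandomForestRegressor(n_estimators = 20, random_state = 0)
forest.fit(days, adj_close_prices)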

RandomForestRegressor(n_estimators=20, random_state=0)

Model Visualization¶

# Plot the model on the graph to see which has the best fit on the original data
plt.figure(figsize=(16,8))
plt.scatter(days, adj_close_prices, color = 'red', label = 'data')
plt.plot(days, rbf_svr.predict(days), color = 'green', label = 'RBF Model')
plt.plot(days, poly_svr.predict(days), color = 'orange', label = 'Polynomial Model')
plt.plot(days, lin_svr.predict(days), color = 'blue', label = 'Linear  Model')
plt.legend()
plt.show()

# Plot the model on the graph to see which has the best fit on the original data
plt.figure(figsize=(16,8))
plt.scatter(days, adj_close_prices, color = 'red', label = 'data')
plt.plot(days, tree.predict(days), color = 'green', label = 'Decision Tree Model')
plt.plot(days, forest.predict(days), color = 'orange', label = 'Random Forest Model')
plt.legend()
plt.show()

Model Evaluation¶

# Show the predicted price for the given day
day = [[30]]

print('The Linear SVR predicted:', lin_svr.predict(day))
print('The Polynomial SVR predicted:', poly_svr.predict(day))
print('The RBF SVR predicted:', rbf_svr.predict(day))
print('The Decision Tree predicted:', tree.predict(day))
print('The Random Forest predicted:', forest.predict(day))

The Linear SVR predicted: [240.23070864]
The Polynomial SVR predicted: [237.51221858]
The RBF SVR predicted: [213.69266326]
The Decision Tree predicted: [216.080002]
The Random Forest predicted: [224.89600075]

# Print the actual price from the held-out last row (index 20, dated 2020-06-29)
print('The actual price:', actual_price['Adj Close'][20])

The actual price: 220.63999900000002
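To compare the models directly, the absolute error of each printed prediction against the printed actual price can be computed. This is a minimal sketch that reuses the numbers shown above; the dictionary and variable names are assumptions introduced for illustration.

```python
# Values copied from the outputs printed above
actual = 220.63999900000002
predicted = {
    'Linear SVR': 240.23070864,
    'Polynomial SVR': 237.51221858,
    'RBF SVR': 213.69266326,
    'Decision Tree': 216.080002,
    'Random Forest': 224.89600075,
}

# Absolute error per model, then model names sorted from closest to farthest
abs_error = {name: abs(value - actual) for name, value in predicted.items()}
ranking = sorted(abs_error, key=abs_error.get)

for name in ranking:
    print(f'{name}: off by {abs_error[name]:.2f} USD')
```

On these numbers the Random Forest and Decision Tree predictions are each within about 5 USD of the actual price, while the linear and polynomial SVR predictions are off by more than 15 USD.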

Conclusion¶

The Decision Tree and Random Forest models gave the predictions closest to the actual price. Machine learning can serve as rough guidance on how much a stock may go up or down, but it cannot guarantee high accuracy or precision. Many factors have to be considered when predicting stock market prices.

	Date	Open	High	Low	Close	Adj Close	Volume
0	2019-07-08	378.190002	378.250000	375.359985	376.160004	376.160004	3113400
1	2019-07-09	379.059998	384.760010	377.500000	379.929993	379.929993	6932800
2	2019-07-10	382.769989	384.339996	362.679993	381.000000	381.000000	5878800
3	2019-07-11	381.100006	384.540009	378.799988	379.500000	379.500000	4336300
4	2019-07-12	378.679993	379.739990	372.790009	373.250000	373.250000	6636900

	Close
0	376.160004
1	379.929993
2	381.000000
3	379.500000
4	373.250000

	Date	Open	High	Low	Close	Adj Close	Volume
0	2020-06-01	224.589996	232.440002	223.500000	231.910004	231.910004	18223800
1	2020-06-02	230.940002	233.000000	226.559998	232.720001	232.720001	20919000
2	2020-06-03	232.110001	232.649994	228.529999	230.160004	230.160004	15380300
3	2020-06-04	229.559998	231.630005	224.610001	226.289993	226.289993	17041500
4	2020-06-05	226.710007	231.350006	225.309998	230.770004	230.770004	16750400
5	2020-06-08	229.029999	231.550003	227.410004	231.399994	231.399994	15466500
6	2020-06-09	231.520004	239.770004	230.410004	238.669998	238.669998	27462900
7	2020-06-10	240.960007	241.210007	235.279999	236.729996	236.729996	20720700
8	2020-06-11	229.940002	232.889999	223.550003	224.429993	224.429993	26708200
9	2020-06-12	229.899994	231.660004	224.500000	228.580002	228.580002	22071700
10	2020-06-15	225.089996	233.770004	224.800003	232.500000	232.500000	15340300
11	2020-06-16	237.139999	238.460007	233.000000	235.649994	235.649994	15236700
12	2020-06-17	235.000000	237.589996	231.729996	235.529999	235.529999	19552800
13	2020-06-18	234.990005	236.139999	232.149994	235.940002	235.940002	15782500
14	2020-06-19	237.789993	240.830002	235.550003	238.789993	238.789993	30081300
15	2020-06-22	238.559998	240.699997	236.910004	239.220001	239.220001	18917800
16	2020-06-23	241.279999	245.190002	239.860001	242.240005	242.240005	24017900
17	2020-06-24	241.199997	243.220001	232.679993	234.020004	234.020004	20834900
18	2020-06-25	234.619995	237.300003	232.740005	235.679993	235.679993	18704300
19	2020-06-26	232.639999	233.089996	215.399994	216.080002	216.080002	76343900
20	2020-06-29	209.750000	220.750000	207.110001	220.639999	220.639999	58514300

	Date	Open	High	Low	Close	Adj Close	Volume
0	2020-06-01	224.589996	232.440002	223.500000	231.910004	231.910004	18223800
1	2020-06-02	230.940002	233.000000	226.559998	232.720001	232.720001	20919000
2	2020-06-03	232.110001	232.649994	228.529999	230.160004	230.160004	15380300
3	2020-06-04	229.559998	231.630005	224.610001	226.289993	226.289993	17041500
4	2020-06-05	226.710007	231.350006	225.309998	230.770004	230.770004	16750400
5	2020-06-08	229.029999	231.550003	227.410004	231.399994	231.399994	15466500
6	2020-06-09	231.520004	239.770004	230.410004	238.669998	238.669998	27462900
7	2020-06-10	240.960007	241.210007	235.279999	236.729996	236.729996	20720700
8	2020-06-11	229.940002	232.889999	223.550003	224.429993	224.429993	26708200
9	2020-06-12	229.899994	231.660004	224.500000	228.580002	228.580002	22071700
10	2020-06-15	225.089996	233.770004	224.800003	232.500000	232.500000	15340300
11	2020-06-16	237.139999	238.460007	233.000000	235.649994	235.649994	15236700
12	2020-06-17	235.000000	237.589996	231.729996	235.529999	235.529999	19552800
13	2020-06-18	234.990005	236.139999	232.149994	235.940002	235.940002	15782500
14	2020-06-19	237.789993	240.830002	235.550003	238.789993	238.789993	30081300
15	2020-06-22	238.559998	240.699997	236.910004	239.220001	239.220001	18917800
16	2020-06-23	241.279999	245.190002	239.860001	242.240005	242.240005	24017900
17	2020-06-24	241.199997	243.220001	232.679993	234.020004	234.020004	20834900
18	2020-06-25	234.619995	237.300003	232.740005	235.679993	235.679993	18704300
19	2020-06-26	232.639999	233.089996	215.399994	216.080002	216.080002	76343900

	Close	Prediction
247	447.239990	NaN
248	455.040009	NaN
249	485.640015	NaN
250	476.890015	NaN
251	493.809998	NaN
