Stock Price Prediction by using Machine Learning in Python
This article predicts stock prices for an Elon Musk theme using machine learning in Python. The dataset used for the analysis and the predictive model consists of Twitter, Tesla, and PayPal stock price data. We use OHLC (‘Open’, ‘High’, ‘Low’, ‘Close’) data for these stocks from 1 January 2014 to 27 October 2022, a span of roughly eight years and ten months.
Dataset explanation: All three datasets have the same 7 columns, but the number of rows differs and some dates do not appear in every dataset. TWTR and TSLA each have 2222 rows, while PYPL has 1843 rows available for the predictive model. We first plot time series charts for Exploratory Data Analysis (EDA).
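The row counts and date coverage described above can be checked by comparing the date columns of the datasets. The sketch below uses small synthetic frames in place of the real files. The column names follow the usual Yahoo Finance layout, which is an assumption about the source data, and `make_fake_prices` and `summarize_coverage` are hypothetical helpers introduced only for illustration.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: the 7 columns follow the common Yahoo Finance layout.
COLUMNS = ["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]


def make_fake_prices(start, periods, seed):
    """Build a synthetic price table with the assumed 7-column layout."""
    rng = np.random.default_rng(seed)
    dates = pd.bdate_range(start=start, periods=periods)
    close = 100 + rng.normal(0, 1, periods).cumsum()
    open_ = close + rng.normal(0, 0.5, periods)
    frame = pd.DataFrame({
        "Date": dates,
        "Open": open_,
        "High": np.maximum(open_, close) + 1.0,
        "Low": np.minimum(open_, close) - 1.0,
        "Close": close,
        "Adj Close": close,
        "Volume": rng.integers(1_000, 10_000, periods),
    })
    return frame[COLUMNS]


def summarize_coverage(frames):
    """Return row count, column count, first and last date per ticker."""
    rows = {
        name: {
            "rows": len(df),
            "columns": df.shape[1],
            "first_date": df["Date"].min(),
            "last_date": df["Date"].max(),
        }
        for name, df in frames.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


frames = {
    "TWTR": make_fake_prices("2014-01-01", 30, seed=0),
    "PYPL": make_fake_prices("2014-01-15", 20, seed=1),  # starts later, fewer rows
}
summary = summarize_coverage(frames)
print(summary)

# Dates that have a TWTR row but no PYPL row
missing_in_pypl = set(frames["TWTR"]["Date"]) - set(frames["PYPL"]["Date"])
print(len(missing_in_pypl), "TWTR dates have no PYPL row")
```

With the real data, the same summary would be expected to report 2222 rows for TWTR and TSLA and 1843 rows for PYPL.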
Twitter (TWTR)
The plot of the closing price shows that Twitter’s stock price is unstable.
Tesla (TSLA)
PayPal (PYPL)
TWTR + TSLA + PYPL
Looking closely, the data in the ‘Close’ column appears to be the same as the data in the ‘Adj Close’ column. Let’s check whether this holds for every row. The check shows that all rows of ‘Close’ and ‘Adj Close’ contain the same data. Redundant data does not help the model, so we drop the ‘Adj Close’ column before further analysis. Note that Close and Adj Close are defined differently in general, but because they coincide in this dataset we decided to drop one of them.
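The row-by-row comparison can be sketched as follows. The toy frame stands in for the real data, and `drop_adj_close_if_redundant` is a hypothetical helper name, not something from the original analysis.

```python
import pandas as pd


def drop_adj_close_if_redundant(df):
    """Drop 'Adj Close' only when it equals 'Close' on every row."""
    all_rows_equal = (df["Close"] == df["Adj Close"]).all()
    if all_rows_equal:
        return df.drop(columns=["Adj Close"])
    return df


# Toy data where the two columns are identical
toy = pd.DataFrame({
    "Close": [10.0, 11.5, 12.0],
    "Adj Close": [10.0, 11.5, 12.0],
    "Volume": [100, 120, 90],
})
cleaned = drop_adj_close_if_redundant(toy)
print(list(cleaned.columns))

# Toy data where one row differs, so the column is kept
adjusted = toy.assign(**{"Adj Close": [9.8, 11.5, 12.0]})
kept = drop_adj_close_if_redundant(adjusted)
print(list(kept.columns))
```

Exact equality is used here because the text says the values are the same. If the columns differ only by floating-point rounding, `numpy.isclose` would be a more tolerant comparison.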
The TWTR distribution of OHLC data
The TSLA distribution of OHLC data
The PYPL distribution of OHLC data
Outliers checking
TWTR
TSLA
PYPL
TWTR
Some important observations from the grouped data above: Prices are lower in quarter-end months than in non-quarter-end months. The volume of trades is also lower in quarter-end months.
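A grouping of this kind can be sketched as below. It assumes that a "quarter-end month" means March, June, September, or December, and it uses synthetic prices, so the printed averages say nothing about the real stocks. The column name `is_quarter_end` is an illustrative choice.

```python
import numpy as np
import pandas as pd


def add_quarter_end_flag(df):
    """Add a 0/1 column marking rows that fall in a quarter-end month."""
    out = df.copy()
    # ASSUMPTION: quarter-end months are those divisible by 3 (3, 6, 9, 12).
    out["is_quarter_end"] = (out["Date"].dt.month % 3 == 0).astype(int)
    return out


# Synthetic daily data for one year, standing in for the real prices
dates = pd.bdate_range("2021-01-01", "2021-12-31")
rng = np.random.default_rng(0)
prices = pd.DataFrame({
    "Date": dates,
    "Close": 100 + rng.normal(0, 1, len(dates)).cumsum(),
    "Volume": rng.integers(1_000, 10_000, len(dates)),
})

flagged = add_quarter_end_flag(prices)

# Average price and volume for quarter-end vs. other months
grouped = flagged.groupby("is_quarter_end")[["Close", "Volume"]].mean()
print(grouped)
```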
Above we added some more columns that help in training the model. We also added the target feature, a signal indicating whether to buy or not, which is the only thing the model is trained to predict. The target is 1 if close(t-1) is greater than close(t), and 0 otherwise. Before proceeding, let’s check whether the target is balanced using a pie chart.
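The target column and its class balance can be sketched as follows. The rule is read here as "the next row's close is higher than the current row's close", implemented with pandas `shift(-1)`. This is an assumption about what the close(t-1) notation in the text means. For the literal previous-day reading, `shift(1)` would be used instead. The class shares printed at the end are the numbers a pie chart would display.

```python
import pandas as pd


def add_target(df):
    """Add a 0/1 target comparing each close with the following close."""
    out = df.copy()
    # ASSUMPTION: 1 means the next row's close is higher than this row's close.
    # The last row has no next close; comparing with NaN is False, so it gets 0.
    out["target"] = (out["Close"].shift(-1) > out["Close"]).astype(int)
    return out


# Toy closing prices standing in for the real data
toy = pd.DataFrame({"Close": [10.0, 11.0, 10.5, 10.8, 10.2, 10.9]})
labeled = add_target(toy)
print(labeled)

# Share of each class, i.e. the slices of the pie chart
balance = labeled["target"].value_counts(normalize=True)
print(balance)
```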
Correlation checking
Data Splitting and Normalization
After selecting the features to train the model on, we normalize the data, because normalized data leads to stable and fast training. The data is then split into two parts with a 90/10 ratio so that we can evaluate the model on unseen data. The resulting shapes are (1999, 3) for training and (223, 3) for validation.
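The normalization and the 90/10 split can be sketched with scikit-learn as below. The feature matrix is random data with the same shape as the TWTR case (2222 rows, 3 features). `StandardScaler` and the `random_state` value are assumptions, since the text does not name the scaler or the seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 2222 rows and 3 features, as for TWTR and TSLA
rng = np.random.default_rng(2022)
features = rng.normal(loc=5.0, scale=2.0, size=(2222, 3))
target = rng.integers(0, 2, size=2222)

# Normalize first, then split, following the order described in the text.
# ASSUMPTION: StandardScaler (zero mean, unit variance) is the normalizer.
scaler = StandardScaler()
scaled = scaler.fit_transform(features)

X_train, X_valid, y_train, y_valid = train_test_split(
    scaled, target, test_size=0.1, random_state=2022
)
print(X_train.shape, X_valid.shape)  # (1999, 3) (223, 3)
```

Fitting the scaler on the full dataset lets validation statistics influence the training features. A common alternative is to fit the scaler on the training part only and apply it to the validation part.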
Model Development and Evaluation
Now we train several machine learning models (Logistic Regression, Support Vector Machine, XGBClassifier) and, based on their performance on the training and validation data, choose the one that serves the purpose best. As the evaluation metric we use ROC-AUC. The reason is that, instead of predicting hard labels of 0 or 1, we want the model to predict soft probabilities, which are continuous values between 0 and 1. With soft probabilities, ROC-AUC is commonly used to measure the quality of the predictions.
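A minimal sketch of this training and evaluation loop is shown below. It uses synthetic features and default hyperparameters, both of which are assumptions, so the scores are illustrative only. `XGBClassifier` is added only if the `xgboost` package is installed. The soft probabilities come from `predict_proba`, and `roc_auc_score` compares them with the true labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: the target depends weakly on the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + rng.normal(scale=1.0, size=400) > 0).astype(int)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# ASSUMPTION: default hyperparameters; the text does not state the settings.
models = {
    "LogisticRegression": LogisticRegression(),
    "SVC": SVC(probability=True),  # probability=True enables predict_proba
}
try:
    from xgboost import XGBClassifier
    models["XGBClassifier"] = XGBClassifier()
except ImportError:
    pass  # xgboost is optional in this sketch

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 holds the predicted probability of class 1
    train_proba = model.predict_proba(X_train)[:, 1]
    valid_proba = model.predict_proba(X_valid)[:, 1]
    scores[name] = {
        "train_auc": roc_auc_score(y_train, train_proba),
        "valid_auc": roc_auc_score(y_valid, valid_proba),
    }

for name, result in scores.items():
    print(name, result)
```

A large gap between the training and validation AUC for a model is a sign of overfitting, which is one reason to report both.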
Note: we ran the same method on the other stocks, with the results below.
TSLA
Pie charts of the target. The target is 1 if close(t-1) is greater than close(t), and 0 otherwise.
PYPL
In all the pie charts, class 0 and class 1 each make up about 50 percent of the target. This suggests that the price direction is hard to predict for any of these stocks, possibly because the data contains too little information to predict the stock price. Across the 3 models, and looking at XGBClassifier in particular, we conclude that the best accuracy belongs to the Twitter and Tesla stocks, since their testing accuracy is higher than PayPal’s. Put simply, testing accuracy indicates how well a model that learned from the training dataset adapts to unseen data.