Essay:

Essay details:

  • Subject area(s): Engineering
  • Price: Free download
  • Published on: 7th September 2019
  • File format: Text
  • Number of pages: 2

Prediction Analysis for Big Data in Live Stock News  

Vaishali A. Ingle 1, Sachin N. Deshmukh 2

1 Research Scholar, Department of CS and IT, Dr. B.A.M. University, Aurangabad, Maharashtra, India

2 Professor, Department of CS and IT, Dr. B.A.M. University, Aurangabad, Maharashtra, India

ABSTRACT

Breaking news generates a large amount of data in both structured and unstructured formats. Various machine learning algorithms are used to predict stock market movement. The data for this work is collected as breaking news from various finance websites. TF-IDF features extracted from the online news data, together with log-likelihood values, are used to build a hidden Markov model (HMM). The next day's stock price is predicted as either higher or lower than the current day's price. Results from the proposed model are compared with results from other predictive machine learning techniques: random forest, KNN, multiple regression, bagging and boosting. The proposed model predicts the trend correctly in approximately 70% of cases. The captured features are represented graphically as a word cloud. The results could be improved further with deep learning ensemble methods.

Keywords: Text Mining, Stock Market, HMM, Bagging, Boosting, Multiple Regression, Random Forest, Finance News, TF-IDF, Word cloud

INTRODUCTION

The central part of text mining deals with the study of patterns in document collections [1]. Term distributions and frequencies generally produce a very large number of trends. Websites, articles, abstracts, books, news feeds, letters, blogs, forums, mailing lists, Twitter and similar sources produce readable text data in unstructured form. Big data is a term that describes the large volume of data that inundates a business on a daily basis. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analysts articulated the now-mainstream definition of big data as the three Vs: volume, velocity and variety.

Finance news [2] provides vital information for deciding whether to buy or sell a particular company's stock. Dynamic changes in the news about a company influence the decisions of investors and traders. In this paper, real-time news is collected from web sources such as Yahoo Finance and Google News to predict stock market trends. During the declaration of the finance budget, stock exchange curves always show extreme high or low peaks. Unpredicted events such as terror attacks and natural disasters also affect stock movement. Consequently, 100% accuracy in predicting the next day's stock price is very rare.

In this paper, Section 2 gives an overview of the machine learning techniques used for prediction. Section 3 describes data collection, pre-processing and aggregation. Section 4 presents the experimental work and results. Section 5 gives the conclusions along with a concise note on prospective directions for the research.

RELATED WORK

In K-Nearest Neighbor (KNN) [3], the data points are vectors in a feature space. KNN can be used for regression. Neighboring vectors contribute more weight to the prediction than distant vectors. KNN can predict the outcome of a dependent variable given a set of independent variables [4].
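The weighting idea described above can be illustrated with a minimal sketch of distance-weighted KNN regression. The function name knn_regress, the inverse-distance weights and the Euclidean metric are assumptions chosen for illustration; the paper does not state which weighting or distance it used.

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Distance-weighted KNN regression (illustrative sketch).

    Nearer neighbours receive larger weights (1 / distance).
    """
    # Euclidean distance from the query to every training vector,
    # keeping only the k closest (distance, target) pairs
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # An exact match is returned directly to avoid dividing by zero
    for d, y in nearest:
        if d == 0.0:
            return y
    weights = [1.0 / d for d, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

With one-dimensional inputs such as `[(0.0,), (1.0,), (2.0,), (10.0,)]` and targets equal to the inputs, a query at `(1.5,)` with `k=2` is equidistant from two neighbours and therefore returns their plain average.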

Random forests [5] are algorithms that make predictions using a set of randomly grown decision trees. A random forest also ranks the important variables, known as predictors [6], that are strongly associated with the outcome.

Bagging takes random samples of the data, applies a learning algorithm to each sample and uses a simple mean to obtain the bagged probabilities [7]. Boosting selects the sample dataset so that it emphasizes instances that are hard to classify. Both methods adjust the training instances. Boosting combines many rough predictions rather than relying on a single highly accurate one [8].

The most common goals of multiple regression [9] are to describe a model, to predict the response, or to confirm which variables or combinations of variables are important for model generation. Regression determines whether the included variables capture changes in the response variable.

It was found that the multiple regression algorithm performed better in predicting stock values than the random forest, KNN, bagging and boosting algorithms.

DATA COLLECTION, PREPROCESSING AND AGGREGATION

Online news streams are collected from the web sources listed in Table 1. Basic text mining transformations are performed on the text data, and correlations are established between terms. In this work, online news is collected from sources such as Yahoo Finance, Google News and Google Finance. All news items are aggregated [10] into a text corpus. The following table lists the currently implemented web sources.

Table 1. Overview of web sources listing the maximum number of items per feed

Source Name Items URL Auth Format

GoogleBlogSearchSource 100 http://www.google.com/blogsearch - RSS

GoogleFinanceSource 20 http://www.google.com/finance - RSS

GoogleNewsSource 100 http://news.google.com - RSS

NYTimesSource 100 http://api.nytimes.com x JSON

ReuterNewsSource 20 http://www.reuters.com/tools/rss - ATOM

YahooFinanceSource 20 http://finance.yahoo.com - RSS

YahooInplaySource 100+ http://finance.yahoo.com/marketupdate/inplay - HTML

YahooNewsSource 20 http://news.search.yahoo.com/rss - RSS

Some pre-processing of the text data is required prior to text analysis. Example transformations [11] include converting the text to lower case, removing numbers and punctuation, removing stop words, stemming and identifying synonyms.
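The transformations listed above can be sketched as follows. This is a minimal illustration only: the STOP_WORDS set is a tiny hypothetical list, and stemming and synonym identification are left out because they normally rely on a dedicated library.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}

def preprocess(text):
    """Lower-case the text, drop digits and punctuation, remove stop words."""
    text = text.lower()
    # Replace every character that is not a letter or whitespace with a space
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]
```

For example, `preprocess("The TCS shares rose 2.5% on Monday!")` keeps only the tokens `tcs`, `shares`, `rose` and `monday`.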

EXPERIMENTAL WORK

TF-IDF weights [12] are used to score words. The weight is the product of two separate terms, TF (term frequency) and IDF (inverse document frequency). Figure 3 shows the proposed data flow of the model.
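A minimal sketch of the TF-IDF computation is given below, assuming the common definition TF = (term count / document length) and IDF = log(N / document frequency). The paper does not state which TF-IDF variant or normalization it used, so the function tf_idf and its exact formula are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights.

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that occurs in every document receives a weight of zero under this definition, which is why frequent but uninformative words contribute little to the score.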

Table 2 and Table 3 list, for the ten selected companies, the log-likelihood convergence values and the TF-IDF values, respectively, which are used to calculate the predicted stock prices. This extracted data, along with the opening and closing prices of the company stocks on that particular day, is used as input for generating the HMM.

The total numbers of documents collected on five consecutive days are 150, 105, 125, 140 and 135, respectively.

Figure 3.  Proposed Data Flow of Model Generation  

In an HMM, the probability [14] of an observed sequence is calculated and the most likely sequence of states is predicted. Although the parameters of the model are specified, the states themselves are hidden. The transition matrix [15] contains the probabilities of switching from one state to another. The emission matrix contains the probabilities of each observation given a state.
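How the probability of an observed sequence follows from the transition and emission matrices can be sketched with the scaled forward algorithm for a discrete HMM. The function name and the discrete observation symbols are assumptions for illustration; the paper does not specify its state space or how the TF-IDF features were mapped to observations.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs   : list of observation symbol indices
    start : start[s]    = P(first state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o]  = P(observing symbol o | state is s)
    """
    n = len(start)
    # Initialise with the start probabilities times the first emission
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then apply the emission
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        # Rescale at each step to avoid numerical underflow on long sequences
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

The sum of the logged scale factors equals the log-likelihood of the whole sequence, which is the kind of quantity reported per company in Table 2.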

Table 2. Log likelihood Convergence values

Sr.No. Company Name Log Likelihood Value

1. BAJAJ 15.33456

2. TCS 23.48464

3. HDFC 22.03568

4. GAIL 17.25169

5. MARUTI 30.87912

6. SUN 16.84491

7. ITC 18.14116

8. MAHINDRA 16.03129

9. LUPIN 16.68404

10. AIRTEL 15.83614

The log-likelihood convergence values computed from five days of extracted TF-IDF features for the companies are shown in Table 2. From these values, the nonlinear subset of the data is selected. The data is then converted into a suitable form for plotting.

Table 3. TF-IDF values [15] for extracted news text data for ten companies

Sr.No. Company Name Day1 Day2 Day3 Day4 Day5

1. TCS 0.022394 0.020368 0.020368 0.02838 0.021074

2. BAJAJ 0.014308 0.018469 0.019952 0.019983 0.013024

3. HDFC 0.012351 0.014631 0.017785 0.017813 0.015408

4. GAIL 0.016667 0.020577 0.017583 0.01761 0.012584

5. MARUTI 0.014684 0.018811 0.016916 0.016942 0.013471

6. SUN 0.014229 0.018928 0.018431 0.018459 0.013728

7. ITC 0.013999 0.016857 0.015028 0.015052 0.01133

8. MAHINDRA 0.015635 0.01742 0.018172 0.0182 0.010805

9. LUPIN 0.015058 0.020531 0.018034 0.018062 0.013132

10. AIRTEL 0.01679 0.015671 0.016816 0.015695 0.011429

The closing price of a company for the next day [15] is predicted from the closing price, the log-likelihood value and the TF-IDF value of the current day, where

PrCP(i+1) = predicted closing price for day i+1

TFIDF(i) = TF-IDF value for day i

LgLik(i) = log-likelihood value for day i

CP(i) = closing price on day i

Table 4 lists the results, along with the percentage error, obtained from the proposed HMM model. The same dataset is used for prediction with the KNN, random forest, bagging, boosting and multiple regression algorithms.
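The paper does not state how the percentage error is computed. The sketch below assumes the conventional absolute percentage error relative to the actual closing price, which agrees with the Table 4 entries to within rounding (for example, TCS on Day 1).

```python
def percentage_error(actual, predicted):
    """Absolute percentage error relative to the actual value (assumed formula)."""
    return abs(actual - predicted) / actual * 100.0

# TCS, Day 1 in Table 4: actual 2567.35, predicted 2579.404
tcs_day1_error = percentage_error(2567.35, 2579.404)
```

Here `tcs_day1_error` evaluates to roughly 0.4695, close to the 0.469508 reported in Table 4.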

The dataset is divided into a training set and a testing set. Each algorithm produces 60 values in total (six days for each of the top 10 companies listed on the Bombay Stock Exchange, BSE).

Multiple regression is an extension of simple linear regression in which more than one independent variable is used to estimate a dependent variable. The computations are more difficult, however, because the interrelationships among all the variables must be taken into account.
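As an illustration of estimating one dependent variable from several independent variables, the sketch below fits ordinary least squares by solving the normal equations with Gauss-Jordan elimination. It is a generic textbook method, not necessarily the procedure used in the paper; the function name is hypothetical, and the sketch assumes the predictors are not collinear.

```python
def fit_multiple_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of predictor tuples; targets: list of responses.
    Returns [intercept, b1, b2, ...]. Assumes X^T X is invertible.
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Build the augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]
```

Because every coefficient is solved jointly, the interrelationships among the predictors enter through the off-diagonal entries of X^T X.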

The results of a multiple regression analysis are usually interpreted with more than two independent variables. Table 5 lists the results, along with the percentage error, obtained from multiple regression.

For the KNN algorithm, the most important choice is the number of nearest neighbors: the value of k largely determines the usefulness of the model and how well the data can be used to generalize the results.

By changing the value of k, more accurate results can be obtained. Table 6 lists the results, along with the percentage error, obtained from KNN.

Table 4. Actual Closing Price, Predicted Closing Price and Percentage Error Using Proposed HMM Model [15]

Company Name Measure Day1 Day2 Day3 Day4 Day5 Day6

TCS Actual Closing Price 2567.35 2576.75 2577.6 2570.65 2565.25 2538.8

Predicted Closing Price 2579.404 2567.536 2576.936 2577.859 2570.843 2565.49

Percentage Error 0.469508 0.35757 0.025775 0.280418 0.218011 1.051343

BAJAJ Actual Closing Price 2204.65 2244.15 2189.3 2269.2 2254.9 2251.75

Predicted Closing Price 2188.55 2204.778 2244.286 2189.44 2269.288 2255.022

Percentage Error 0.730263 1.754408 2.511594 3.514896 0.638078 0.145322

HDFC Actual Closing Price 1022.1 1015.75 1022.9 1027.75 1028.15 1000.6

Predicted Closing Price 1019.517 1022.415 1016.136 1023.284 1028.08 1028.488

Percentage Error 0.252713 0.656209 0.661274 0.434568 0.006773 2.787159

GAIL Actual Closing Price 281.05 282.1 280.9 289.8 296.25 292.85

Predicted Closing Price 272.95 282.31 283.17 281.98 290.54 297.33

Percentage Error 2.879384 0.075524 0.809994 2.697879 1.924343 1.530553

MARUTI Actual Closing Price 4199.45 4204.2 4200.05 4204.95 4167.5 4058.1

Predicted Closing Price 4250.35 4199.58 4204.32 4200.17 4205.048 4167.59

Percentage Error 1.212223 0.109692 0.101767 0.113567 0.900994 2.698182

SUN Actual Closing Price 371.85 372.45 380.65 375.7 373.5 364.6

Predicted Closing Price 376.38 372.70 373.28 381.46 376.31 374.13

Percentage Error 1.220352 0.069122 1.935224 1.53497 0.753819 2.614001

ITC Actual Closing Price 320.85 317.65 325.9 327.55 325.3 315.55

Predicted Closing Price 321.64 318.61 326.74 328.38 325.93 316.26

Percentage Error 0.246694 0.303081 0.25669 0.254505 0.194229 0.224636

MAHINDRA Actual Closing Price 1253.85 1215.3 1231.7 1232.75 1222.95 1176.8

Predicted Closing Price 1240.05 1254.07 1215.53 1231.93 1232.89 1223.09

Percentage Error 1.100438 3.190383 1.312031 0.065959 0.812831 3.933699

LUPIN Actual Closing Price 1829.45 1811.8 1898.5 1862.7 1928.85 1862.75

Predicted Closing Price 1802.74 1829.64 1811.97 1898.66 1862.82 1929.01

Percentage Error 1.460036 0.984503 4.558016 1.930463 3.423406 3.557254

AIRTEL Actual Closing Price 348.7 339.75 347.35 362.25 354.5 345.35

Predicted Closing Price 350.16 349.41 340.53 348.07 362.75 355.06

Percentage Error 0.41898 2.843761 1.962339 3.915651 2.327116 2.810257

The boosting approach takes the current model as input and fits a decision tree to the residuals of that model. Fitting the data that is still unexplained improves the results further. Table 7 lists the results, along with the percentage error, obtained from boosting.
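The residual-fitting idea can be sketched with one-split regression trees (stumps) on a single feature. This is a generic gradient-boosting illustration under squared error, not the authors' implementation; the function names, the learning rate and the number of rounds are assumptions, and the sketch needs at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature, minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def boost(xs, ys, rounds=20, rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # Residuals are the part of the target the current model has not explained
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + rate * sum(s(x) for s in stumps)
```

Each round shrinks the remaining residuals, so the combined model improves step by step even though every individual stump is a rough predictor.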

Bagging, short for bootstrap aggregation, increases the power of a prediction. It takes several random samples from the training dataset and uses each of them to build a separate model that produces separate predictions for the test set.

The predicted values are then averaged to obtain a more accurate prediction. Table 8 lists the results, along with the percentage error, obtained from bagging.
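The sample-fit-average loop described above can be sketched as follows. The base learner fit_mean is a deliberately trivial stand-in chosen for illustration, and the number of models and the seed are arbitrary assumptions; the paper does not say which base learner it bagged.

```python
import random

def bagged_prediction(train, fit, x, n_models=25, seed=0):
    """Average the predictions of models fitted on bootstrap samples.

    train: list of (features, target) pairs
    fit  : function that takes a sample and returns a predictor
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        # Bootstrap sample: draw with replacement, same size as the training set
        sample = [rng.choice(train) for _ in train]
        model = fit(sample)
        preds.append(model(x))
    return sum(preds) / len(preds)

def fit_mean(sample):
    """Trivial base learner (an assumption): always predict the sample mean."""
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y
```

Any function with the same shape as `fit_mean`, such as a decision-tree fitter, could be passed as `fit` without changing the averaging step.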

Random forests are an improvement over bagged decision trees. An individual tree may fit or overfit the data, but when many trees are combined their training-set errors cancel out, at least to some extent. Whether the output is overfit can be checked by comparing the error on the training and validation sets. Table 9 lists the results, along with the percentage error, obtained from random forest.

The stock prices of the top 10 BSE companies over six days, as obtained from all algorithms, are listed in Table 10 for comparison. The percentage accuracy of each algorithm in predicting high or low stock price trends, over a total of 60 values for the top 10 companies, is given in Table 11.

Table 5. Actual Closing Price, Predicted Closing Price and Percentage Error Using Multiple Regression Model

Company Name Measure Day1 Day2 Day3 Day4 Day5 Day6

TCS Actual Closing Price 2567.35 2576.75 2577.6 2570.65 2565.25 2538.8

Predicted Closing Price 2556.79 2560.53 2557.88 2550.35 2556.77 2521.04

Percentage Error 0.78 0.17 0.78 0.41 0.43 1.09

BAJAJ Actual Closing Price 2204.65 2244.15 2189.3 2269.2 2254.9 2251.75

Predicted Closing Price 2234.57 2235.41 2236.87 2234.98 2237.27 2237.11

Percentage Error 1.52 0.87 1.57 2.00 1.99 0.61

HDFC Actual Closing Price 1022.1 1015.75 1022.9 1027.75 1028.15 1000.6

Predicted Closing Price 1021.07 1018.28 1023.40 1023.64 1013.51 1014.67

Percentage Error 0.10 0.25 0.05 0.40 1.44 1.39

GAIL Actual Closing Price 281.05 282.1 280.9 289.8 296.25 292.85

Predicted Closing Price 281.04 282.23 285.77 285.09 296.21 292.45

Percentage Error 0.00 0.04 1.70 1.65 0.01 0.14

MARUTI Actual Closing Price 4199.45 4204.2 4200.05 4204.95 4167.5 4058.1

Predicted Closing Price 4183.31 4199.85 4165.57 4177.13 4149.08 4042.13

Percentage Error 0.39 0.10 0.83 0.67 0.44 0.40

SUN Actual Closing Price 371.85 372.45 380.65 375.7 373.5 364.6

Predicted Closing Price 373.83 378.05 376.30 377.18 373.38 366.73

Percentage Error 0.53 1.48 1.16 0.39 0.03 0.58

ITC Actual Closing Price 320.85 317.65 325.9 327.55 325.3 315.55

Predicted Closing Price 320.86 317.67 325.91 327.56 325.31 315.56

Percentage Error 0.00 0.01 0.00 0.00 0.00 0.00

MAHINDRA Actual Closing Price 1253.85 1215.3 1231.7 1232.75 1222.95 1176.8

Predicted Closing Price 1232.66 1224.78 1237.25 1221.95 1224.66 1173.04

Percentage Error 1.72 0.77 0.45 0.88 0.14 0.32

LUPIN Actual Closing Price 1829.45 1811.8 1898.5 1862.7 1928.85 1862.75

Predicted Closing Price 1840.42 1839.86 1853.12 1864.84 1933.60 1851.81

Percentage Error 0.60 1.53 2.45 0.11 0.25 0.59

AIRTEL Actual Closing Price 348.7 339.75 347.35 362.25 354.5 345.35

Predicted Closing Price 349.99 348.55 349.29 348.28 353.08 348.05

Percentage Error 0.37 2.52 0.56 4.01 0.40 0.78

Table 6. Actual Closing Price, Predicted Closing Price and Percentage Error Using KNN

Company Name Measure Day1 Day2 Day3 Day4 Day5 Day6

TCS

For k=4 Actual Closing Price 2567.35 2576.75 2577.6 2570.65 2565.25 2538.8

Predicted Closing Price 2567.35 2576.75 2576.75 2576.75 2579.2 2576.75

Percentage Error 0.36 0.46 0.04 0.62 0.44 1.10

BAJAJ

For k=5 Actual Closing Price 2204.65 2244.15 2189.3 2269.2 2254.9 2251.75

Predicted Closing Price 2204.65 2244.65 2204.65 2188.45 2244.45 2244.45

Percentage Error 0.00 0.00 0.70 3.69 0.47 0.33

HDFC

For k=4 Actual Closing Price 1022.1 1015.75 1022.9 1027.75 1028.15 1000.6

Predicted Closing Price 1022.1 1015.75 1019.25 1015.75 1015.75 1015.75

Percentage Error 0.00 0.00 0.36 1.18 1.22 1.49

GAIL

For k=4 Actual Closing Price 281.05 282.1 280.9 289.8 296.25 292.85

Predicted Closing Price 281.05 282.1 282.1 281.05 282.1 281.05

Percentage Error 0.00 0.00 0.43 3.11 5.02 4.20

MARUTI

For k=5 Actual Closing Price 4199.45 4204.2 4200.05 4204.95 4167.5 4058.1

Predicted Closing Price 4199.45 4204.2 4250.25 4204.2 4204.2 4199.45

Percentage Error 0.00 0.00 1.18 0.02 0.87 3.37

SUN

For k=6 Actual Closing Price 371.85 372.45 380.65 375.7 373.5 364.6

Predicted Closing Price 371.85 372.45 375.75 375.75 371.85 375.75

Percentage Error 0.00 0.00 1.30 0.01 0.44 2.97

ITC

For k=7 Actual Closing Price 320.85 317.65 325.9 327.55 325.3 315.55

Predicted Closing Price 320.86 317.67 325.91 327.56 325.31 315.56

Percentage Error 1.01 0.00 1.57 2.09 2.41 0.66

MAHINDRA

For k=5 Actual Closing Price 1253.85 1215.3 1231.7 1232.75 1222.95 1176.8

Predicted Closing Price 1253.85 1215.3 1253.85 1239.85 1239.85 1215.3

Percentage Error 0.00 0.00 1.77 0.57 1.36 3.17

LUPIN

For k=4 Actual Closing Price 1829.45 1811.8 1898.5 1862.7 1928.85 1862.75

Predicted Closing Price 1829.45 1811.8 1811.8 1811.8 1829.45 1811.8

Percentage Error 0.00 0.00 4.79 2.81 5.43 2.81

AIRTEL

For k=4 Actual Closing Price 348.7 339.75 347.35 362.25 354.5 345.35

Predicted Closing Price 348.7 339.75 349.4 349.4 339.75 349.4

Percentage Error 0.00 0.00 0.59 3.68 4.34 1.16

Table 7. Actual Closing Price, Predicted Closing Price and Percentage Error Using Boosting

Company Name Measure Day1 Day2 Day3 Day4 Day5 Day6

TCS Actual Closing Price 2567.35 2576.75 2577.6 2570.65 2565.25 2538.8

Predicted Closing Price 2571.94 2575.06 2575.82 2572.39 2575.06 2571.46

Percentage Error 0.18 0.40 0.08 0.45 0.28 0.90

BAJAJ Actual Closing Price 2204.65 2244.15 2189.3 2269.2 2254.9 2251.75

Predicted Closing Price 2238.77 2224.71 2224.31 2224.71 2210.89 2237.99

Percentage Error 1.52 0.87 1.57 2.00 1.99 0.61

HDFC Actual Closing Price 1022.1 1015.75 1022.9 1027.75 1028.15 1000.6

Predicted Closing Price 1023.15 1024.27 1024.27 1024.27 1024.86 1022.20

Percentage Error 0.10 0.83 0.13 0.34 0.32 2.11

GAIL Actual Closing Price 281.05 282.1 280.9 289.8 296.25 292.85

Predicted Closing Price 283.09 288.26 285.24 284.41 285.49 284.08

Percentage Error 0.72 2.14 1.52 1.89 3.77 3.09

MARUTI Actual Closing Price 4199.45 4204.2 4200.05 4204.95 4167.5 4058.1

Predicted Closing Price 4203.094 4199.911 4201.587 4201.358 4201.299 4203.236

Percentage Error 0.09 0.10 0.04 0.09 0.80 3.45

SUN Actual Closing Price 371.85 372.45 380.65 375.7 373.5 364.6

Predicted Closing Price 377.82 374.95 378.38 375.20 376.96 377.82

Percentage Error 1.58 0.67 0.60 0.13 0.92 3.50

ITC Actual Closing Price 320.85 317.65 325.9 327.55 325.3 315.55

Predicted Closing Price 324.8075 321.1813 326.0972 324.3357 326.0972 324.8075

Percentage Error 1.22 1.10 0.06 0.99 0.24 2.85

MAHINDRA Actual Closing Price 1253.85 1215.3 1231.7 1232.75 1222.95 1176.8

Predicted Closing Price 1238.12 1232.24 1232.98 1226.71 1238.35 1243.05

Percentage Error 1.27 1.37 0.10 0.49 1.24 5.33

LUPIN Actual Closing Price 1829.45 1811.8 1898.5 1862.7 1928.85 1862.75

Predicted Closing Price 1858.34 1864.11 1884.57 1880.52 1853.75 1850.86

Percentage Error 1.55 2.81 0.74 0.95 4.05 0.64

AIRTEL Actual Closing Price 348.7 339.75 347.35 362.25 354.5 345.35

Predicted Closing Price 345.65 348.00 350.75 347.07 349.19 354.79

Percentage Error 0.88 2.37 0.97 4.37 1.52 2.66

Table 8. Actual Closing Price, Predicted Closing Price and Percentage Error Using Bagging

Company Name Measure Day1 Day2 Day3 Day4 Day5 Day6

TCS Actual Closing Price 2567.35 2576.75 2577.6 2570.65 2565.25 2538.8

Predicted Closing Price 2576.612 2564.873 2577.817 2560.881 2567.821 2548.395

Percentage Error 0.36 0.46 0.01 0.38 0.10 0.38

BAJAJ Actual Closing Price 2204.65 2244.15 2189.3 2269.2 2254.9 2251.75

Predicted Closing Price 2234.192 2235.031 2236.495 2234.602 2236.896 2236.734

Percentage Error 1.32 0.41 2.11 1.55 0.80 0.67

HDFC Actual Closing Price 1022.1 1015.75 1022.9 1027.75 1028.15 1000.6

Predicted Closing Price 1019.980 1019.314 1026.051 1020.749 1015.606 1015.549

Percentage Error 0.21 0.35 0.31 0.69 1.24 1.47

GAIL Actual Closing Price 281.05 282.1 280.9 289.8 296.25 292.85

Predicted Closing Price 281.2412 281.8625 285.9134 296.0189 292.7065 285.2075

Percentage Error 0.07 0.08 1.75 2.10 1.21 2.68

MARUTI Actual Closing Price 4199.45 4204.2 4200.05 4204.95 4167.5 4058.1

Predicted Closing Price 4216.751 4217.949 4193.363 4188.420 4138.462 4079.305

Percentage Error 0.41 0.33 0.16 0.39 0.70 0.52

SUN Actual Closing Price 371.85 372.45 380.65 375.7 373.5 364.6

Predicted Closing Price 370.6448 375.8169 375.3107 377.1518 369.9913 369.8346

Percentage Error 0.33 0.90 1.42 0.38 0.95 1.42

ITC Actual Closing Price 320.85 317.65 325.9 327.55 325.3 315.55

Predicted Closing Price 320.85 317.65 325.90 327.55 325.30 315.55

Percentage Error 0 0 0 0 0 0

MAHINDRA Actual Closing Price 1253.85 1215.3 1231.7 1232.75 1222.95 1176.8

Predicted Closing Price 1225.617 1235.809 1230.407 1234.097 1204.759 1202.661

Percentage Error 2.30 1.66 0.11 0.11 1.51 2.15

LUPIN Actual Closing Price 1829.45 1811.8 1898.5 1862.7 1928.85 1862.75

Predicted Closing Price 1874.841 1826.025 1846.938 1868.404 1909.463 1868.378

Percentage Error 2.42 0.78 2.79 0.31 1.02 0.30

AIRTEL Actual Closing Price 348.7 339.75 347.35 362.25 354.5 345.35

Predicted Closing Price 350.1007 349.5446 347.7243 349.2192 351.4423 349.8688

Percentage Error 0.40 2.80 0.11 3.73 0.87 1.29

Table 9. Actual Closing Price, Predicted Closing Price and Percentage Error Using Random Forest

Company Name Measure Day1 Day2 Day3 Day4 Day5 Day6

TCS Actual Closing Price 2567.35 2576.75 2577.6 2570.65 2565.25 2538.8

Predicted Closing Price 2571.5 2574.094 2572.243 2573.53 2572.305 2573.647

Percentage Error 0.20 0.36 0.22 0.49 0.17 0.98

BAJAJ Actual Closing Price 2204.65 2244.15 2189.3 2269.2 2254.9 2251.75

Predicted Closing Price 2236.61 2224.711 2224.312 2224.711 2210.885 2236.633

Percentage Error 1.43 0.87 1.57 2.00 1.99 0.68

HDFC Actual Closing Price 1022.1 1015.75 1022.9 1027.75 1028.15 1000.6

Predicted Closing Price 1023.149 1024.272 1024.272 1024.272 1024.864 1022.2

Percentage Error 0.10 0.83 0.13 0.34 0.32 2.11

GAIL Actual Closing Price 281.05 282.1 280.9 289.8 296.25 292.85

Predicted Closing Price 283.03 288.26 285.24 284.41 285.49 283.79

Percentage Error 0.70 2.14 1.52 1.89 3.77 3.19

MARUTI Actual Closing Price 4199.45 4204.2 4200.05 4204.95 4167.5 4058.1

Predicted Closing Price 4203.378 4199.911 4201.51 4201.15 4201.299 4203.367

Percentage Error 0.09 0.10 0.03 0.09 0.80 3.46

SUN Actual Closing Price 371.85 372.45 380.65 375.7 373.5 364.6

Predicted Closing Price 377.823 374.9836 378.3836 375.222 376.9645 377.823

Percentage Error 1.58 0.68 0.60 0.13 0.92 3.50

ITC Actual Closing Price 320.85 317.65 325.9 327.55 325.3 315.55

Predicted Closing Price 325.1006 321.3464 326.0191 324.5468 326.0191 325.1006

Percentage Error 1.31 1.15 0.04 0.93 0.22 2.94

MAHINDRA Actual Closing Price 1253.85 1215.3 1231.7 1232.75 1222.95 1176.8

Predicted Closing Price 1238.12 1232.511 1233.323 1226.714 1238.351 1243.052

Percentage Error 1.27 1.40 0.13 0.49 1.24 5.33

LUPIN Actual Closing Price 1829.45 1811.8 1898.5 1862.7 1928.85 1862.75

Predicted Closing Price 1857.057 1859.562 1876.230 1873.200 1845.925 1856.710

Percentage Error 1.49 2.57 1.19 0.56 4.49 0.33

AIRTEL Actual Closing Price 348.7 339.75 347.35 362.25 354.5 345.35

Predicted Closing Price 346.008 347.780 350.50 347.199 349.058 354.906

Percentage Error 0.78 2.31 0.90 4.33 1.56 2.69

Table 10. Comparison of Values With all algorithms for Top 10 Companies  

Company Name Days Actual Closing Price Bagging Boosting Multiple Regression KNN Random Forest Proposed Method

TCS Day1 2567.35 2576.61 2571.94 2556.79 2567.35 2571.5 2579.40

Day2 2576.75 2564.87 2575.06 2560.53 2576.75 2574.09 2567.54

Day3 2577.6 2577.82 2575.82 2557.88 2576.75 2572.24 2576.94

Day4 2570.65 2560.88 2572.39 2550.35 2576.75 2573.53 2577.86

Day5 2565.25 2567.82 2575.06 2556.77 2579.2 2572.30 2570.84

Day6 2538.8 2548.39 2571.46 2521.04 2576.75 2573.65 2565.49

BAJAJ Day1 2204.65 2234.19 2238.77 2234.57 2204.65 2236.61 2188.55

Day2 2244.15 2235.03 2224.71 2235.41 2244.65 2224.71 2204.78

Day3 2189.3 2236.49 2224.31 2236.87 2204.65 2224.31 2244.29

Day4 2269.2 2234.60 2224.71 2234.98 2188.45 2224.71 2189.44

Day5 2254.9 2236.90 2210.89 2237.27 2244.45 2210.88 2269.29

Day6 2251.75 2236.73 2237.99 2237.11 2244.45 2236.63 2255.02

HDFC Day1 1022.1 1019.98 1023.15 1021.07 1022.1 1023.15 1019.52

Day2 1015.75 1019.31 1024.27 1018.28 1015.75 1024.27 1022.42

Day3 1022.9 1026.05 1024.27 1023.40 1019.25 1024.27 1016.14

Day4 1027.75 1020.75 1024.27 1023.64 1015.75 1024.27 1023.28

Day5 1028.15 1015.60 1024.86 1013.51 1015.75 1024.86 1028.08

Day6 1000.6 1015.55 1022.20 1014.67 1015.75 1022.2 1028.49

GAIL Day1 281.05 281.24 283.09 281.04 281.05 283.03 272.96

Day2 282.1 281.86 288.26 282.23 282.1 288.26 282.31

Day3 280.9 285.91 285.24 285.77 282.1 285.24 283.18

Day4 289.8 296.02 284.41 285.09 281.05 284.41 281.98

Day5 296.25 292.71 285.49 296.21 282.1 285.49 290.55

Day6 292.85 285.21 284.08 292.45 281.05 283.79 297.33

MARUTI Day1 4199.45 4216.75 4203.09 4183.31 4199.45 4203.38 4250.36

Day2 4204.2 4217.95 4199.91 4199.85 4204.2 4199.91 4199.59

Day3 4200.05 4193.36 4201.59 4165.57 4250.25 4201.51 4204.32

Day4 4204.95 4188.42 4201.36 4177.13 4204.2 4201.15 4200.17

Day5 4167.5 4138.46 4201.30 4149.08 4204.2 4201.30 4205.05

Day6 4058.1 4079.30 4203.24 4042.13 4199.45 4203.37 4167.59

SUN Day1 371.85 370.64 377.82 373.83 371.85 377.82 376.39

Day2 372.45 375.82 374.95 378.05 372.45 374.98 372.71

Day3 380.65 375.31 378.38 376.30 375.75 378.38 373.28

Day4 375.7 377.15 375.20 377.18 375.75 375.22 381.47

Day5 373.5 369.99 376.96 373.38 371.85 376.96 376.32

Day6 364.6 369.83 377.82 366.73 375.75 377.82 374.13

ITC Day1 320.85 320.85 324.81 320.86 320.86 325.10 321.64

Day2 317.65 317.65 321.18 317.67 317.67 321.35 318.61

Day3 325.9 325.90 326.09 325.91 325.91 326.02 326.74

Day4 327.55 327.55 324.33 327.56 327.56 324.55 328.38

Day5 325.3 325.30 326.09 325.31 325.31 326.02 325.93

Day6 315.55 315.55 324.80 315.56 315.56 325.10 316.26

MAHINDRA Day1 1253.85 1225.617 1238.12 1232.66 1253.85 1238.12 1240.05

Day2 1215.3 1235.809 1232.24 1224.78 1215.3 1232.51 1254.07

Day3 1231.7 1230.407 1232.98 1237.25 1253.85 1233.32 1215.54

Day4 1232.75 1234.097 1226.71 1221.95 1239.85 1226.71 1231.94

Day5 1222.95 1204.759 1238.35 1224.66 1239.85 1238.35 1232.89

Day6 1176.8 1202.661 1243.05 1173.04 1215.3 1243.05 1223.09

LUPIN Day1 1829.45 1874.841 1858.34 1840.42 1829.45 1857.06 1802.74

Day2 1811.8 1826.025 1864.11 1839.86 1811.8 1859.56 1829.64

Day3 1898.5 1846.938 1884.57 1853.12 1811.8 1876.23 1811.97

Day4 1862.7 1868.404 1880.52 1864.84 1811.8 1873.20 1898.66

Day5 1928.85 1909.463 1853.75 1933.60 1829.45 1845.92 1862.82

Day6 1862.75 1868.378 1850.86 1851.81 1811.8 1856.71 1929.01

AIRTEL Day1 348.7 350.1007 345.65 349.99 348.7 346.00 350.16

Day2 339.75 349.5446 348.00 348.55 339.75 347.78 349.41

Day3 347.35 347.7243 350.75 349.29 349.4 350.50 340.53

Day4 362.25 349.2192 347.07 348.28 349.4 347.20 348.07

Day5 354.5 351.4423 349.19 353.08 339.75 349.06 362.75

Day6 345.35 349.8688 354.79 348.05 349.4 354.90 355.06

Table 11. Accuracy percentage of all algorithms

Sr.No. Algorithm Name Total Values Correct Trend (high/low) Accuracy %

1. Bagging 60 34 56.66

2. Boosting 60 26 43.33

3. Multiple Regression 60 40 66.66

4. KNN 60 36 60

5. Random forest 60 20 33.33

6. Proposed Method 60 42 70
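The accuracy column in Table 11 is the share of correct trend calls out of 60 values. The sketch below shows that calculation together with one possible way of counting correct high/low trends. The rule in trend_hits (compare each prediction and each actual close against the previous actual close) is an assumption, since the paper does not spell out how a correct trend was counted.

```python
def accuracy_percent(correct, total):
    """Share of correct trend calls, as a percentage."""
    return 100.0 * correct / total

def trend_hits(actual, predicted):
    """Count days on which the predicted direction matches the actual one.

    Assumed rule: a day counts as 'high' if the price is above the
    previous day's actual closing price, otherwise as 'low'.
    """
    hits = 0
    for i in range(1, len(actual)):
        actual_up = actual[i] > actual[i - 1]
        predicted_up = predicted[i] > actual[i - 1]
        hits += actual_up == predicted_up
    return hits

# Counts taken from Table 11
proposed_accuracy = accuracy_percent(42, 60)
bagging_accuracy = accuracy_percent(34, 60)
```

For the proposed method, 42 correct calls out of 60 give 70%. For bagging, 34 out of 60 rounds to 56.67%, which Table 11 shows truncated as 56.66.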

CONCLUSION AND FUTURE DIRECTION

At this preliminary stage, the high and low trends predicted for stock prices by the proposed HMM model match the actual trends in approximately 70% of cases. The accuracies of multiple regression and KNN are also above 50%. Owing to the dynamic nature of stock market movement, an accuracy of 100% is almost impossible. The proposed method gives more accurate results than the other predictive methods. The error rate obtained is in the range of 0.2% to 3.9% for the stock data collected over a week.

In future, more news data can be collected so that deep learning ensemble methods can be used to increase the accuracy. Feature extraction can be improved with PCA, K-means or other dimension reduction techniques. A neural network can be designed for faster results.

References:

1. Feldman, R. and Sanger, J., 2007. The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press.

2. Schumaker, R.P. and Chen, H., 2009. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems (TOIS), 27(2), p.12.

3. Thirumuruganathan, S., 2013. A Detailed Introduction to K-Nearest Neighbor (KNN) Algorithm (May 2013).

4. Imandoust, S.B. and Bolandraftar, M., 2013. Application of k-nearest neighbor (knn) approach for predicting economic events: Theoretical background. International Journal of Engineering Research and Applications,3(5), pp.605-610.

5. Biau, G., 2012. Analysis of a random forests model. Journal of Machine Learning Research, 13(Apr), pp.1063-1095.

6. Montillo, A. and Ling, H., 2009, November. Age regression from faces using random forests. In 2009 16th IEEE International Conference on Image Processing (ICIP) (pp. 2465-2468). IEEE.

7. Quinlan, J.R., 1996, August. Bagging, boosting, and C4.5. In AAAI/IAAI, Vol. 1 (pp. 725-730).

8. Schapire, R.E., 2003. The boosting approach to machine learning: An overview. In Nonlinear estimation and classification (pp. 149-171). Springer New York.

9. Kuiper, S., 2008. Introduction to Multiple Regression: How Much Is Your Car Worth?. Journal of Statistics Education, 16(3).

10. Annau, M., 2015. Short Introduction to tm.plugin.webmining.

11. Nagar, A. and Hahsler, M., 2012. Using text and data mining techniques to extract stock market sentiment from live news streams. In 2012 International Conference on Computer Technology and Science (Vol. 47, pp. 91-95).

12. Wu, H.C., Luk, R.W.P., Wong, K.F. and Kwok, K.L., 2008. Interpreting tf-idf term weights as making relevance decisions. ACM Transactions on Information Systems (TOIS), 26(3), p.13.

13. Ingle, V. and Deshmukh, S., 2016, August. Hidden Markov Model Implementation for Prediction of Stock Prices with TF-IDF features. In Proceedings of the International Conference on Advances in Information Communication Technology & Computing (p. 9). ACM.

14. Fallon, J., Making Profit in the Stock Market Using HMMs. www.cs.uml.edu/ecg/uploads/AIfall12/jfallon_hmm_stock.pdf, accessed on 26th May 2016.

15. Zhang, Y., 2004. Prediction of financial time series with Hidden Markov Models (Doctoral dissertation, Simon Fraser University).

If you use part of this page in your own work, you need to provide a citation, as follows:

Essay Sauce, . Available from:< https://www.essaysauce.com/essays/engineering/2017-1-19-1484827270.php > [Accessed 17.10.19].