Essay: Data Preprocessing

Essay details:

  • Subject area(s): Information technology essays
  • Reading time: 3 minutes
  • Price: Free download
  • Published: February 14, 2016
  • File format: Text
  • Words: 707 (approx)
  • Number of pages: 3 (approx)


In the process of rough set data analysis, attributes can be reduced: redundant attributes that play no role in distinguishing one object from another can be eliminated without any loss of information. The final result toward which rough set approaches are directed is a set of production rules capable of classifying newly gathered data [15].
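To make the notion of redundancy concrete: an attribute is dispensable when the remaining attributes still distinguish every pair of objects that carry different decisions. A minimal Python sketch (the toy decision table and attribute names are invented for illustration, not taken from the essay):

```python
from itertools import combinations

def discerns(rows, attrs, decision):
    """Check whether the attribute subset `attrs` distinguishes every
    pair of rows that have different decision values."""
    for (r1, d1), (r2, d2) in combinations(zip(rows, decision), 2):
        if d1 != d2 and all(r1[a] == r2[a] for a in attrs):
            return False  # indiscernible pair with different decisions
    return True

# Toy decision table: three condition attributes, one decision attribute
rows = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 1},
]
decision = ["yes", "no", "no"]

# Attribute `c` is constant, so dropping it loses no discerning power:
# {"a", "b"} discerns exactly what {"a", "b", "c"} does.
```

Here eliminating `c` changes nothing, which is precisely the "reduction without information loss" the rough set approach exploits.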
b. Completion
The RST performs completion, filling in missing values with an I/O completer (mean/mode fill). For large datasets with missing values, complicated methods are unsuitable because of their high computational cost, so the tendency is toward simple methods that can reach performance as good as the complicated ones. Results and experience from earlier work suggest that the mean-and-mode method can be efficient and effective for large datasets, with some necessary improvements. The basic idea of the method is cluster-based filling of missing values (4): instead of applying mean-and-mode to the whole dataset, it is applied within subsets obtained by clustering. The algorithm can be applied to supervised data in which the attributes with missing values are either categorical or numeric, and it produces a number of clusters equal to the number of values of the class attribute. With this method, each missing value is filled by comparison with the other inputs, and the filled value is about 90% suitable for its column [10].
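The cluster-based completion idea can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: rows are grouped by the class label (one cluster per class value), and each missing value is filled with the group mean (numeric) or group mode (categorical). All data and field names are hypothetical:

```python
from statistics import mean, mode

def fill_by_class(rows, class_key):
    """rows: list of dicts; missing values are None.
    Fill each missing value using the mean (numeric) or mode
    (categorical) computed only over rows with the same class label."""
    # Group row indices by class label (one "cluster" per class value)
    groups = {}
    for i, r in enumerate(rows):
        groups.setdefault(r[class_key], []).append(i)

    filled = [dict(r) for r in rows]  # leave the input untouched
    columns = {k for r in rows for k in r if k != class_key}
    for members in groups.values():
        for col in columns:
            present = [rows[i][col] for i in members
                       if rows[i][col] is not None]
            if not present:
                continue  # nothing to fill from in this cluster
            if all(isinstance(v, (int, float)) for v in present):
                fill = mean(present)   # numeric attribute
            else:
                fill = mode(present)   # categorical attribute
            for i in members:
                if filled[i][col] is None:
                    filled[i][col] = fill
    return filled

rows = [
    {"label": "x", "num": 1.0,  "cat": "a"},
    {"label": "x", "num": None, "cat": "a"},
    {"label": "y", "num": 10.0, "cat": None},
    {"label": "y", "num": 12.0, "cat": "b"},
]
filled = fill_by_class(rows, "label")
# Row 1's num is filled with the x-group mean; row 2's cat with the y-group mode
```

Restricting mean-and-mode to each class-label cluster is what makes the fill values more representative than global statistics, which is the essay's stated motivation for the clustered variant.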
c. Reduction
 After analyzing all the data, the ROSETTA tool provides the influential and important parameters that decide the result, building a large number of attribute combinations related to the end result. Johnson's reduction algorithm produces highly reliable reducts that retain the most influential parameters [11]. In the reduction stage, the Johnson reducer (Johnson's algorithm) is used to find the influential parameters for feature selection on the most heavily impacted data. Information extraction, by comparison, is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents [14]; in most cases that activity concerns processing human-language texts by means of Natural Language Processing (NLP). Johnson's algorithm instead aims to reduce the dataset's dimension by analyzing and understanding the impact of its features on a model [9]. Consider, for example, a predictive model C1A1 + C2A2 + C3A3 = S, where the Ci are constants, the Ai are features and S is the predictor output. It is useful to understand how important the features A1, A2 and A3 are, what their relevance to the model is, and how they correlate with S. Such an analysis allows us to select a subset of the original features, reducing the dimension and complexity of later steps in the data mining process (4).
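Johnson's reducer is, at heart, a greedy set-cover heuristic over discernibility clauses: for each pair of objects with different decisions, collect the set of attributes that tell them apart, then repeatedly keep the attribute appearing in the most remaining clauses. A minimal sketch with invented toy data (not the ROSETTA implementation):

```python
from itertools import combinations
from collections import Counter

def johnson_reduct(rows, decision):
    """Greedy (Johnson-style) reduct: build the discernibility clauses,
    then repeatedly pick the attribute covering the most clauses."""
    clauses = []
    for (r1, d1), (r2, d2) in combinations(zip(rows, decision), 2):
        if d1 != d2:
            diff = {a for a in r1 if r1[a] != r2[a]}
            if diff:
                clauses.append(diff)
    reduct = set()
    while clauses:
        counts = Counter(a for c in clauses for a in c)
        best = counts.most_common(1)[0][0]  # attribute in most clauses
        reduct.add(best)
        clauses = [c for c in clauses if best not in c]  # now covered
    return reduct

rows = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 0, "c": 0},
]
decision = ["yes", "no", "no"]
reduct = johnson_reduct(rows, decision)
# `c` never discerns any pair, so it cannot enter the reduct
```

The attributes the greedy loop selects are exactly the "influential parameters" that the essay then feeds into the prediction stage.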
These influential parameters are then given as input to an ANN-based tool for prediction; NeuroSolutions is one of the best-known simulation tools for ANNs.
2) Prediction
In the area of prediction, the work on the parameters is divided into training, testing and cross-validation (7).
a. Training
 In training, the network is trained using the influential parameters and learns to what degree each parameter contributes to success or non-success. The trained network is then checked in the testing stage.
b. Testing
 In testing, the trained network is checked using a supervised learning algorithm. If testing yields the correct results, the data was trained correctly; otherwise the network is trained again [11].
c. Cross validation
If training and testing have completed correctly, the result is validated in the cross-validation stage [13].
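The three-way partition used across these stages can be sketched as a simple shuffled split. The 60/20/20 proportions below are an assumption for illustration, not figures from the essay:

```python
import random

def train_test_cv_split(data, train=0.6, test=0.2, seed=0):
    """Shuffle a dataset and partition it into training, testing and
    cross-validation subsets (proportions are illustrative defaults)."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_train = int(n * train)
    n_test = int(n * test)
    return (items[:n_train],                      # training set
            items[n_train:n_train + n_test],      # testing set
            items[n_train + n_test:])             # cross-validation set

train_set, test_set, cv_set = train_test_cv_split(range(100))
# 60 / 20 / 20 split; the three subsets are disjoint and cover all items
```

Keeping the cross-validation subset separate from both training and testing data is what lets it serve as an independent check on the trained network.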
3) Error rate
The paper also reports the error rate between the actual and the desired output; only when the error rate is low is the system considered to work correctly [13].
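For classification outputs, the error rate between actual and desired output can be computed as the fraction of mismatched predictions, e.g.:

```python
def error_rate(actual, desired):
    """Fraction of predictions that differ from the desired output."""
    assert len(actual) == len(desired)
    wrong = sum(a != d for a, d in zip(actual, desired))
    return wrong / len(actual)

# One mismatch out of five predictions
rate = error_rate([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
```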

A properly trained neural network is capable of generating information based on the IVF data. To train an artificial neural network, suitable training, cross-validation and test data are selected. The neural network is trained with the training data and checked with the test data; the ANN learns the mapping between actual output and desired output from the training set [6].
