Essay: Convert web documents to clustered documents (outline)

Essay details:

  • Subject area(s): Information technology essays
  • Published: December 9, 2021*

The volume of data in the digital world keeps growing, which makes forensic analysis harder. There is therefore a need for a quick method of grouping the required documents. A number of algorithms, such as k-means and agglomerative clustering, are used for clustering. The system first pre-processes the documents from an unstructured format into a structured format, then extracts four important features from each document: numeric words, proper nouns, title sentences and term weights. This keeps the approach simpler than many other methods. The system ignores unwanted file extensions and considers only extensions that are rich in text, such as .pdf, .doc and .txt. As the final step of clustering, the system compares all the documents with one another to build a score matrix that contains the aggregate feature scores. Grouping these scored values gives the final clustered documents.
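The extension filter and the four features can be illustrated with a minimal Python sketch. The essay does not define how each feature is measured, so the function names and the heuristics below (digits-only tokens, capitalised words that do not start a sentence, sentences sharing a word with the title, relative term frequency) are assumptions made for illustration only.

```python
import re
from collections import Counter
from pathlib import Path

# Extensions the essay names as text-rich; treating this as the full set is an assumption.
TEXT_RICH_EXTENSIONS = {".pdf", ".doc", ".txt"}


def is_text_rich(filename):
    """Return True when the file extension is one of the text-rich ones."""
    return Path(filename).suffix.lower() in TEXT_RICH_EXTENSIONS


def tokenize(text):
    """Split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def extract_features(title, body):
    """Compute four simple per-document features (hypothetical definitions)."""
    words = tokenize(body)
    sentences = [s.strip() for s in re.split(r"[.!?]+", body) if s.strip()]
    title_terms = {w.lower() for w in tokenize(title)}

    # Numeric words: tokens made only of digits.
    numeric_words = sum(1 for w in words if w.isdigit())

    # Proper nouns (crude heuristic): capitalised tokens that do not start a sentence.
    proper_nouns = 0
    for sentence in sentences:
        proper_nouns += sum(1 for t in tokenize(sentence)[1:] if t[0].isupper())

    # Title sentences: sentences that share at least one word with the title.
    title_sentences = sum(
        1 for s in sentences if title_terms & {w.lower() for w in tokenize(s)}
    )

    # Term weights: relative frequency of each lower-cased term.
    counts = Counter(w.lower() for w in words)
    total = sum(counts.values()) or 1
    term_weights = {term: n / total for term, n in counts.items()}

    return {
        "numeric_words": numeric_words,
        "proper_nouns": proper_nouns,
        "title_sentences": title_sentences,
        "term_weights": term_weights,
    }
```

A real system would probably use a part-of-speech tagger for proper nouns and a weighting scheme such as TF-IDF for term weights; the heuristics here only show the shape of the feature record.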

This system first creates an interactive web crawler that parses the web pages, collects the data and saves it in .txt file format. The folder in which this web data is stored is then given as input to the system, which preprocesses the data to extract the features. Fuzzy logic is then applied to obtain the classification pattern of the feature scores, and this pattern is fed to the weighted matrix method to create semantic clusters of the web page documents.
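The parse-and-save step of the crawler can be sketched with the Python standard library. This is only an illustration under assumptions: the class and function names are hypothetical, the page is passed in as an HTML string, and downloading pages and following links are left out.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_page(html, folder, name):
    """Write the extracted text to <folder>/<name>.txt and return the path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{name}.txt"
    target.write_text(html_to_text(html), encoding="utf-8")
    return target
```

The folder passed to save_page would then be the input folder handed to the preprocessing stage.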

Figure 1: Overall System Diagram

The main aim is to convert many web documents into clustered documents. The web document clustering system consists of a web crawler, data preprocessing, feature extraction and a weighted score matrix. The web crawler collects many web pages, which are then converted into clustered information. Data preprocessing consists of special symbol removal, stop word removal and stemming. Feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps and in some cases leading to better human interpretation. Feature extraction is related to dimensionality reduction. When the input data to an algorithm is too large to be processed and is suspected to be redundant (e.g. the same measurement in both feet and meters, or the repetitiveness of images presented as pixels), it can be transformed into a reduced set of features (also called a “feature vector”). This process is called feature extraction. The extracted features are expected to contain the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the complete initial data.
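The three preprocessing steps named above can be sketched as follows. The stop word list is a tiny illustrative sample, and the suffix stripper is a naive stand-in for a real stemmer such as the Porter stemmer; both are assumptions rather than part of the described system.

```python
import re

# Tiny illustrative stop word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and"}

# Suffixes removed by the naive stemmer, tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def remove_special_symbols(text):
    """Replace every character that is not a letter, digit or space with a space."""
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Symbol removal, lower-casing, stop word removal, then stemming."""
    tokens = remove_special_symbols(text).lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, preprocess("The crawler is parsing web-pages!") drops "the" and "is", splits "web-pages" into two tokens and reduces "parsing" to "pars". The crude stems are not always real words, which is acceptable as long as related word forms map to the same stem.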

Fuzzy logic can be used as an interpretation model for the properties of neural networks, as well as for giving a more precise description of their performance. We will show that fuzzy operators can be conceived as generalized output functions of computing units. Fuzzy logic can also be used to specify networks directly without having to apply a learning algorithm. An expert in a certain field can sometimes produce a simple set of control rules for a dynamical system with less effort than the work involved in training a neural network.
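As a small illustration of fuzzy operators and expert-written rules, the sketch below maps a feature score in the range 0 to 1 onto the fuzzy sets "low", "medium" and "high". The membership shapes, the breakpoints and the function names are assumptions chosen for the example; min, max and complement are the standard fuzzy AND, OR and NOT operators.

```python
def falling(x, start, end):
    """Left shoulder: 1 up to start, then linearly down to 0 at end."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def rising(x, start, end):
    """Right shoulder: 0 up to start, then linearly up to 1 at end."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def triangular(x, left, peak, right):
    """Triangle: 0 outside (left, right), 1 at peak, linear in between."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


def classify_score(score):
    """Return the best-matching label and all membership degrees."""
    memberships = {
        "low": falling(score, 0.0, 0.5),
        "medium": triangular(score, 0.0, 0.5, 1.0),
        "high": rising(score, 0.5, 1.0),
    }
    label = max(memberships, key=memberships.get)
    return label, memberships
```

A hand-written rule such as "a document is relevant if its title score is high and its term weight is high" then becomes fuzzy_and(rising(title_score, 0.5, 1.0), rising(term_score, 0.5, 1.0)), with no training step involved.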

A weighted score matrix is used to define the level of importance of each criterion. Assigning meaning to weighting factors is subjective. For this reason, keep the number of weighting factors small.
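One possible way to build such a matrix is sketched below. It assumes each document has already been reduced to four feature scores between 0 and 1; the weight values, the feature names and the agreement measure (one minus the absolute difference) are hypothetical choices, not taken from the essay.

```python
# Hypothetical weights, one per feature; kept few, as advised above.
WEIGHTS = {"numeric": 0.1, "proper_nouns": 0.2, "title": 0.3, "terms": 0.4}


def pair_score(a, b, weights=WEIGHTS):
    """Weighted agreement of two feature records whose values lie in [0, 1]."""
    total = sum(weights.values())
    agreement = sum(
        w * (1.0 - abs(a[name] - b[name])) for name, w in weights.items()
    )
    return agreement / total


def score_matrix(docs, weights=WEIGHTS):
    """Compare every document with every other one; returns a square matrix."""
    n = len(docs)
    return [
        [pair_score(docs[i], docs[j], weights) for j in range(n)]
        for i in range(n)
    ]
```

Each entry is an aggregate feature score for one pair of documents: 1.0 means the two documents agree on every feature, and lower values mean they differ on the more heavily weighted ones.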

Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem.
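To make one of these notions concrete, here is a minimal sketch of single-linkage agglomerative clustering over a pairwise similarity matrix. The stopping threshold and the function name are assumptions; this is one of many possible clustering procedures and is not presented as the method used by the system described here.

```python
def agglomerative_clusters(similarity, threshold):
    """Single-linkage agglomerative clustering on a similarity matrix.

    Starts with one cluster per item and repeatedly merges the two clusters
    whose most similar pair of members has the highest similarity, stopping
    when that similarity falls below the threshold.
    """
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > 1:
        best = None  # (link similarity, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = max(
                    similarity[i][j] for i in clusters[a] for j in clusters[b]
                )
                if best is None or link > best[0]:
                    best = (link, a, b)
        if best[0] < threshold:
            break
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters
```

With a high threshold every document stays in its own cluster, and lowering the threshold merges progressively less similar groups. This brute-force version recomputes every link on each pass, so it suits only small collections.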


About this essay:

If you use part of this page in your own work, you need to provide a citation, as follows:

Essay Sauce, Convert web documents to clustered documents (outline). Available from:<> [Accessed 25-01-22].

* This essay may have been previously published on at an earlier date.
