Essay details:

  • Subject area(s): Engineering
  • Published on: 7th September 2019
  • File format: Text
  • Number of pages: 2

Text preview of this essay:


Energy and Priority Aware MapReduce in Big Data for Classification of Large Data Sets


Faculty of Computing,

Sathyabama University,

Chennai, India

[email protected]

A.Akshay Kumar Malani

Student, Faculty of Computing,

Sathyabama University,

Chennai, India

[email protected]

Andena Prashant Reddy

Student, Faculty of Computing,

Sathyabama University,

Chennai, India

[email protected]


In recent years, data mining applications have become stale and obsolete over time. Incremental processing is a promising approach to reducing the number of maps and the data processing time for analysis, since it reduces the number of computational tasks. Many previous studies have focused on MapReduce in big data. Major challenges include job scheduling, the Energy MapReduce Scheduling Algorithm, and a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. MapReduce is a methodology for processing and generating large amounts of data in parallel. In this paper, the proposed EPAMR algorithm provides better energy efficiency with fewer maps. Priority-based scheduling allocates tasks according to the necessity and utilization of the jobs. Reducing the number of maps reduces the system workload, which in turn improves energy efficiency. The final results present an experimental comparison of the different algorithms considered in this paper.

Keywords: EPAMR, MapReduce, Big Data, Scheduling.


Big Data is an emerging technology, though also a poorly defined marketing buzzword. One way of looking at big data is that it represents the large and rapidly growing volume of data that is largely untapped by existing analytical applications and data warehousing systems. Examples of this data include high-volume sensor data and social networking information from websites such as Facebook and Twitter. Organizations are interested in capturing and analyzing this data because it can add significant value to the decision-making process. Such processing, however, may involve complex workloads that push the boundaries of what is possible using traditional data warehousing and data management techniques and technologies.

MapReduce is a methodology for processing and generating huge data sets through parallel processing on a cluster. It is divided into two parts, named map and reduce. The map procedure mainly performs filtering and sorting and works on key/value (kv) pairs. The reduce procedure collects the related values that share the same key and combines them into a single result.
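The map/reduce split described above can be illustrated with the classic word-count job. This is a minimal single-process sketch, not the paper's implementation: `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical names, and the shuffle step stands in for the grouping that a real framework such as Hadoop performs between the two phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit an intermediate (key, value) pair for every word."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: collapse all values for one key into a single result."""
    return (key, sum(values))


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_words(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(k, vs) for k, vs in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data needs MapReduce", "MapReduce maps and reduces big data"]
    print(word_count(docs))
```

Each reduce call sees every value for its key at once, which is why the shuffle must complete before reducing starts.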


Fig A: MapReduce

Hadoop is a software framework that helps implement distributed computing using MapReduce. As the quantity of data collected by organizations and enterprises, especially unstructured data, explodes, Hadoop is rapidly emerging as one of the primary choices for storing and performing operations on that data. A comment on Hadoop:

Combining R with Hadoop appears a natural fit. Both are open source projects and both are data driven. However, there are some fundamental challenges that need to be addressed in order to make the marriage work. Revolution Analytics is addressing these challenges with its Hadoop-based development.

Interactive vs. batch - If we look at how most people do analytics, it is typically an interactive process: start with a hypothesis, explore and try to understand the data, try some different statistical techniques, drill down on various dimensions, and so on. R is an ideal environment for performing such analysis. Hadoop, on the other hand, is batch oriented: jobs are queued and then executed, and it may take minutes or hours to run these jobs.

In-memory vs. parallel - Another fundamental challenge is that R is designed to hold all of its data in memory, whereas programs in Hadoop (map/reduce) work independently and in parallel on individual data slices.

Our Contribution

      In our work, we propose an extension of MapReduce to efficiently support iterative computation on the MapReduce platform. In comparison with previous work, our proposal provides general-purpose support, including not only one-to-one but also one-to-many, many-to-one, and many-to-many correspondence. For task scheduling, we apply priority-based task scheduling. The key/value pairs are collected into a list, and the reduce step finally combines their sums into a single output. By using MapReduce, the system's resource utilization will be lower than in previous works. Energy-aware scheduling will decrease the energy consumption ratio.

The proposed scheme is able to incorporate flow correlation information into the classification process. It uses the Naive Bayes (NB) classifier, an effective probabilistic classifier that applies Bayes' theorem with naive feature-independence assumptions. Furthermore, for each communication process, neither the source nor the destination is malicious. An advantage of the NB classifier is that it requires only a small amount of training data to estimate the parameters of a classification model.
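The Naive Bayes rule above picks the class c that maximizes P(c) multiplied by the product of P(x_i | c) over all features, treating the features as independent given the class. The following is a minimal sketch for categorical features such as those in a bank credit record. It assumes Laplace (add-alpha) smoothing and works in log space to avoid underflow. The class name, feature values, and labels are hypothetical, and this is not the paper's implementation.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, rows: list[tuple[str, ...]], labels: list[str]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[c][i][v] = number of class-c rows whose feature i equals v
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # values[i] = distinct values seen for feature i (used in the smoothing denominator)
        self.values = defaultdict(set)
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[c][i][v] += 1
                self.values[i].add(v)
        return self

    def log_posterior(self, row: tuple[str, ...], c: str) -> float:
        # log P(c) + sum_i log P(x_i | c), using the naive independence assumption
        score = math.log(self.class_counts[c] / self.n)
        for i, v in enumerate(row):
            numerator = self.counts[c][i][v] + self.alpha
            denominator = self.class_counts[c] + self.alpha * len(self.values[i])
            score += math.log(numerator / denominator)
        return score

    def predict(self, row: tuple[str, ...]) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


if __name__ == "__main__":
    # Hypothetical (employment, housing) features with credit-risk labels
    rows = [("employed", "own"), ("employed", "own"), ("employed", "rent"),
            ("unemployed", "rent"), ("unemployed", "rent"), ("unemployed", "own")]
    labels = ["good", "good", "good", "bad", "bad", "bad"]
    model = CategoricalNaiveBayes().fit(rows, labels)
    print(model.predict(("employed", "own")))
```

Because training only counts feature values per class, the model is cheap to estimate from a small sample. This matches the advantage noted above.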

       A support vector machine (SVM) is a method in statistics and computer science covering a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes contains the input, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two classes, an SVM training algorithm builds a model that assigns new examples to one class or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate classes are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a class based on which side of the gap they fall on.
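A linear SVM of the kind described above can be trained by minimizing the regularized hinge loss (lambda/2)·||w||² + mean(max(0, 1 − y·w·x)). The sketch below uses Pegasos-style stochastic subgradient descent. This is one standard primal training method, not necessarily the one used in this paper. Two simplifications are assumptions: the examples are visited in a fixed order instead of being sampled at random, and the bias is learned as an extra weight on a constant feature (so it is also regularized). Function names and the toy data are hypothetical.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _augment(x: list[float]) -> list[float]:
    # Append a constant 1 so the bias is learned as the last weight.
    return list(x) + [1.0]


def train_linear_svm(X: list[list[float]], y: list[int],
                     lam: float = 0.1, epochs: int = 100) -> list[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    Labels must be -1 or +1. Returns the augmented weight vector."""
    data = [(_augment(x), yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, yi in data:  # fixed order here; Pegasos normally samples at random
            t += 1
            eta = 1.0 / (lam * t)             # decreasing step size
            margin = yi * dot(w, x)           # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularizer shrinks w
            if margin < 1.0:                  # hinge loss active: move toward yi * x
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w: list[float], x: list[float]) -> int:
    # The sign of the decision function says which side of the gap x falls on.
    return 1 if dot(w, _augment(x)) >= 0 else -1


if __name__ == "__main__":
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-3, -2]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

`predict` returns only a class label and no probability, which is what makes the SVM a non-probabilistic classifier.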

Fig B: Proposed Architecture.

Previous Study

Several works address resource allocation and analytics in data centers and clouds. Hacker and Mahadik [1] proposed scheduling policies for virtual high-performance computing clusters. They presented a resource prediction model for each policy to estimate the resources needed within a cloud, the queue wait time for requests, and the size of the pool of spare resources needed. Palanisamy et al. [2] proposed a new MapReduce cloud service model for production jobs. Their method creates cluster configurations for the jobs using MapReduce profiling and leverages deadline-awareness, allowing the cloud provider to optimize its global resource allocation and reduce the cost of resource provisioning.

Another work proposed a programming model and an architecture that enhance the MapReduce runtime so that it supports iterative MapReduce computations efficiently. Its authors showed how their proposed model can be extended to more classes of applications for MapReduce. Tian and Chen [4] proposed a cost function that models the relationship between the amount of input data, the Map and Reduce slots, and the complexity of the Reduce function for a MapReduce job. Their proposed cost function can be used to minimize the cost under a deadline or to minimize the time under a given budget. Zhan et al. [5] proposed a cooperative resource provisioning solution that uses statistical multiplexing to save server cost. Song et al. [6] proposed a two-tiered on-demand resource allocation mechanism consisting of local and global resource allocation. In our previous studies [7], [8], [9], [10], we proposed mechanisms for resource provisioning, allocation, and pricing in clouds considering multiple heterogeneous resources.

System Methodologies

I. Data Collection

In this stage, the data set consists of a large number of files containing 200,000 instances from a German bank credit system. It contains the information about all clients of that particular bank, for example loan details, vehicle details, education status, and credit and debit statements.

II. MapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data and scheduling the program's execution across a set of machines, and it also navigates the data in big data sets. This allows programmers without experience in parallel and distributed systems to easily utilize the resources of a large distributed system. A typical MapReduce computation processes many terabytes of data on thousands of machines. Here, it is used to reduce the number of mappings for classification.
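Two run-time details mentioned above can be sketched together: hash partitioning, which routes every occurrence of an intermediate key to the same reduce task, and a map-side combiner, which pre-aggregates each split's output so that fewer intermediate pairs are emitted. A combiner is one common way to cut intermediate map output. Whether it corresponds to this paper's "reducing the number of mappings" is an assumption. All function names are hypothetical. `zlib.crc32` is used instead of `hash()` because Python randomizes string hashes per process.

```python
import zlib
from collections import Counter, defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Hash partitioner: the same key always maps to the same reduce task."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_with_combiner(split: list[str]) -> list[tuple[str, int]]:
    """Map one input split and combine locally, so each distinct word is
    emitted once per split instead of once per occurrence."""
    local = Counter()
    for line in split:
        for word in line.split():
            local[word] += 1
    return list(local.items())


def run_job(splits: list[list[str]], num_reducers: int = 2) -> tuple[dict[str, int], int]:
    """Return (final counts, number of intermediate pairs emitted by the mappers)."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]  # one per reduce task
    emitted = 0
    for split in splits:
        for key, value in map_with_combiner(split):
            buckets[partition(key, num_reducers)][key].append(value)
            emitted += 1
    result: dict[str, int] = {}
    for bucket in buckets:  # each bucket would be processed by a separate reducer
        for key, values in bucket.items():
            result[key] = sum(values)
    return result, emitted


if __name__ == "__main__":
    counts, emitted = run_job([["a b a", "a"], ["b c"]])
    print(counts, emitted)  # 6 words in total, but only 4 combined pairs emitted
```

The combiner is safe here only because summation is associative and commutative. A reduce function without those properties could not be split this way.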

Fig C: Reducing Map

III. Energy & Priority Aware Scheduling

This paper also focuses on scheduling issues. Before allocating a task, the system validates the job's priority. When a queue receives high-priority data, it is handled under a pre-emptive condition; low-priority data is handled under a non-pre-emptive condition.
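One way to read the pre-emption rule above is sketched below as a small discrete-time simulation with two FIFO queues. Q1 holds high-priority tasks and Q2 holds low-priority tasks. A waiting high-priority task pre-empts a running low-priority one, which goes back to the front of Q2. A low-priority arrival never interrupts a running task. This interpretation, the unit time steps, and the `Task`/`schedule` names are assumptions for illustration. The energy side of EPAMR is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    high_priority: bool
    remaining: int  # units of work left


def schedule(arrivals: dict[int, list[Task]]) -> list[str]:
    """Return the name of the task executed at each time step ("idle" if none)."""
    q1: deque = deque()  # Q1: priority queue (FIFO among high-priority tasks)
    q2: deque = deque()  # Q2: non-priority queue
    running: Optional[Task] = None
    timeline: list[str] = []
    last_arrival = max(arrivals, default=0)
    t = 0
    while True:
        for task in arrivals.get(t, []):
            (q1 if task.high_priority else q2).append(task)
        # Pre-emptive condition: high-priority work displaces a running low-priority task.
        if running is not None and not running.high_priority and q1:
            q2.appendleft(running)
            running = None
        if running is None:
            if q1:
                running = q1.popleft()
            elif q2:
                running = q2.popleft()
        if running is None:
            if t > last_arrival:  # nothing running, queued, or still to arrive
                break
            timeline.append("idle")
        else:
            timeline.append(running.name)
            running.remaining -= 1
            if running.remaining == 0:
                running = None
        t += 1
    return timeline


if __name__ == "__main__":
    jobs = {0: [Task("L1", False, 3)], 1: [Task("H1", True, 2)]}
    print(schedule(jobs))  # ['L1', 'H1', 'H1', 'L1', 'L1']
```

Putting the pre-empted task at the front of Q2, not the back, means it resumes before later low-priority arrivals.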

Fig D: Data flow

IV. Classification

The final stage of this project applies different classification algorithms. Here, Naive Bayes and SVM are the classification algorithms compared on the final results, such as elapsed time and efficiency.
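A comparison of classifiers on elapsed time and efficiency can be organized with a small evaluation harness like the one below. It assumes that "efficiency" means classification accuracy. It times only the prediction phase, so training time would need separate measurement. The `evaluate`/`compare` names and the toy threshold classifiers are hypothetical stand-ins for the Naive Bayes and SVM models.

```python
import time
from typing import Callable, Sequence


def evaluate(predict: Callable[[object], object],
             inputs: Sequence[object],
             labels: Sequence[object]) -> tuple[float, float]:
    """Return (accuracy, elapsed_seconds) for one classifier's predict function."""
    start = time.perf_counter()
    predictions = [predict(x) for x in inputs]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels), elapsed


def compare(classifiers: dict[str, Callable[[object], object]],
            inputs: Sequence[object], labels: Sequence[object]) -> None:
    """Print accuracy and prediction time for each named classifier."""
    for name, predict in classifiers.items():
        accuracy, elapsed = evaluate(predict, inputs, labels)
        print(f"{name}: accuracy={accuracy:.3f}, elapsed={elapsed * 1000:.3f} ms")


if __name__ == "__main__":
    inputs = [1, -1, 2, -3]
    labels = [True, False, False, False]
    compare({"threshold at 0": lambda x: x > 0,
             "threshold at 1": lambda x: x > 1}, inputs, labels)
```

Wall-clock timings on tiny inputs are noisy, so a real comparison would repeat each measurement and use the full data set.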


1. Mapper: identity function for value

       (k, v) → (v, _)

2. Reducer: identity function

       (k', _) → (k', "")

Create empty queues Q1, Q2

if (task == pritask) {
    add task to Q1;
} else {
    add task to Q2;
}

Q1: priority queue

Q2: non-priority queue

Result Analysis

Creating the Big Data environment

Uploading the dataset

Classification Result


The main advantages offered by MapReduce are that it requires less computational power, provides high-speed data access (achieved by replicating all data on multiple DataNodes, together with other mechanisms that protect against failure), and allows the scheduler to group jobs with their data, offering high throughput for the jobs processed on the grid. Given its ease of use, low maintenance, and scalability, combining these two technologies seems the better choice. By implementing Hadoop, we take advantage of the reduced maps and of the Hadoop scheduler's ability to send jobs to where the needed data is located (when possible). Big data provides a high-performance data processing service with the help of computers interconnected through a local area network or through the Internet. It uses parallel processing and independent systems technology, which are the backbone of high-performance computing, and it is used extensively in many fields of science and engineering. We also compared MapReduce with different classification algorithms, namely SVM and Naive Bayes.
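The scheduler behaviour mentioned above, sending work to where the needed data is located, can be illustrated with a greedy, locality-preferring task assignment. Each map task reads one data block, and the sketch places it on a free node that already holds a replica of that block when possible, falling back to any free node otherwise. This is a deliberately simplified assumption. Hadoop's own schedulers are considerably more elaborate (for example, they also weigh rack locality). The function name, block IDs, and node names are hypothetical.

```python
def assign_tasks(block_replicas: dict[str, set[str]],
                 free_nodes: list[str]) -> dict[str, str]:
    """Greedily assign one map task per block to a free node, preferring a node
    that stores a replica of the block (data-local), otherwise any free node."""
    free = list(free_nodes)
    assignment: dict[str, str] = {}
    for block, replicas in sorted(block_replicas.items()):
        if not free:
            break  # remaining tasks wait until a node becomes free
        local = [node for node in free if node in replicas]
        node = local[0] if local else free[0]  # fall back to a remote read
        free.remove(node)
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    replicas = {"b1": {"n2", "n3"}, "b2": {"n1"}, "b3": {"n9"}}
    print(assign_tasks(replicas, ["n1", "n2", "n3"]))
    # b1 and b2 run data-local; b3's replica is on a busy node, so it reads remotely
```

A greedy pass like this can miss better global placements. Real schedulers may instead briefly delay a task to wait for a data-local slot.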


About this essay:

This essay was submitted to us by a student in order to help you with your studies.
