Computing and communication are two key forces that continue to reshape the way we run businesses, the way we learn, and the way we live. Rapid advances in computer technology have profoundly changed how data is stored, how fast it grows, and the scale of service workloads and application complexity. Today, the cost of maintaining and managing software, licensed or not, and the bulky hardware it runs on ranges from two to ten times the purchase cost. There is a growing demand for technologies that shift this administrative burden from people to software. Market trends show that data migration and application migration are among the most heavily used technologies for moving data storage and computation toward automation and self-management.

In this thesis we examine key issues in the design and development of scalable architectures and procedures for efficient and effective data migration and application migration. We began by studying the prospect of automated data migration across multi-tier storage frameworks. The considerable IO performance advantage of Solid State Disks (SSDs) over conventional rotational hard disks (HDDs) has encouraged the integration of SSDs into the existing storage hierarchy for better performance. We developed an adaptive look-ahead data migration approach to integrate SSDs efficiently into the multi-tiered storage architecture. With cold (infrequently accessed) data stored on the HDD tier and hot (frequently accessed) data placed on the fast but expensive SSD tier, the central task is to migrate data as its access pattern shifts from cold to hot and vice versa. For instance, daytime workloads in retail or banking applications can differ drastically from evening workloads. We designed and implemented an adaptive look-ahead data migration model. A distinctive feature of our automated migration methodology is its ability to rapidly adjust the data migration schedule to achieve optimal migration effectiveness by taking into account application-specific attributes, I/O profiles, and workload deadlines. Our evaluation on a real system shows that the look-ahead data migration model is effective in improving the efficiency and scalability of multi-tier storage systems. The second principal contribution of this research is to address the challenges of reliability and load balancing among the computing nodes of a network, handled in a decentralized service architecture.

For location-based services delivered to geographically distributed mobile users, the continuous and massive service request workloads pose considerable technical challenges for the framework to guarantee a scalable and reliable service environment. We designed and built a decentralized computing architecture, called Reliable GeoGrid, with two distinctive features. First, we developed a workload migration scheme with controlled replication, which uses a shortcut-based optimization to strengthen the system against both individual node failures and network partition failures. Second, we devised a dynamic load balancing technique to scale the framework in anticipation of sudden workload changes. Our experiments show that the Reliable GeoGrid architecture is highly scalable under changing service workloads with moving hotspots, and highly reliable in the presence of massive node failures. The third research thrust in this thesis centers on the process of migrating applications from local physical servers to the Cloud. We designed migration experiments, studied the errors that arose, and built an error model. Based on the analysis and observations from these experiments, we propose the CloudMig framework, which provides both configuration validation and installation automation, effectively reducing configuration errors and installation complications.

1.2 RESEARCH QUESTIONS

Considering the above problem statement, we can now define the main research question:

RSQ: How can data migration be tailored to the specific situation of cloud companies?

We analyze each aspect of the research question to define sub-questions aimed at giving a clearer understanding of the scope of the research and of its deliverables. First of all, the main issue is how to undertake a data migration project. For this, literature on the topic needs to be studied, and the phases and activities of methods that offer solutions to data migration should be identified. Thus the first sub-question is:

RSQ1: What are the main data migration methods that appear in scientific literature?

RSQ1 is answered by performing a literature study to build a theoretical landscape of data migration. The definition of data migration, the main characteristics of data migration projects, and the various methods of undertaking it, both technical and process-oriented, are identified. The literature study has two objectives: first, it grounds the research in theory, and second, it helps to identify candidate methods for developing a software product data migration method. The method is developed through situational method engineering (Brinkkemper, 1996) using the method assembly technique (Ralyté, Deneckère & Rolland, 2003). Based on this, we can define the second sub-question as:

RSQ2: What are the main migration tasks involved, and what tool support is available? How can a method for managing data migration for cloud companies be developed?

RSQ2 is answered by following the steps of cloud-based method engineering. In the first stage, requirements for data migration are defined. Next, candidate methods are identified from the literature study using these requirements. Then the methods are compared and a situational method is built from the fragments that are relevant to data migration for cloud companies. Once a method has been developed, there remains the issue of its validity in practice and its evaluation. Thus, the third sub-question is:

RSQ3: How can a method for managing data migration for cloud companies be implemented in practice?

To answer the last question, expert interviews are performed to assess the validity of the research. Experts are asked to rank the method on different quality measures and to provide feedback on it. In addition, a case study is performed at the software product company that triggered the research. The case study focuses on the implementation of the method in the company and on relating the method to the software product development process. Finally, the findings of the interviews and the case study are used to refine and finalize the method.


CHAPTER 2. RESEARCH APPROACH

This chapter describes the research approach used to answer the research questions presented in Chapter 1. The main research method used is design science in information systems research. The research is split into several activities, each conforming to a complementary research method. Together, the methods produce a final deliverable in the form of a software product data migration method. The method has both scientific and social contributions. Scientifically, it fills the gap of data migration methods for software product companies. As a social contribution, the method can yield more efficient processes, which in turn benefit the people involved.

2.1 DESIGN SCIENCE RESEARCH

This research follows design science in information systems research, as suggested by Hevner et al. (2004). The suitability of this research approach comes from the fact that design science in information systems research aims at creating and evaluating IT artifacts with the intent to solve organizational problems. Similarly, March and Smith (1995) use the terminology of "build" and "evaluate" for the processes of design science. Design science therefore assists in building an artifact and in evaluating that artifact. This research aims at building a method for software product data migration, the artifact, which is then evaluated for validity and applicability.


There are four types of design artifacts: constructs, models, methods and instantiations (March & Smith, 1995). Within the scope of this research, models and methods are of primary interest. Models facilitate understanding and present the relationship between a problem and its solution. Methods represent processes and they offer guidance on how problems can be solved. The main deliverable of this research is aimed at offering a better understanding of the data migration process, as well as offering guidance in performing data migration for software product companies.

Figure 0 presents the Hevner et al. (2004) Information Systems Research Framework that this thesis follows as a basis for conducting the research. The three research sub-questions are overlaid on this framework to show in which part of the process they are answered. The main trigger for this research came from a Dutch software product company, AFAS Software B.V. This represents the Organizations concept in the Environment side of the framework. AFAS is in need of a solution for data migration from an existing version of its product to a re-developed version of the same product. This represents the Business Need, which gives the research Relevance. Based on this business need, the main research question and the three sub-questions are developed. To answer the first sub-question (RSQ1), existing literature on data migration is studied. This represents the Knowledge Base, from which Methods, Models and Frameworks are chosen as Applicable Knowledge. To answer the second sub-question (RSQ2), the literature study is used as a basis for developing a method for software product data migration. This represents Developing an Artifact. To answer the third sub-question (RSQ3), the method is evaluated through expert interviews and a case study. Finally, the results of the evaluation are used to refine the method. This represents Evaluation through Analytical and Case Study means in the IS Research Framework. The findings of this research are, in the end, an addition to the IS knowledge base and are applicable in the appropriate environment.

Figure 0 - Information Systems Research Framework (Hevner et al., 2004)


2.2 RESEARCH PLANNING

This section describes the phases that are part of the research approach. Each phase is performed using a research method. Table 0 presents the research questions and the research methods and deliverables that are used to answer them.

Table 0 - Research questions, methods and deliverables

Research Question   Research Method                  Research Deliverable
RSQ1                Literature Study                 Data Migration Methods
RSQ2                Cloud Based Method Engineering   Preliminary Situational Data Migration Method
RSQ3                Experiments                      Experiments Findings
                    Case Study                       Case Study Findings
RQ                  Design Science                   Situational Data Migration Method

The ordering in the table follows the chronological order of the stages of the research. In the first stage, Data Migration Theory, Data Migration Phases and Data Migration Activities are identified using the Literature Study method. The following stage consists of analysis of the theoretical findings: a Preliminary Situational Data Migration Method is developed using Assembly Based Method Engineering. The method is evaluated through Experiments and a Case Study at a software product company. Further, the method is analyzed and refined taking into account the Experiments Findings and Case Study Findings. Thus, the final Situational Data Migration Method is developed. As a final stage, not shown in the table, the Conclusion of the research process deals with reporting the findings and addressing remaining research issues and further research.

2.3 RESEARCH METHODS

In this research, we will provide an in-depth discussion of the principles of migration and its applications in improving data storage performance, balancing service workloads and adapting to the Cloud platform.


2.3.1 Data Migration in Multi-tiered Storage Systems

The significant IO improvements of Solid State Disks (SSDs) over traditional rotational hard disks make them an attractive option to integrate into tiered storage systems for performance enhancement [18]. However, to integrate SSDs into a multi-tiered storage system effectively, automated data migration between SSD and HDD plays a critical role. In many real-world application scenarios, such as banking and supermarket environments, workloads and IO profiles present interesting characteristics and are also bound by workload deadlines. Fully exploiting the power of data migration while guaranteeing the migration deadline is critical to maximizing the performance of an SSD-enabled multi-tiered storage system.

In order to fully capitalize on the benefits of SSDs in a multi-tiered storage system with SSDs working as the fastest tier, it is important to identify the right subset of data to place on this tier, given its limited capacity due to the high cost per gigabyte. Specifically, we want to maximize overall system performance by placing critical, IOPS (input/output operations per second) intensive and latency-sensitive data on the fast SSD tier through two-way automated data migration between SSDs and HDDs. By working with a variety of enterprise-class storage applications, we observe that many block-level IO workloads exhibit certain time-dependent regularity in terms of access patterns and temperature of extents (hot or cold). For example, in banking applications, IO workloads for account access and credit verification are typically heavier during certain hours of the day. However, such patterns may change from daytime to nighttime, from day to day, from weekdays to weekends, or from working days to public holidays. Thus, block-level IO profiling is the first step in building an automated data migration system. The next big challenge is to devise strategies for deciding which data to migrate, and when.

In this work, we proposed an automated lookahead data migration scheme, called LAM, which aims to adaptively migrate data between tiers to keep pace with IO workload variations, to maximize the benefits of the fast but capacity-limited SSD tier, and to optimize overall system performance in terms of response time and resource utilization, while limiting the impact of LAM on existing IO workloads.

More concretely, based on workload variations and temperature of block level IO access (e.g., hot or cold extents) learned through IO profiling, we predict shifts in hot-spots of block-level extents and proactively migrate those data extents whose temperature is expected to rise in the next workload into the fast SSD tier during a lookahead period. A key challenge in the LAM design is to understand and trade off multiple factors that influence the optimal lookahead migration window.
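
To make the lookahead step concrete, here is a minimal Python sketch of the promotion decision just described. It is an illustrative reconstruction under stated assumptions, not the thesis's implementation: the function name, the extent/bandwidth units and the parameter values are all hypothetical. It ranks extents by predicted heat for the next workload and promotes only as many non-resident hot extents as the lookahead window and migration bandwidth allow.

# Hypothetical sketch of the lookahead promotion decision; names, units
# and parameter values are illustrative assumptions, not from the thesis.

def plan_lookahead_migration(next_temps, on_ssd, ssd_capacity_mb,
                             bandwidth_mb_s, extent_mb, lookahead_s):
    """Choose extents to promote to the SSD tier before the next workload.

    next_temps: dict mapping extent id -> predicted heat in the next workload
    on_ssd:     set of extent ids currently resident on the SSD tier
    Returns an ordered list of extents to migrate within the lookahead window.
    """
    # Rank all extents by predicted heat for the next workload.
    ranked = sorted(next_temps, key=next_temps.get, reverse=True)
    # The SSD tier should hold the hottest extents that fit its capacity.
    slots = ssd_capacity_mb // extent_mb
    want_on_ssd = set(ranked[:slots])
    # Only extents not already resident need to move.
    to_promote = [e for e in ranked if e in want_on_ssd and e not in on_ssd]
    # Spare bandwidth and the window length bound how much can move in time.
    budget = int(bandwidth_mb_s * lookahead_s) // extent_mb
    return to_promote[:budget]

if __name__ == "__main__":
    temps = {"ext0": 90, "ext1": 75, "ext2": 60, "ext3": 40, "ext4": 10}
    plan = plan_lookahead_migration(temps, on_ssd={"ext2"},
                                    ssd_capacity_mb=192, bandwidth_mb_s=64,
                                    extent_mb=64, lookahead_s=120)
    print(plan)  # ['ext0', 'ext1']: hottest non-resident extents that fit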

The main contributions of this work are twofold. First, we establish the need for and the impact of automated deadline-aware data migration through observation and analysis of IO workload scenarios from real-world storage system practice. By introducing the basic data migration model in an SSD-enabled multi-tiered storage system, we study the characteristics and impact of several factors, including IO profiles, block-level IO bandwidth, and the capacity of the SSD tier, on the overall performance of tiered storage systems. Second, we propose a lookahead migration framework as an effective solution for performing deadline-aware, automated data migration, carefully managing the performance impact of data migration on runtime application workloads while maximizing the gains of lookahead migration. A greedy algorithm is designed to illustrate how overall system performance depends on choosing a near-optimal lookahead window length and on a number of important factors, such as block-level IO bandwidth, the size of the SSD tier, the workload characteristics, and IO profiles. Our experiments use both IO traces collected from benchmarks on a commercial enterprise storage system and simulation over the real traces. The experimental study demonstrates that the greedy-algorithm-based lookahead migration scheme not only enhances overall storage system performance but also provides significantly better IO performance than the basic data migration scheme.
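
As a rough illustration of the greedy idea, the sketch below grows the lookahead window in fixed increments and stops at the first step that no longer improves a supplied utility estimate. The utility callable is a stand-in assumption for the trace-driven simulation described above, and the toy curve in the usage example is invented for demonstration.

# Illustrative sketch of a greedy lookahead-window search; the utility
# function is a stand-in for the trace-driven simulation the text describes.
import math

def greedy_window(utility, step=60, max_window=3600):
    """Grow the lookahead window in fixed increments while utility improves.

    utility: callable mapping a window length (seconds) to net benefit,
             i.e. response-time gain minus interference with the running
             workload (a trace-driven simulation would supply this).
    """
    best_w, best_u = 0, utility(0)
    w = step
    while w <= max_window:
        u = utility(w)
        if u <= best_u:            # stop at the first non-improving step
            break
        best_w, best_u = w, u
        w += step
    return best_w

if __name__ == "__main__":
    # Toy concave utility: gains saturate while interference grows linearly.
    toy = lambda w: 100.0 * (1.0 - math.exp(-w / 300.0)) - 0.05 * w
    print(greedy_window(toy))      # settles near the peak of the toy curve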

The efficiency of greedy-algorithm-based lookahead data migration is restricted by its incremental granularity and lacks flexibility. Thus an adaptive migration algorithm that can keep pace with changes in the system environment is needed. In this work, we proposed an adaptive deadline-aware lookahead data migration scheme, called ADLAM, which adaptively decides the lookahead window length based on the system parameters.

The main contributions of the data migration work are twofold. First, we build a formal model to analyze the benefits of basic data migration on system response time across its different phases and integrate the per-phase benefits into a benefit across all phases. Second, we present our data migration optimization process, which evolves from learning phase reduction, to constant lookahead data migration, to an adaptive lookahead data migration scheme. A system utility measure is proposed to compare the performance gains of each data migration model. We propose an adaptive lookahead migration approach, which works as an effective solution for performing deadline-aware data migration by carefully trading off the performance gains that lookahead migration achieves for the next workload against its potential impact on existing workloads. This approach centers around a formal model which computes the optimal lookahead length by considering a number of important factors, such as block-level IO bandwidth, the size of the SSD tier, the workload characteristics, and IO profiles. Our experiments confirm the effectiveness of the proposed adaptive data migration scheme using IO traces collected from benchmark and commercial applications running on an enterprise multi-tiered storage server. The experiments show that ADLAM not only improves overall storage performance, but also significantly outperforms the basic data migration model and constant lookahead migration strategies in terms of system response time improvements.
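
One hedged way to make the utility trade-off behind such a measure explicit is the following formalization; the notation below is illustrative rather than the thesis's own:

\[
  U(w) = G(w) - C(w), \qquad w^{\ast} = \arg\max_{0 \le w \le D} U(w),
\]

where $w$ is the lookahead window length, $D$ the migration deadline, $G(w)$ the response-time gain for the next workload (which plausibly saturates once the SSD tier is full, roughly at $w \approx S/B$ for SSD tier size $S$ and migration bandwidth $B$), and $C(w)$ the interference cost of the bandwidth diverted from foreground IO. The adaptive scheme can then be read as re-solving for $w^{\ast}$ whenever these measured parameters change.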

2.3.2 Service Workload Migration for Reliability and Load Balance

Although a distributed system increases scalability by distributing the workload among the participating nodes, the heterogeneity of those nodes presents a significant challenge to system load balance and reliability. Workload migration combined with replication is an effective approach to achieving load balance and reliability. We show the power of replication in an overlay-network-based distributed location service network, the GeoGrid Network. In contrast to centralized client-server architectures, decentralized management and provision of distributed location-based services has gained a lot of attention due to its low cost of ownership and management and its inherent scalability and self-configurability.

Measurements [110, 68] performed on deployed overlay networks show that node characteristics such as availability, capacity and connectivity present highly skewed distributions, and such inherent dynamics create significant variations, even failures, in the services provided by overlay systems. For example, a sudden node failure that interrupts service may lead the system to exhibit dramatic changes in service latency or to return inconsistent results. Furthermore, the increasing population of mobile users and the diversity of location-based services available to them have produced rapidly changing user interests and behavior patterns as users move on the road, creating moving hot spots of service requests and dynamically changing workloads. Thus, an important technical challenge in scaling a location service network is to develop a middleware architecture that is both scalable and reliable, on top of a regulated overlay network with node dynamics and node heterogeneity, for large-scale location-based information delivery and dissemination. By scalable, we mean that the location service network should provide an effective load balancing scheme to handle the growing number of mobile users and the unexpected growth and movement of hot spots in service demand. By reliable, we mean that the location service network should be resilient in the presence of sudden node failures and network partition failures.

In this work we proposed Reliable GeoGrid, a decentralized and geographically location-aware overlay network service architecture for scalable and reliable delivery of location-based services. The main contributions of this work are twofold. First, we describe a distributed replication scheme which enables reliable location service request processing in an environment of heterogeneous nodes with continuously changing workloads. Our replication framework provides resilience to both individual node failures and massive node failures, aiming at keeping the service consistently accessible to users and eliminating sudden interruption of ongoing tasks. Second, we present a dynamic replica-based load balancing technique, which utilizes a parameterized utility function to control and scale the system in the presence of varying workloads by taking several workload-relevant factors into account. Our experimental evaluation demonstrates that the Reliable GeoGrid architecture is highly scalable under changing workloads and moving hotspots, and highly reliable in the presence of both individual node failures and massive node failures.
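
As a hedged illustration of what such a parameterized utility function might look like, the sketch below scores candidate replicas by load, spare capacity and network distance and picks the best target for shedding load. The Replica record, the weights and the scoring rule are assumptions made for illustration, not Reliable GeoGrid's actual design.

# Hypothetical sketch of replica-based load balancing; fields, weights
# and the scoring rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Replica:
    node_id: str
    load: float        # current request load (fraction of capacity)
    capacity: float    # remaining spare capacity (normalized)
    latency_ms: float  # network distance from the overloaded node

def pick_offload_target(replicas, w_load=0.5, w_cap=0.3, w_lat=0.2):
    """Score each replica with a parameterized utility and pick the best.

    Higher utility = lighter load, more spare capacity, lower latency.
    """
    def utility(r):
        return (w_load * (1.0 - r.load)
                + w_cap * r.capacity
                - w_lat * r.latency_ms / 100.0)
    return max(replicas, key=utility)

if __name__ == "__main__":
    rs = [Replica("n1", 0.9, 0.1, 10), Replica("n2", 0.3, 0.6, 40),
          Replica("n3", 0.5, 0.5, 15)]
    print(pick_offload_target(rs).node_id)  # n2: lightly loaded, spacious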

2.3.3 Application Migration

The last piece of this research work is dedicated to application migration from a local data center to the Cloud. Cloud infrastructures such as EC2 and App Engine provide a flexible, economic and convenient solution for enterprise applications through the pay-as-you-go business model. More and more enterprises are planning to migrate their services to a Cloud platform. However, migrating a service to the Cloud turns out to be a complicated, expensive, error-prone process, for reasons including hardware errors, software errors, configuration errors, access permission errors, and performance anomalies. Moreover, enterprise applications consist of massively distributed networked components with sophisticated dependencies and logic among them, which further aggravates the complexity and difficulty of the migration problem. Yet there exists neither a framework nor a tool to simplify the migration process and validate whether the system is working correctly after the migration; most of the literature focuses on improving the efficiency of migrating single virtual machine images. We dedicate Chapter 5 of this research to migration configuration validation and system installation automation. In this work, we design an effective policy-based migration configuration validation framework which simplifies the application migration process and reduces the probability of errors. We argue that such a framework can significantly facilitate the migration process and validate the effectiveness of application migration.
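
A minimal sketch of the policy-based idea follows; the policy rules, configuration keys and example values are invented for illustration and are not CloudMig's actual policies. Each policy is a named predicate over the migrated configuration, and a configuration passes validation only if no predicate is violated.

# Illustrative sketch of policy-based migration configuration checking;
# all rules and configuration keys here are made-up examples.

def check_policies(config, policies):
    """Return a list of violations; an empty list means the config passes."""
    violations = []
    for name, predicate, message in policies:
        if not predicate(config):
            violations.append(f"{name}: {message}")
    return violations

POLICIES = [
    ("db-endpoint",
     lambda c: not c["db_host"].startswith(("127.", "localhost")),
     "database host must not point at the old local server"),
    ("port-open",
     lambda c: c["app_port"] in c["firewall_open_ports"],
     "application port is not opened in the firewall rules"),
    ("data-path",
     lambda c: c["data_dir"].startswith("/mnt/"),
     "data directory should live on the mounted cloud volume"),
]

if __name__ == "__main__":
    cfg = {"db_host": "127.0.0.1", "app_port": 8080,
           "firewall_open_ports": [22, 80], "data_dir": "/mnt/data"}
    for v in check_policies(cfg, POLICIES):
        print("violation:", v)   # flags the stale DB host and closed port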

2.3.4 Contribution of the Research

The contribution of this research is that we study important research problems in data and application migration and propose technical solutions to the challenges involved. These contributions are threefold: an adaptive look-ahead data migration model that integrates SSDs into multi-tier storage architectures and rapidly adjusts the migration schedule to application-specific attributes, I/O profiles and workload deadlines, improving the efficiency and scalability of multi-tier storage systems; the Reliable GeoGrid architecture, a decentralized location service framework combining workload migration, controlled replication and dynamic load balancing, which remains scalable under moving hotspots and reliable under massive node failures; and the CloudMig framework, which provides configuration validation and installation automation to reduce configuration errors and installation complications when migrating applications from local physical servers to the Cloud.

2.3.5 Organization of the Research

The major part of the research is organized into three parts. First, we introduce data migration in multi-tiered storage systems: we motivate the research problem, propose our lookahead data migration, and build the theoretical model which computes the optimal lookahead window length. Next, we introduce service migration in decentralized systems; that is, we discuss the reliability and load balance problems in large-scale distributed systems and then present our proposed hybrid replication scheme, which handles both the reliability and the load balance problem. Finally, we introduce our latest work on application migration: we discuss the motivating experiments we performed, the keys to solving this problem, and our policy-based solution. We then conclude the research.


CHAPTER 3. DATA MIGRATION IN A MULTI-TIERED STORAGE-BASED CLOUD ENVIRONMENT

3.1 INTRODUCTION

The rise of solid-state drives (SSDs) in enterprise data storage arrays in recent years has put higher demand on automated management of multi-tiered data storage software to take better advantage of expensive SSD capacity. It is widely recognized that SSDs are a natural fit for enterprise storage environments, as their performance benefits are best leveraged by enterprise applications that require high input/output operations per second (IOPS), such as transaction processing, batch processing, and query or decision support analysis [93, 98].

Recently, a number of storage systems supporting Flash devices (SSDs and Flash memory) have appeared in the marketplace, such as the NetApp FAS3100 system [7], IBM DS8000 [5] and EMC Symmetrix [3]. Most of these products fall into two categories in terms of the way they integrate Flash devices into the storage system. The first category comprises products that have taken the approach of utilizing Flash-memory-based caches to accelerate storage system performance [7]. The main reasons in favour of using Flash devices as a cache include the simplicity of integration into existing systems, without having to explicitly consider data placement, and the performance improvement gained by increasing cache size at lower cost (compared to DRAM) [10].


The second category includes those vendors that have chosen to integrate SSDs into tiered storage architectures as fast persistent storage [5]. The arguments for using and integrating flash devices in an SSD form factor as persistent storage in a tiered system include issues such as data integrity, accelerated wear-out and asymmetric write performance.

Given the limited capacity of the SSD tier in multi-tiered storage systems, it is critical to place the most IOPS-intensive and latency-sensitive data on the fastest SSD tier in order to maximize the benefits of utilizing SSDs in the architecture. However, a key challenge in two-way data migration between SSDs and HDDs arises from the observation that hot spots in the stored data keep moving over time. In particular, previously cold (i.e., infrequently accessed) data may suddenly or periodically become hot (i.e., frequently accessed and performance-critical) and vice versa. Another important challenge for automated data migration is the capability to control and confine migration overheads within the acceptable performance tolerance of high-priority routine transactions and data operations.
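
To illustrate one plausible way of confining migration overhead, the sketch below throttles migration bandwidth against a foreground latency tolerance using an AIMD-style rule (back off multiplicatively on a breach, recover additively when there is headroom). The rule, threshold and rates are assumptions for illustration, not the mechanism proposed in this chapter.

# A minimal sketch of non-intrusive migration throttling; the latency
# tolerance and the AIMD-style adjustment rule are illustrative assumptions.

def throttle_migration_rate(current_rate_mb_s, foreground_latency_ms,
                            latency_slo_ms, min_rate=1.0, max_rate=128.0):
    """Adjust migration bandwidth so foreground IO stays within tolerance.

    Halve the rate when foreground latency breaches the tolerance, and
    recover additively when there is headroom.
    """
    if foreground_latency_ms > latency_slo_ms:
        return max(min_rate, current_rate_mb_s / 2)   # back off fast
    return min(max_rate, current_rate_mb_s + 4.0)     # recover slowly

if __name__ == "__main__":
    rate = 64.0
    for lat in [5, 6, 12, 15, 7, 5]:   # sampled foreground latencies (ms)
        rate = throttle_migration_rate(rate, lat, latency_slo_ms=10)
        print(f"latency={lat}ms -> migration rate {rate} MB/s")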

By analyzing real applications in environments such as banking settings and retail stores running on a commercial enterprise multi-tiered storage server, one can discover certain temporal and spatial regularity in the access patterns and temperature of extents. For example, in a routine banking environment, the major daytime workload centers around transaction processing, so certain indices become hot during the daytime, while at night the workload switches to report generation and correspondingly different indices become hot. Generally speaking, such patterns may not only change from daytime to nighttime; the pattern may also change from day to day, from weekdays to weekends, or from working periods to vacation periods. This motivates us to use IO profiles as a powerful tool to guide data migration. Further, in order to improve system resource utilization, the data migration techniques devised for a multi-tiered storage architecture need to be both effective and non-intrusive. By effective, we mean that the migration activity must select an appropriate subset of data for migration, maximizing overall system performance while guaranteeing completion of the migration before the onset of the new workload (i.e., within a migration deadline). By non-intrusive, we mean that the migration activity must minimize its impact on the storage workloads executing concurrently on the multi-tier storage system.
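
The sketch below shows what block-level IO profiling of extent temperature could look like in its simplest form: accesses are bucketed by time window and extent, and the hottest extents of a window become migration candidates. The trace format, extent size and window length are illustrative assumptions, not the profiling used by the actual system.

# Hypothetical sketch of block-level IO profiling: bucket accesses by
# extent and time window to estimate per-period extent "temperature".
from collections import defaultdict

def profile_extents(trace, extent_size_blocks=1024, window_s=3600):
    """Count accesses per (time window, extent) from an IO trace.

    trace: iterable of (timestamp_s, block_number) pairs.
    Returns dict: window index -> {extent id -> access count}.
    """
    heat = defaultdict(lambda: defaultdict(int))
    for ts, block in trace:
        window = int(ts // window_s)
        extent = block // extent_size_blocks
        heat[window][extent] += 1
    return heat

def hot_extents(heat, window, top_k=10):
    """The top_k hottest extents in a given window (candidates for SSD)."""
    counts = heat.get(window, {})
    return sorted(counts, key=counts.get, reverse=True)[:top_k]

if __name__ == "__main__":
    trace = [(10, 100), (20, 100), (30, 5000), (3700, 5000), (3710, 5000)]
    heat = profile_extents(trace)
    print(hot_extents(heat, 0))   # one window's hot set
    print(hot_extents(heat, 1))   # a later window with a different hot set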

In this chapter, we present an adaptive deadline-aware lookahead data migration scheme, called ADLAM, which aims to adaptively migrate data between storage tiers in order to fully release the power of the fast, capacity-limited SSD tier to serve hot data, thus improving system performance and resource utilization while limiting the impact of ADLAM on existing active workloads. Concretely, we want to capitalize on the expensive SSD tier primarily for hot data extents, based on IO profiling and lookahead migration. IO profiling makes shifts in hot spots at the extent level predictable and exploitable; based on this, lookahead migration proactively migrates those extents whose heat is expected to rise in the next workload into the SSD tier during a lookahead period. To set the lookahead migration window optimally, the multiple dimensions that influence the optimal lookahead length need to be explored. By optimal we mean that lookahead migration should effectively prepare the storage system for the next workload by reducing the time taken to reach the peak performance of the next workload and by maximizing the utilization of the SSDs. We deployed a prototype of ADLAM in an operational enterprise storage system, and the results show that lookahead data migration effectively improves system response time at reduced average cost by fully utilizing scarce resources such as SSDs.

The main contributions of this chapter are twofold. First, we describe the need for and impact of basic deadline-aware data migration on system performance, drawn from real storage practice. We build a formal model to analyze the benefits of basic data migration on system response time across its different phases and integrate the per-phase benefits into a benefit across all phases. Second, we present our data migration optimization process, which evolves from learning phase reduction, to constant lookahead data migration, to an adaptive lookahead data migration scheme. A system utility measure is built to compare the performance gains of each data migration model. The adaptive lookahead migration approach works as an effective solution for performing deadline-aware data migration by carefully trading off the performance gains that lookahead migration achieves for the next workload against its potential impact on existing workloads. This approach centers around a formal model which computes the optimal lookahead length by considering a number of important factors, such as block-level IO bandwidth, the size of the SSD tier, the workload characteristics, and IO profiles. Our experiments confirm the effectiveness of the proposed adaptive data migration scheme using IO traces collected from benchmark and commercial applications running on an enterprise multi-tiered storage server. The experiments show that ADLAM not only improves overall storage performance, but also significantly outperforms the basic data migration model and constant lookahead migration strategies in terms of system response time improvements.
