
Big Data’s Potential: A Study of Database Solutions for Intel’s CAPS (Cellular Analysis and Post-Processing Suite) Framework




This thesis is an extension of the work I did at Intel as an intern. During my internship, my contributions included the development of a data analysis and visualization framework, CAPS (Cellular Analysis and Post-Processing Suite), for trace analysis.

The framework was based on the standard information visualization pipeline shown in Figure 1.1 [1].

Figure 1.1: Information Visualization Pipeline

The labeled rectangular boxes are the stages and the horizontal directed arrows are the processing steps. The user can interact with and analyze the data at every stage. The source data consists of the raw set of observations that has to be visualized. To visualize this data, the details must first be populated into the data structure required by the model; this is the transformed data. The transformed data is then mapped to a visual structure, which means a decision has to be taken about the model and its details. The final step is to render the visual image on the display.

To understand the information visualization process, consider the example shown in Figure 1.2. The source data is a table with details of men and women. The user is interested in the height-to-weight ratio of men versus women, which does not need the ‘names’ field. The raw data is therefore filtered down to only the information necessary to visualize this behavior, resulting in a ‘height vs weight’ table. This is the transformed data, which contains only the necessary information. The process of filtering raw data into transformed data is not trivial; depending on the structure, size, and type of the data it can be very complex and time consuming, and it can eventually lead to the problems of Big Data management. A visual structure is defined based on the transformed data. The visual structure describes how the data has to be visualized; in this example, the data has to be visualized in a scatter plot, and the visual structure specifies the description of the x-axis and y-axis. Finally, this visual structure is rendered, generating the required visualization for analysis as shown in Figure 1.2.

Figure 1.2: Example to describe information visualization pipeline

The section below describes the architecture of CAPS and relates it to the info-vis pipeline. Figure 1.3 shows the architecture of CAPS [2]. CAPS is a standalone Windows application run via a command line interface. The architecture pattern used to design CAPS is “Pipes and Filters”. In CAPS, the trace file, the user model and the plot code are the inputs to the sub-component STTAutomation. It processes the data by decoding the trace files and filtering the necessary data from them based on the user model. The filtered data is processed and visualized using the plot resource. To understand the plot resource, assume the analyst is interested in seeing a histogram of a specific parameter at a particular time. This requirement is described in the plot resource using the framework APIs; by changing the plot code, the analyst can retrieve information in any form. The plot resource specifies the artifacts to be generated by CAPS. It is evaluated by the sub-component MPT using the data obtained by STTAutomation, generating output such as figures, tables and CSVs. In terms of the info-vis pipeline, the trace file is the raw source data, the user model is the filter responsible for data transformation, and the plot resource defines the visual model which, on processing, generates the different visual artifacts used for analysis [2].

Figure 1.3: High level architecture of CAPS

As mentioned before, trace files are binary files whose content is random and semi-structured. In the CAPS framework, filtering is done by sequential file reads. Filtering such a trace file based on multiple rules is a complex and time consuming task. The major objective of this thesis is to analyze and propose ways to improve the performance of processing trace files.

Relation of the current work to prior work

As part of the internship, a data analysis and visualization framework, CAPS, was developed, which laid the groundwork for this thesis. The current work is research oriented and focuses only on the filtering part of CAPS. The filtering of trace files was done by sequential file reads, which left a lot of scope for optimization and performance improvement. This thesis aims at identifying an efficient database solution to be incorporated into the CAPS framework for filtering the trace files.

1.4 Structure of the thesis

Chapter 2 begins with the basics of databases, compression techniques, memory management, and input/output operations. As the outcome of this work is used for commercial purposes, software licenses also play an important role. The complete set of requirements, which is a driving factor for any project, is described in Chapter 3. There is no single defined way to achieve a result; it is always essential to try different techniques or combinations of them, so Chapter 4 lays the groundwork for the various ideas and approaches followed in this work. The ideas are brought to life in Chapter 5 by choosing a few custom-made solutions and a few COTS (commercial off-the-shelf) solutions. As there is more than one viable solution, the candidates have to be carefully evaluated to ensure the expectations are met; Chapter 6 performs the micro-analysis of the shortlisted candidates. Finally, the conclusion of the work and possible topics for discussion or enhancement are presented in Chapter 7 and Chapter 8.

A database is a collection of related data organized in a structured way. Database technology has evolved over time, and it is necessary to understand the basic types of databases.

2.1.1 Flat file database

A flat file database is an ordinary file on a computer. Each line in the file contains one record, and delimiters can be used to separate the fields within a record. For any database operation such as read, write, delete or update, the data must be read into computer memory, changed there, and written back to the file. Records can be duplicated, and the cost of maintaining a flat file database is very high [3].
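
To make the read-modify-write cycle concrete, the following minimal C++ sketch loads a delimiter-separated flat file entirely into memory and writes it back. The file name ("library.db") and the ';' delimiter are illustrative choices, not part of any real system.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read a flat file (one record per line, fields split by ';') fully into memory,
// then rewrite the whole file -- the update cycle described above.
int main() {
    std::vector<std::vector<std::string>> records;

    std::ifstream in("library.db");
    for (std::string line; std::getline(in, line); ) {
        std::vector<std::string> fields;
        std::stringstream ss(line);
        for (std::string field; std::getline(ss, field, ';'); )
            fields.push_back(field);
        records.push_back(fields);
    }

    // Any change to a single record still means rewriting the entire file.
    std::ofstream out("library.db");
    for (const auto& rec : records) {
        for (std::size_t i = 0; i < rec.size(); ++i)
            out << rec[i] << (i + 1 < rec.size() ? ";" : "");
        out << "\n";
    }
}
```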

Table 2.1 shows an example of a flat file database where each row is an independent entry. Rows 1 and 3 contain very similar data, which is redundant and can lead to inconsistencies. These drawbacks led to the introduction of hierarchical databases.

Student Name | Book Borrowed       | Author              | ISBN
Alice        | Designing databases | Dr. James Muller    | 12345
Bob          | SQL made easy       | Prof. Thomas Schulz | 67890
Mary         | Designing databases | Dr. James Muller    | 12345

Table 2.1: Model of a flat file database

2.1.2 Hierarchical database

The records in a hierarchical database are organized in a parent-child relationship (a.k.a. a tree structure), with each child having at most one parent. Compared with a flat file, data duplication is reduced and data search is efficient, but any new update needs a complete update of the structure, which makes the implementation very tedious [4]. Figure 2.1 shows the hierarchical model of the database, where each student can have a set of books. Every book is related to a single parent node, which is the student entry. This overcomes a few of the disadvantages of flat file databases, but some redundancy remains, as the information about a book can be repeated across different students. In the example, the Book2 entry is repeated for both Alice and Bob, which is redundant.

Figure 2.1: Model of hierarchical database

2.1.3 Network database

Unlike a hierarchical database, the records in a network database can have many-to-many relationships, so a network database can handle more relationships among its records [5]. Other specifics, with regard to data search and implementation, are similar to the hierarchical database. Figure 2.2 shows the example of a library inventory described in the network model. Each book has a list of students who have borrowed it from the library (assuming there are multiple copies of the same book); Bob has borrowed Book1 and Book2. This model depicts a many-to-many relationship.

Figure 2.2: Model of a network database

2.1.4 Relational database

The relational database was developed to overcome the disadvantages of the previously known databases. The records in a relational database are stored as tables with rows and columns, following the relational model. Most relational databases use a structured query language (SQL) to perform database operations and are maintained by software called relational database management systems. In contrast to the network model, where every relation is a parent-child relationship, the relational model can have independent tables with relationships among their entries [6]. Consider an example: Table 2.2 holds the student information and Table 2.3 the book details. There is no redundant information in these tables, and Table 2.4 defines the relationship between student and book using the unique key of each table.

Student ID | Student Name
1          | Alice
2          | Bob
3          | Mary

Table 2.2: Student record

Book ID | Book Name           | Author              | ISBN
1       | Designing databases | Dr. James Muller    | 12345
2       | SQL made easy       | Prof. Thomas Schulz | 67890

Table 2.3: Inventory of books

Student ID | Book ID
1          | 1
2          | 1
3          | 2

Table 2.4: Relation between student and book

2.1.5 Object oriented database

Unlike relational databases, where records are stored in tabular form, the records in an object-oriented database are stored as objects. These databases offer some kind of query language for data retrieval and modification. Data access is faster than in relational databases, as table joins are not necessary and retrieval is based on pointers rather than search [7].

2.1.6 NoSQL Databases

Although relational databases are very powerful, they are not very flexible: the full advantage of the database is gained only if the data is defined with proper schemas and relationships. In the last few years there has been a rise of new databases known as “NoSQL”, meaning Not Only SQL. The idea of NoSQL databases is not to use the relational model. There is more than one kind of NoSQL database, to be chosen based on different needs. NoSQL databases can be categorized into mainly four types [8]:

Key-value store: This is the simplest NoSQL database. As the name suggests, every record is a key-value pair; a record can be added, retrieved or deleted by its key (a minimal sketch of this idea follows the list below).

Column family store: Column stores are essentially two-dimensional arrays. Each record is similar to a key-value pair, but the value itself is one or more key-value pairs, so every entry pointed to by a key is a set of key-value pairs. Each key points to a set of columns which are usually accessed together, hence the name column family store [9].

Document store: This is inspired by the key-value store. The values are self-describing documents, such as JSON or XML, instead of simple entries. Documents can be nested within themselves to any level. The documents of different entries are similar but do not have to be exactly the same, which offers very good flexibility.

Graph database: The data is stored as graphs, with nodes and edges. The entities can have many-to-many relationships, making this type suitable for heavily linked data.
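
To make the key-value category concrete, here is a minimal in-memory sketch in C++; the class and method names are invented for illustration. A real key-value store would add persistence, replication and scaling, but the put/get/remove surface is the essential idea.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Conceptual key-value store: records are added, fetched and deleted purely by key.
// Purely illustrative and in-memory; not a production NoSQL engine.
class KeyValueStore {
public:
    void put(const std::string& key, const std::string& value) { data_[key] = value; }

    std::optional<std::string> get(const std::string& key) const {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;   // key not present
        return it->second;
    }

    void remove(const std::string& key) { data_.erase(key); }

private:
    std::unordered_map<std::string, std::string> data_;
};
```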

2.1.7 Comparison of NoSQL database with SQL database

NoSQL databases do not have a common query language like SQL databases; each NoSQL database provides its own query methods. A NoSQL database is chosen over an SQL database when the requirements are scalability, flexibility, schema-free design and performance over large datasets [9].

2.1.8 Embedded and External databases

The very first task in deciding on a database is to choose between external and embedded databases. There should be a clear understanding of the advantages and usage of both types.

• External databases scale very well and are best suited when many users access the same database. If there is no use case where many users have to access the same database, an embedded database can be used.

• An embedded database is usually stored as a flat file or a folder, which makes it very convenient to store and transfer, whereas external databases use more sophisticated storage methods, which makes them comparatively difficult to port.

• If the architecture under consideration has to be a three-tier architecture, i.e. running different components such as the database server, application server and application client on different nodes, then an external database should be chosen. External databases can run on different nodes or on a single node as a standalone application. In the case of an embedded database, the database is integrated with the application server, which forces it to run on the same system like a standalone application [10].

2.2 Data Compression

Data compression is the branch of information theory which aims at reducing the number of bits needed to store the same original information. The reduced representation, which is obviously smaller than the original, is used for storing and transmitting data in a faster or more efficient way. Compression can be either lossless or lossy. Data compressed using lossless compression can be decompressed to the exact original value. Lossy compression removes certain unimportant data; it is widely used in audio and video compression, where the removal of finer details does not noticeably affect the quality perceived by the human eye or ear. Lossy compression needs to distinguish between important and unimportant data, which can itself be an artificial intelligence problem, as it requires a clear understanding of the data and the way it is perceived. However, to keep things simple, and since this study deals with trace information, the focus shall be limited to lossless compression [11].

There is no such thing as a "universal" compression algorithm [12]. Information theory places hard limits on what can and cannot be compressed, and to what extent. No algorithm guarantees to compress every input, or even every input above a certain size. Random data compression and recursive compression are not possible with any compression algorithm: with no prior information about the data, compression will not be effective, and compressing already compressed data recursively stops working after a certain limit. There is also no algorithm that tests for randomness or tells whether a string can be compressed any further; the only way to find out is to actually try different compression techniques [12].

2.2.1 Run Length Coding (RLC)

This is one of the simplest compression algorithms. The concept is to replace repeated consecutive characters with a single character and a count, which reduces the number of bytes used to represent the same character multiple times. Consider the example string “abbbccdddddd”: it can be replaced by “a1b3c2d6”. Other coding schemes, such as Huffman coding (see section 2.2.2), can be applied to the compressed string to compress it further. This example shows a saving of 4 bytes (assuming 1 character is 1 byte). RLC does not necessarily compress the data; it can even inflate the data if applied to unsuitable input. For example, the string “abcdef” becomes “a1b1c1d1e1f1” after RLC, which takes twice the space. RLC is well suited to data that repeats consecutively. One real-world use case of RLC is compressing black and white images, where each pixel is either black or white and the probability of repeated pixels is very high [13].
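
A minimal C++ sketch of the scheme described above; the function name and the way counts are appended as decimal characters are illustrative simplifications (a production RLC would pack counts in binary).

```cpp
#include <iostream>
#include <string>

// Replace each run of a repeated character with the character followed by its count.
std::string rle_encode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i]) ++run;
        out += in[i];
        out += std::to_string(run);
        i += run;
    }
    return out;
}

int main() {
    std::cout << rle_encode("abbbccdddddd") << "\n"; // prints a1b3c2d6 (12 -> 8 bytes)
    std::cout << rle_encode("abcdef") << "\n";       // expands to a1b1c1d1e1f1 (6 -> 12 bytes)
}
```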

2.2.2 Huffman coding

Huffman coding is the most common lossless compression in use. It uses an optimal prefix code computed from the probability distribution of the symbols. Unlike ASCII codes, where each character is 1 byte (8 bits), Huffman coding uses variable-length encoding: each character can be represented with a different number of bits, decided by the probability of occurrence of that character. A character which occurs very often is represented using fewer bits, which reduces the total number of bits. As an example, Table 2.5 shows a set of alphabets and the frequency of occurrence of each alphabet in a sentence. Let us build a Huffman tree for this dataset [14].

Step 1: Sort the list by frequency and make the two lowest-frequency elements into leaves, creating a parent node whose frequency is the sum of the two lowest frequencies.

Step 2: Remove the two elements from the list and insert the new parent node, whose frequency equals the sum of the frequencies of its leaf nodes.

Alphabet | Frequency
a        | 1
b        | 2
c        | 4
d        | 5
e        | 8

Table 2.5: Data frequency table

Step 3: Repeat the loop, combining the two lowest-frequency elements, until only one element is left in the list; this is the root node of the Huffman tree.

Step 4: Encoding an alphabet using the Huffman tree is simple: traverse the tree to the desired value, outputting a ‘0’ every time the traversal takes the left-hand branch and a ‘1’ every time it takes the right-hand branch.

The resulting Huffman tree for the example is as shown in Figure 2.3.

Figure 2.3: Huffman tree

Table 2.6 shows the encoded values for each alphabet.

Alphabet | Encoded Value
a        | 1100
b        | 1101
c        | 111
d        | 10
e        | 0

Table 2.6: Encoded values by Huffman tree

Let us compare the number of bits needed to encode the string “abcde” using ASCII and Huffman coding schemes.

ASCII code: 8 bits per character, so length(“abcde”) × 8 = 5 × 8 = 40 bits.

Huffman code: the sum of the bits needed to encode each alphabet, i.e. 4 + 4 + 3 + 2 + 1 = 14 bits to encode “abcde”. In this particular example, Huffman coding saves 26 bits compared with ASCII. This is the concept of variable-length coding, where the length of the assigned code is inversely proportional to the frequency of occurrence of the character. It adds the overhead of computing a Huffman tree for each scenario and of knowing the data distribution beforehand.
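
The construction steps above can be sketched in a few lines of C++ using a priority queue. The node layout and left/right convention are illustrative; with this particular convention the codes happen to coincide with Table 2.6, although other equally valid Huffman trees exist.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Node {
    char symbol;        // valid only for leaves
    std::size_t freq;
    Node* left;
    Node* right;
};

struct ByFreq {         // min-heap on frequency
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Walk the tree: '0' for the left branch, '1' for the right branch (step 4).
void assign_codes(const Node* n, const std::string& prefix,
                  std::map<char, std::string>& codes) {
    if (!n->left && !n->right) { codes[n->symbol] = prefix; return; }
    assign_codes(n->left, prefix + "0", codes);
    assign_codes(n->right, prefix + "1", codes);
}

int main() {
    std::vector<std::pair<char, std::size_t>> freq =
        {{'a', 1}, {'b', 2}, {'c', 4}, {'d', 5}, {'e', 8}};   // Table 2.5

    std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
    for (auto& f : freq) pq.push(new Node{f.first, f.second, nullptr, nullptr});

    // Steps 1-3: repeatedly merge the two lowest-frequency nodes.
    while (pq.size() > 1) {
        Node* lo1 = pq.top(); pq.pop();
        Node* lo2 = pq.top(); pq.pop();
        pq.push(new Node{'\0', lo1->freq + lo2->freq, lo1, lo2});
    }

    std::map<char, std::string> codes;
    assign_codes(pq.top(), "", codes);
    for (auto& c : codes) std::cout << c.first << " -> " << c.second << "\n";
    // Prints a -> 1100, b -> 1101, c -> 111, d -> 10, e -> 0.
    // (Nodes are intentionally leaked to keep the sketch short.)
}
```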

2.2.3 Delta coding

Delta coding transmits or stores only the differences between sequential data; in more common terms it is called data differencing. Storing or transmitting deltas is space efficient, and the scheme is well suited to sequential data streams such as time-sequenced data. Consider a stream of messages carrying an epoch time in the header (for example 1450870343) where the difference between messages is about 1 millisecond. Instead of transmitting such a large number every millisecond, it is possible to send only the difference to the previous timestamp, from which the actual time can be re-created [15].
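
A minimal C++ sketch of delta coding over a vector of epoch timestamps, with the decoder re-creating the original values by a running sum; the function names are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Store the first timestamp verbatim and afterwards only the (small) difference
// to the previous value.
std::vector<std::int64_t> delta_encode(const std::vector<std::int64_t>& ts) {
    std::vector<std::int64_t> out;
    out.reserve(ts.size());
    for (std::size_t i = 0; i < ts.size(); ++i)
        out.push_back(i == 0 ? ts[i] : ts[i] - ts[i - 1]);
    return out;
}

// Reverse: a running sum of the deltas re-creates the original timestamps.
std::vector<std::int64_t> delta_decode(const std::vector<std::int64_t>& deltas) {
    std::vector<std::int64_t> out;
    out.reserve(deltas.size());
    std::int64_t acc = 0;
    for (std::int64_t d : deltas) out.push_back(acc += d);
    return out;
}
```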

2.2.4 Lempel Ziv 77

LZ77 is a lossless compression scheme that forms the basis of many variants such as LZMA, LZW and more. LZ77 is dictionary based and maintains a sliding window during compression. It achieves compression by referencing an earlier occurrence of the data, replacing repeated occurrences of the same data. Each repeated occurrence is replaced by a length-offset pair: the length indicates how many characters in the sequence are replaced by the same number of characters found the given offset behind. It is not possible to specify a length greater than its offset. To keep this information about previous data, the algorithm stores the data in a structure called the sliding window, hence the name sliding window compression. In the real world, variants of this algorithm are used in PNG, GIF, 7zip, and even Snappy compression (see section 2.2.5) [16].
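
The following naive C++ sketch illustrates the length-offset idea with a small sliding window. It scans the window by brute force and caps the match length at the offset, as described above; real implementations use hash chains and far larger windows, so this is an illustration of the concept, not of any production encoder.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// One LZ77 token: copy `length` bytes starting `offset` bytes behind the current
// position, then emit the literal `next`.
struct Token {
    std::size_t offset;
    std::size_t length;
    char next;
};

std::vector<Token> lz77_encode(const std::string& in, std::size_t window = 32) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t best_len = 0, best_off = 0;
        std::size_t start = pos > window ? pos - window : 0;
        for (std::size_t cand = start; cand < pos; ++cand) {
            std::size_t len = 0;
            // Extend the match, leaving room for the trailing literal and never
            // letting the length exceed the offset (pos - cand).
            while (pos + len + 1 < in.size() && len < pos - cand &&
                   in[cand + len] == in[pos + len])
                ++len;
            if (len > best_len) { best_len = len; best_off = pos - cand; }
        }
        out.push_back({best_off, best_len, in[pos + best_len]});
        pos += best_len + 1;
    }
    return out;
}

int main() {
    // Prints (0,0,'a') (0,0,'b') (0,0,'c') (3,3,'a') (6,4,'d').
    for (const Token& t : lz77_encode("abcabcabcabd"))
        std::cout << "(" << t.offset << "," << t.length << ",'" << t.next << "') ";
    std::cout << "\n";
}
```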

2.2.5 Snappy compression

Snappy is an open source compression algorithm from Google which builds on the ideas of LZ77. The goal of Snappy is to provide a faster compression algorithm: its compression ratio is much lower than that of other available algorithms, but it aims for high compression speed. In an optimized environment, Snappy compresses at about 250 MB/s and decompresses at almost 500 MB/s. LevelDB optionally uses Snappy compression to store its data, thereby reducing the database size [17].
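
Assuming the Snappy C++ library is available, a round trip through its Compress and Uncompress calls looks roughly like the sketch below; the input (1 MiB of repeated bytes) is made up purely to show the size reduction.

```cpp
#include <iostream>
#include <string>
#include <snappy.h>   // link with -lsnappy

int main() {
    std::string raw(1 << 20, 'x');          // 1 MiB of highly repetitive sample data
    std::string compressed, restored;

    // Compress into a std::string and report the size reduction.
    snappy::Compress(raw.data(), raw.size(), &compressed);
    std::cout << raw.size() << " -> " << compressed.size() << " bytes\n";

    // Decompress and verify the round trip.
    if (snappy::Uncompress(compressed.data(), compressed.size(), &restored))
        std::cout << "round trip ok: " << (restored == raw) << "\n";
}
```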

2.2.6 Zlib compression

Most compression algorithms are file oriented: they assume the file contains the complete data to be compressed. Some data compression algorithms, however, are designed for streaming data. Such algorithms break the data up into “blocks”. Encoders add synchronization markers at the beginning and/or end of the blocks, which decoders use to determine the block boundaries; the decoder re-initializes its state for each block. The block size is a trade-off: a larger block size gives better compression, while a smaller block size allows quicker restarts. The block size can therefore be defined based on the requirements and priorities [18].

Zlib is a software library which supports stream compression. It provides in-memory compression and decompression, and the zlib format was designed to be compact and fast for use in memory and on communication channels. There is no limit on the length of data that can be compressed or decompressed; multiple calls to the APIs allow an unlimited number of blocks of data to be compressed and decompressed [19].
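
A minimal sketch using zlib's one-shot in-memory helpers (compress, uncompress and compressBound); for genuine block-by-block stream compression the deflate/inflate interfaces would be used instead, and the sample input is again made up.

```cpp
#include <iostream>
#include <string>
#include <vector>
#include <zlib.h>     // link with -lz

int main() {
    std::string raw(1 << 20, 'x');                       // repetitive sample data

    // Allocate the worst-case output size and compress in one call.
    uLongf comp_len = compressBound(raw.size());
    std::vector<Bytef> comp(comp_len);
    if (compress(comp.data(), &comp_len,
                 reinterpret_cast<const Bytef*>(raw.data()), raw.size()) != Z_OK)
        return 1;
    std::cout << raw.size() << " -> " << comp_len << " bytes\n";

    // Decompress back into a buffer of the original size.
    uLongf out_len = raw.size();
    std::vector<Bytef> out(out_len);
    if (uncompress(out.data(), &out_len, comp.data(), comp_len) != Z_OK)
        return 1;
    std::cout << "round trip ok: " << (out_len == raw.size()) << "\n";
}
```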

2.3 Software Licenses

A proper understanding of software licenses is essential. The solution proposed in this thesis will be implemented in Intel’s proprietary product, so the license must permit the use of the software for commercial purposes and also allow the confidentiality of the modified source code to be retained.

2.3.1 Free Software

Software under a free software license may be used, copied, and redistributed, with or without modification, free of charge or for a fee [20].

2.3.2 Open Source Software

An open source license is very similar to a free software license, with some restrictions; the differences between the two variants are very small [21].

2.3.3 Public Domain software

Public domain software is “non-copyrighted”, which means the author has no rights over the software to control its use, copying or redistribution. Public domain software is different from open source software: all open source software is copyrighted, but the author has given everyone the right to use and modify it [20].

2.3.4 Copylefted Software

Copylefted software is free software which ensures that modifications of the source code retain the same rights as before. Software under a copyleft license is not suitable for this commercial use case, as the license mandates retaining the same license, which would make the resulting product free for anyone to use. The GNU General Public License (GPL) is one example [20].

2.3.5 Permissive Free Software/ Non-copylefted software

Permissive free software, in contrast to copylefted software, allows the user to add additional restrictions to the modified software. This is ideal for commercial use, as it allows the confidentiality of the modified software to be retained. Examples include the MIT and BSD licenses [20].

2.4 Input Output (I/O) operations

Input/output (I/O) is the process of exchanging data to and from the computer. Any system should have efficient means to receive input and deliver output. Many I/O devices can be connected to the processor and main memory [22]; external I/O devices include hard disks, mice, keyboards, printers and so on. The interface to an I/O device consists of status, control and data signals. The control signal specifies operations such as READ or WRITE; the data signal is the set of bits read from or written to the device; the status signal specifies the state of the device, such as READY or WAIT [23]. The main functionalities of an I/O module are the following:

Timing: controlling the flow of traffic between internal and external resources.

CPU and device communication: communicating with the CPU for address recognition, data and control signals, and communicating with the external devices.

Data buffering: this is one of the important functionalities of an I/O module and the one of interest to this thesis. Buffering allows data to be held temporarily between the I/O module and external devices. The concept of I/O buffering is explained using disk I/O in section 2.4.1.

2.4.1 I/O Buffering

File reads and writes from disk make the processor wait, as the data actually has to be read from an external device. This is the access time: the time required to process a data request from the processor and fetch the necessary data from the storage device. As HDDs (hard disk drives) are mechanical devices, their access time is higher than that of SSDs (solid state drives) [24].

Input/output (I/O) buffering is a mechanism that improves the throughput of input and output operations. It is implemented directly in block devices, which exchange data in blocks, unlike character devices, which exchange data character by character (byte by byte) [25]. The mechanism is also implemented in the corresponding drivers and in almost all standard programming language libraries.

The access time of an I/O operation is very high, in the order of millions of processor clock cycles, and most of this latency is due to the hardware itself. For example, information cannot be read from or written to a hard disk until the spinning of the disk brings the target sectors directly under the read/write head (at the time of writing, 7200 RPM hard drives are the norm, so this may take up to about 8 milliseconds). When the input/output device is a network interface, the latency is usually even greater [24].

To reduce I/O latency, I/O devices implement buffers. Even a single-byte read operation reads a whole block of data from the device into main memory. Some drivers also implement caching: fetching the requested data block along with several immediately following blocks on the disk. Caching increases performance because programs often access the disk sequentially, i.e. the next access is for the next physical block on the disk. When the program actually requests the next block, the data is delivered without any disk access, as it is already available in main memory; this increases performance to a great extent. Now consider disk writes: when a write to disk is requested, the data is stored in a buffer (in main memory) until enough blocks of data have accumulated. Once the defined amount of data is reached, it is written to external storage at once; this is called flushing the output buffer.

On a multitasking operating system, hardware devices are controlled by the kernel, and user space applications may not access them directly. System calls are used instead, which for many reasons add an overhead. This overhead is typically in the order of microseconds rather than milliseconds, so buffering is not crucial for programs that perform a relatively small amount of I/O, but it makes a big difference for applications that are I/O intensive.

Thus, nearly every program written in a high-level programming language has its own I/O buffers (typically one input buffer for each file or device the program reads from, and one output buffer for each it writes to). The buffers maintained by the programming language may be much larger than those maintained by the low-level drivers, and they exist at a higher level of abstraction, as they are associated with a file handle rather than the actual hardware [24]. When a file read is performed, the data in the buffer is checked for validity, and only if the data is invalid (not relevant to the request) is an actual system call to the hardware made. Likewise, for write operations, data is written to the buffer until it is full, and then all the contents are sent to the hardware with a system call. Libraries usually provide a mechanism to flush the buffer manually if necessary.
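
The effect of these language-level buffers can be demonstrated with setvbuf from the C standard library: the same byte-by-byte loop is run once fully buffered with a large user buffer and once unbuffered, so every fgetc in the second run has to go through to the underlying layer. The file name is hypothetical.

```cpp
#include <cstdio>
#include <vector>

int main() {
    std::vector<char> big_buf(1 << 20);                       // 1 MiB user-space buffer

    // Fully buffered: most fgetc calls are served from the library buffer.
    std::FILE* f = std::fopen("trace.bin", "rb");             // hypothetical input file
    if (!f) return 1;
    std::setvbuf(f, big_buf.data(), _IOFBF, big_buf.size());
    long buffered = 0;
    while (std::fgetc(f) != EOF) ++buffered;
    std::fclose(f);

    // Unbuffered: every fgetc goes down to the lower layers, which is far slower.
    f = std::fopen("trace.bin", "rb");
    if (!f) return 1;
    std::setvbuf(f, nullptr, _IONBF, 0);
    long unbuffered = 0;
    while (std::fgetc(f) != EOF) ++unbuffered;
    std::fclose(f);

    std::printf("read %ld bytes buffered and %ld bytes unbuffered\n",
                buffered, unbuffered);
}
```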

All these operations are performed "behind the scenes" in the library and driver implementations, so the application layer is unaware of these performance improvement measures. But understanding and exploiting the power of this underlying functionality helps in designing the solution in an efficient and optimized way.

2.5 Interpolation search

Interpolation search is a modified version of binary search which can yield faster results under some data constraints: the data must be sorted and uniformly distributed, and the bounds of the interval must be known. The idea of interpolation search is similar to finding a word in a dictionary: we are aware of the alphabetical order, so if we have to search for the word "university", we start looking towards the end of the dictionary, as 'U' appears near the end of the alphabet. Interpolation search uses a similar approach, which is why it is essential to know the bounds of the data and why the data should be uniformly distributed to take full advantage of the algorithm. The average-case complexity of the algorithm is O(log log n), which is very fast, while for unevenly distributed data the worst-case complexity is O(n) [26].
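
A compact C++ version of interpolation search over sorted 64-bit values (for example, timestamps); the probe position is estimated from the value itself rather than taken as the midpoint.

```cpp
#include <cstdint>
#include <vector>

// Interpolation search over a sorted, roughly uniformly distributed vector.
// Returns the index of `target`, or -1 if it is not present.
long interpolation_search(const std::vector<std::uint64_t>& v, std::uint64_t target) {
    long lo = 0, hi = static_cast<long>(v.size()) - 1;
    while (lo <= hi && target >= v[lo] && target <= v[hi]) {
        if (v[hi] == v[lo])                        // avoid division by zero
            return v[lo] == target ? lo : -1;
        // Estimate the position from the value, not from the midpoint.
        long pos = lo + static_cast<long>(
            (static_cast<double>(target - v[lo]) / (v[hi] - v[lo])) * (hi - lo));
        if (v[pos] == target) return pos;
        if (v[pos] < target)  lo = pos + 1;
        else                  hi = pos - 1;
    }
    return -1;
}
```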

This chapter specifies all the requirements necessary to pursue the analysis.

3.1 Functional Requirements

A functional requirement specifies the behavior of the system. The list of requirements is specified in the form of use cases. To understand the use cases defined, it is necessary to understand the structure of the data in the trace files.

3.1.1 Structure of the trace file

Trace files contain a sequence of messages from the modem. Each event from the modem is a message which is usually self-contained, i.e. it has no dependency on the events before or after it. Every message has a header (meta-information) and a payload. Messages are time sequenced, with normally increasing timestamps, but negative jumps are also possible due to hardware buffer constraints of the modem. Messages are BLOBs (binary large objects) which have to be decoded based on the message type. The average payload size is 40 bytes and the maximum payload size is 64 KB.
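
Although the exact binary layout is Intel-internal, the description above can be summarized in a hypothetical record structure; all field names and types here are illustrative only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of one trace message as described in section 3.1.1.
struct TraceMessage {
    std::uint64_t timestamp;            // mostly increasing; occasional negative jumps
    std::uint32_t message_type;         // drives the type-specific decoding and retrieval
    std::vector<std::uint8_t> payload;  // opaque BLOB, ~40 bytes on average, up to 64 KB
};
```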

3.1.2 Use cases

Use case 1: Data storage

Each message in the trace shall be referenced by a continuously increasing unique identifier called the message identifier.

The timestamp is mostly increasing, with a few exceptions where negative jumps can occur.

In a real-world scenario, two consecutive messages will not be of the same message type. This information is useful for the retrieval strategy based on message type.

Use case 2: Data type of the message

The messages are stored as BLOBs in the database and can later be processed to obtain the actual information from the data. This avoids any pre-processing of the data before storage, which increases storage performance.

Use case 3: Data retrieval

Retrieval has three major specifications:

• Retrieval based on message index: Messages shall be randomly queried by message index. In such a case, there is a high chance that the next query addresses consecutive messages. Forward access shall be assumed for 60-70% of the queries and backward access for 30-40%.

• Retrieval based on timestamp: The query usually picks a single message based on its timestamp. The probability of requesting the next messages with respect to timestamp is quite small.

• Retrieval based on message type: The database shall support querying a group of messages by message type. Forward access shall be assumed for 60-70% of the queries and backward access for 30-40%.

Access based on timestamp and message type can be two-fold: the first query retrieves the message index based on the message type or timestamp, and the second query retrieves the message based on the message index. Data modification is not considered in the set of use cases.
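
The two-fold access pattern can be sketched as a pair of secondary indexes over the stored BLOBs; this is purely a hypothetical illustration of the query flow, not the CAPS design or any of the evaluated databases.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical message store: the primary access path is by message index,
// while timestamp and message type are resolved to indexes first (two-fold access).
struct MessageStore {
    std::vector<std::vector<std::uint8_t>> blobs;            // message index -> raw BLOB
    std::multimap<std::uint64_t, std::size_t> by_timestamp;  // timestamp -> message index
    std::multimap<std::uint32_t, std::size_t> by_type;       // message type -> message index

    // Use case: retrieval based on message index.
    const std::vector<std::uint8_t>& by_index(std::size_t idx) const {
        return blobs.at(idx);
    }

    // First query of the two-fold access: message type -> message indexes.
    // The caller then fetches each BLOB with by_index().
    std::vector<std::size_t> indexes_for_type(std::uint32_t type) const {
        std::vector<std::size_t> out;
        auto range = by_type.equal_range(type);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }
};
```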

Figure 3.1 shows the UML use case diagram for the set of requirements.

3.2 Non-Functional Requirements

Performance:

There is no hard specification for performance, but the solution is expected to outperform the existing performance numbers: ideally the writing speed should be close to the file reading speed, and the reading speed close to the disk reading speed. Performance shall therefore be as good as the hardware supports, and it is essential to have well-described benchmarking. It shall be assumed that performance increases linearly with increased hardware support.

Reliability:

The database should be robust against system failures and application crashes. The data should not be corrupted for any reason.

 Maintainability:

The database under consideration should be easy to service and maintain. This weighs against developing our own database, as such a product, once deployed, would need an enormous amount of support and resources for maintenance.

Figure 3.1: Use case diagram

Portability:

The current implementation of the tracing tool is supported on Windows and Mac, and there is an ongoing effort to support Linux seamlessly. This clearly dictates that the proposed solution shall be supported on Windows, Linux and Mac.

License:

Any open source software used shall be under a permissive free software license, as it is necessary to modify the code as needed while retaining confidentiality and the access rights.

Concurrency by multithreading:

It should be possible to query the stored data while the database is still being written, which means that read and write processes should work in parallel. There is currently no immediate requirement for concurrency, but this might be used for future enhancements.

Concurrency by multiprocessing:

As described in the concurrency section, it shall be possible to read and write in parallel, which can be considered as two independent tasks. Each of these tasks can be an independent process; both are equally important and shall be available at all times, which brings the need for multiprocessing support.

3.3 System Requirements

Operating system: Windows / Linux / OS X with a 32-bit or 64-bit processor

Programming language: C++. The code base shall be in C++, or else a C++ wrapper shall be provided for the APIs.

Physical memory: 2GB of RAM recommended

Storage: SSD is preferred.

