PROJECT PROPOSAL
A big database mainly refers to a database whose size is beyond the ability of typical database software tools to capture and sort ██████████████████████████████████████████████████tise, the accumulated datasets that previously had no use will gain significant value thanks to the design of big-data tools able to load all the available datasets.
Loading very large datasets, for instance those exceeding 10GB, is often slow, especially if there are many indexes on the tables. In my proje██████████████████████████████████████████████████e technology and procedure that would ease the loading of high-traffic datasets into given systems. There is no problem loading small datasets, which are quickly fed into systems, unlike large datasets.
The project will address most organizations' need for better data management techniques, including the loading of large files, which has been a challen██████████████████████████████████████████████████ the project will help solve data loading challenges for the administration and for other students. The project will also connect me to the practical part of the course and help me gain experience.
After the project, I am expected to come up with a better technique for loading data████████████████████████████████████████etermined by the type of technology that I find fit for the challenges I am trying to solve.
AIM OF THE PROJECT
Organizing a business will become more accurate and engaging with the emergence of better data tools that surpass those already available for analyzing data. The organization's goals will be pursued better and faster once the project has been executed.
PROJECT DESIGN
The most widely used technique for loading and sorting data before it is fed into the system is to first drop the indexes, load the data, and then rebuild the indexes ██████████████████████████████████████████████████used. The main goal of this project is to investigate other techniques that would improve load p██████████████████████████████████████████████████and to execute my project based on the compressed structure of Power BI, which I consider a better technique for data loading.
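The drop-load-rebuild pattern described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration with made-up table and index names, not the project's actual setup:

```python
import sqlite3

# In-memory database stands in for any target system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_value ON measurements (value)")

rows = [(i, i * 0.5) for i in range(100_000)]

# 1. Drop the index so the bulk insert does not pay an
#    index-maintenance cost on every inserted row.
conn.execute("DROP INDEX idx_value")

# 2. Bulk-load all rows inside a single transaction.
with conn:
    conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# 3. Rebuild the index once, after all rows are in place.
conn.execute("CREATE INDEX idx_value ON measurements (value)")

count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
print(count)  # 100000
```

Rebuilding an index once over the finished table is usually much cheaper than updating it row by row during the load, which is why the pattern helps most when the table carries many indexes.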
TIME FRAME
I am targeting to complete my project within three weeks from the time I set up my proposal. This project may be challenging and time-consuming ████████████████████████████████████████████████████████████████████████████████roject within a duration of three weeks, I will establish a temporal r████████████████████████████████████████████████████████████████████████████████ment. I might also need to reduce my leisure time and dedicate it to the project.
PROJECT BUDGET
In terms of resources, I am planning a budget of about $500 to cater for all my research expenses and the electronic devices I██████████████████████████████████████████████████ill need to consult database management experts to get their technical views on my project. Most of the resources allocated to the project will be spent on running the final project and making it practical.
RISKS AND ISSUES EXPECTED
I risk losing part or all of the databases and datasets that I choose to use during the execution of my project. Most of the is████████████████████████████████████████████████████████████████████████████████ata loss; however, I may have challenges when accessing the database o████████████████████████████████████████████████████████████████████████████████ted to use databases and datasets that are available to me, regardless of their size.
PEOPLE RESPONSIBLE FOR THE PROJECT
I will mainly be responsible for executing the project and will play most of the roles, including gathering information, executing, and co██████████████████████████████████████████████████hat can be employed to improve the rate at which datasets are loaded into the database. I might need to consult an expert for their thoughts on my idea.
PROJECT REPORT
I will give a report on my project after three weeks of execution. My report will be based on the success of the project. I will also██████████████████████████████████████████████████have opted to investigate, and later discuss any challenges that someone may face when using the technique.
DATABASE MANAGEMENT PROJECT
INTRODUCTION
In most departments of various organizations, there has been a big challenge in the analysis and control of large datasets. This i██████████████████████████████████████████████████ be used to handle and analyze huge datasets. There was a need to do research in order to find ██████████████████████████████████████████████████ used to solve this big challenge. The aim of this project is to find better, newer techno██████████████████████████████████████████████████tasets in various applications. I want to do research to determine whether the use of the comp████████████████████████████████████████████████████████████████████████████████, people uploading datasets for various uses are forced to purchase and install more memory in their computers due to the large size of the datasets.
I believe there is still a better way to reduce the size of a dataset to a manageable level so that one can comfortably load██████████████████████████████████████████████████e electronic device comes with limited memory, and with large data sizes one might be forced to delete some useful data in order to create space for new data.
Some small data tools used for loading datasets are only capable of loading them from files into MySQL tables; however, if you create ██████████████████████████████████████████████████l only work well for small file sizes of about 50MB, but for large files, the pro██████████████████████████████████████████████████he data. With such limitations, there is a need to develop a large-data tool that can load larger datasets.
For example, when using R, the problem may occur when calling functions like read.csv() or read.table() on large data files; this may m████████████████████████████████████████████████████████████████████████████████hardly open a 20GB file on a desktop with 8GB of RAM. All the dat██████████████████████████████████████████████████may end up not being able to open even the smaller files, even those which were ██████████████████████████████████████████████████ning on the OS. Due to these challenges, I planned to do research to find a better technology that would ease the loading of datasets.
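One common workaround for files that do not fit in memory is to stream them in chunks instead of loading everything at once, as read.csv() does. The sketch below uses only Python's standard csv module; the file contents and column name are hypothetical. It sums a numeric column without ever holding the whole file in RAM:

```python
import csv
import io

def sum_column(file_obj, column):
    """Stream a CSV one row at a time and sum a numeric column.

    Memory use stays roughly constant regardless of file size,
    because only one row is held in memory at any moment.
    """
    reader = csv.DictReader(file_obj)
    total = 0.0
    for row in reader:
        total += float(row[column])
    return total

# Small in-memory stand-in for a multi-gigabyte file on disk;
# with a real file you would pass open("big.csv", newline="").
sample = io.StringIO("id,value\n1,2.5\n2,3.5\n3,4.0\n")
print(sum_column(sample, "value"))  # 10.0
```

The same streaming idea is what lets a machine with 8GB of RAM process a 20GB file, provided the computation itself (a sum, a count, a filter) does not need all rows at once.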
GOAL OF THE PROJECT
The main objective of the project is to research better techniques through which large data can be loaded into a database without the use of large amounts of memory. There are various techniques that can achieve this, depending on the type of data you want to load.
METHOD OF USE
My main focus was on the compression structure of Power BI. The first thing Power BI does to data before loading it is to compress it, and th████████████████████████████████████████████████████████████████████████████████y engine on which Power Pivot, Power BI, and SSAS Tabular are built. T██████████████████████████████████████████████████o one needs to configure or allow Power BI to do anything; when you import data of any size into Power BI, it is compressed to a manageable level.
This technology worked for me: after I imported a CSV file of 12GB into Power BI, it took up only 120MB. I had a sm████████████████████████████████████████████████████████████████████████████████ssion. The compression rate, however, depends on the type of dataset, ████████████████████████████████████████████████████████████████████████████████f more than 10GB and you have a problem loading it into a database, then consider loading it into Power BI to check what its size would become.
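Much of that compression comes from the engine's columnar storage, which dictionary-encodes repeated values. The toy sketch below illustrates the idea only; it is an assumption-laden simplification, not Power BI's actual implementation:

```python
def dictionary_encode(column):
    """Columnar dictionary encoding: store each distinct value once,
    then represent the column as small integer codes.

    Low-cardinality columns (few distinct values) compress best,
    which is why compression ratios vary with the dataset.
    """
    dictionary = []   # each distinct value, stored once
    codes = []        # one small integer per row
    positions = {}    # value -> its code
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes

# A column with only 3 distinct values among 9 entries (made-up data).
city = ["Nairobi", "Mombasa", "Nairobi", "Kisumu", "Nairobi",
        "Mombasa", "Nairobi", "Kisumu", "Nairobi"]
dictionary, codes = dictionary_encode(city)
print(dictionary)  # ['Nairobi', 'Mombasa', 'Kisumu']
print(codes)       # [0, 1, 0, 2, 0, 1, 0, 2, 0]
```

Nine strings collapse to three strings plus nine tiny integers; on millions of rows of repetitive data, this is how a 12GB file can shrink by orders of magnitude.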
LIVE CONNECTION WITH POWER BI
For some data sources, we can connect live with the help of Power BI, on-premises or in the cloud. If your dataset is large, then the best met██████████████████████████████████████████████████cle, SQL, or whichever data source you like, with any size of data that you prefer, and then cre████████████████████████████████████████████████████████████████████████████████help to bring the data structure and the metadata into the Power BI model; a live connection, however, will not import the data into the model.
SQL SERVER LIVE CONNECTION ON-PREMISES
This can be tested with a database, for example a large table containing about 48 million records. The table may be about 5██████████████████████████████████████████████████ethod will always be the same. The data I used might not be 10GB, but the same method applies even with 10TB or more.
The table above shows how the connection works for the data provided when doing a live connection.
When you analyse all the data and select this table, or any other table in the given data s██████████████████████████████████████████████████tion to choose between Import and Live connection.
After creation of the successful live connection, you can then build visualization for that or even create a relationship in the model. This is shown in the figure below.
For a live connection to an on-premises data source, you will first need to install and configure a Power BI gateway (or personal gateway). With that installed, you can connect to a wide range of on-premises data sources, such as databases and SSAS.
LIVE CONNECTION TO AZURE
Azure has very many data sources that you can use for a live connection, such as Azure SQL Database, which is similar to SQL Server on-premises but████████████████████████████████████████████████████████████████████████████████e cloud structure of a database that supports both structured and unstructured data; it has the capacity to scale the database compute engine irrespective of the database storage engine.
OPTIMIZATION AT DATA SOURCE LEVEL
When you connect live to the data source, the report will send a query to the data source, and the response will differ depen██████████████████████████████████████████████████ SSAS, you might get a faster response than from a SQL Server database. You need to consider all perform████████████████████████████████████████████████████████████████████████████████rvers, you will need to consider columnstore indexes, general indexing, and many other optimization and tuning tips. I did performance tuning with a table of 48 million records carrying a normal index.
Regular index
A simple SELECT SUM over a table with 48 million records takes four minutes and four seconds to run with a regular index. The same query respo██████████████████████████████████████████████████ave been clustered, and it will be even faster with a clustered columnstore index on the same table with the same number of rows.
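SQLite (used below only because it ships with Python) has no columnstore indexes, but the general point, that better physical design changes the query plan and hence the response time, can be sketched. Table and index names here are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(i, float(i)) for i in range(1000)])

query = "SELECT SUM(value) FROM readings WHERE value > 990"

# Without an index the planner must scan every row of the table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan_before)  # e.g. a "SCAN" over the readings table

conn.execute("CREATE INDEX idx_value ON readings (value)")

# With the index the planner can seek directly to the qualifying
# rows, and since the index covers the queried column it never
# touches the base table at all.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan_after)   # e.g. a "SEARCH" using idx_value
```

The same plan-inspection habit (SET STATISTICS or execution plans in SQL Server) is how one would verify that a clustered columnstore index is actually being used for the 48-million-row sum.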
IMPLEMENTATION
For datasets to be loaded into a database, the system must use the compressed structure of Power BI. In practice, Power BI compresses large fi██████████████████████████████████████████████████ets, which can then be loaded into the database at a manageable size. It is also possible to connect live to some data sources when using Power BI; for example, a department can connect live to data sources on-premises or in the cloud.
The goal of reducing the size of the dataset is achieved by the correct use of this technology, and if it can be utilized in most of the departme████████████████████████████████████████████████████████████████████████████████chieved. The technology is worth investing in; however, it has some small limitations that keep it from being 100 percent efficient.
LIMITATIONS OF LIVE CONNECTION
Live connection has the limitation of there being no Data tab in Power BI, which would otherwise let you create calculated measures, tables, or columns. This m██████████████████████████████████████████████████olumns or measures in the SQL Server source tables whenever you are using SQL Server as the source. You████████████████████████████████████████████████████████████████████████████████ can only set up relationships, and that is the only modeling you can do in Power BI.
NO DATA TAB, NO DAX in Live connection from Power BI.
Formatting is not provided through the modeling tab of Power BI, so setting decimal points or the data type of a column will not be possible in Power BI. To handle this, you will need to do it with the help of Power Query or through the data source.
However, the full visualization part of Power BI is completely supported in live connection mode, because the visualization engine is a s████████████████████████████████████████████████████████████████████████████████ the separate component will work better than a product built with no underlying component.
Power BI's import mode has an advantage over live connection, since combining datasets from multiple databases or even files can take place when building the model. When working with a live connection, you cannot use or access data from more than one data source.
The Q&A feature is not available on the dashboard in live connection, even when you have the Power Q&A question box at the top.
CHALLENGES
Although I achieved the goals of the project during its execution, there were challenges in finding the needed datasets wh████████████████████████████████████████████████████████████████████████████████test the ease with which various data files could be loaded was not readily available for use.
The project was very time-consuming, and most of the research was done early on; however, this did not interfere with the project outcome. Most of the people I considered experts were not willing to help me gain practical experience in the area of interest.
Most of the people I approached to assist me with the project were not willing to participate and dedicate their██████████████████████████████████████████████████ed to do most of the work single-handedly due to a lack of cooperation from the people I had approached for assistance.
Although I had set a sufficient budget for my project, there were other expenses that I incurred during the project. I personally did a lot of traveling in order to find enough data management experts to assist me in gathering the needed information.
A number of people had negative views of my project and of the approach I took to execute it. Many people think that big problems and challenges can be solved by others, but make no effort to try to solve them themselves.
Before the start of my project, I was not sure of the best technique to investigate. I had several options that I had to weigh in o██████████████████████████████████████████████████the process of loading large datasets into computers. Weighing these options was a challenge, since I was not sure of the project's success.
I encountered many failures when trying to do a live connection in Power BI; the task bar would not allow any formatting of the data, hen████████████████████████████████████████████████████████████████████████████████ring data entry. This was time-consuming and tiresome. I had to repeat most of the procedures for loading the datasets in order to achieve the exact outcome I was expecting.
I was not able to access a number of computers with different memory capacities. I wanted to know whether the rate at which the data w████████████████████████████████████████████████████████████████████████████████ computers that I had access to had the same RAM capacity, and the memo████████████████████████████████████████████████████████████████████████████████ere I could draw my conclusions based on the results that I got. With ████████████████████████████████████████████████████████████████████████████████The speed at which the dataset was loaded into the database also depended on what was running on the OS.
The number of times I loaded the datasets was also limited by the available resources. I could not access all the types of data files in o██████████████████████████████████████████████████o database. The speed of compressing files depended on the size and type of the file; some files compress faster than others, depending on their details and contents.
The success of my project was determined by the quality and amount of information I was able to gather. The limitation of the data████████████████████████████████████████████████████████████████████████████████my project. The number of people who were willing to assist me with th████████████████████████████████████████████████████████████████████████████████ I executed my project were not willing to do so. Most of the time and resources were spent doing the project alone, and my research was based on written works as opposed to opinions given by people.
There were also large expectations from the experts I had consulted regarding my project. Most of them were expecting a very intere██████████████████████████████████████████████████essure due to their expectations. Almost all the work was done well, but I also had to struggle v██████████████████████████████████████████████████ation that I was supposed to deliver was made possible and that I had achieved all the goals set for the project.
Most of the observations I made during my project were based on classwork, as I had little experience with the practical part of th████████████████████████████████████████████████████████████████████████████████my goals with ease, I spent more time on practical work than I had ea██████████████████████████████████████████████████ during my project mainly came from the continuous work we had done in the classroom, making me rely on the theoretical part of project execution. I had to learn a number of skills in the field, since I had not acquired most of them in the classroom.
FUTURE WORK
There is a need to research and develop another, faster and more efficient data tool that would be able to load databases into computers without necessaril████████████████████████████████████████████████████████████████████████████████ load large datasets from multiple relational database warehouses or database tables. Therefore, there is a need to develop a better data tool that can load datasets faster.
Research should also be done to develop better techniques that do not necessarily compress datasets before loading them. The██████████████████████████████████████████████████ich can accommodate all the data, which you may need to be loaded without compressing it. This would also reduce the time it takes to retrieve the data.
Improvements to the system should be made to ensure that large data files are loaded faster, irrespective of the type of data file. The system shou██████████████████████████████████████████████████h that it allows formatting of data in the Power BI Data tab during a live connection. It should also be improved to support multiple data sources.
There are also challenges in retrieving lost data from databases; in the future, these systems should be upgraded so that they can be abl██████████████████████████████████████████████████. Security should also be guaranteed in the storage area; the data should be kept secure from cyber threats and any other intruders who may unlawfully seek to retrieve it.
There is a need for additional research so that the limitations facing the Power BI data management tool may be addre██████████████████████████████████████████████████logies for data loading should also be developed to ensure that the speed required at most of the or██████████████████████████████████████████████████stems is achieved. The speed at which data is loaded into the database for various organization████████████████████████████████████████████████████████████████████████████████ in. Better technology will ensure that proper and faster data management is achieved in various organizations.
In the future, more people should invest in developing better, more powerful tools compatible with all the types of files and data tha██████████████████████████████████████████████████ are being used are limited to only some dataset files, while some files containing tables ██████████████████████████████████████████████████opriately into the database due to a lack of better tools that can perform the work. Most people are forced to compress some files so that the files can be loaded with ease.
In the future, more emphasis should be placed on data tool management methods and techniques that can deal with the present problems that██████████████████████████████████████████████████e puts in schools to ensure that more students undertake research activities and acq██████████████████████████████████████████████████n solving future data management challenges. Students should be encouraged to spend more time on rese██████████████████████████████████████████████████ost people are aware of the challenges facing data management, and this will also enable them to acquire more knowledge of developing better data management tools.
CONCLUSION
In conclusion, the compressed structure of Power BI is a better tool for loading large datasets into computers. It has fa██████████████████████████████████████████████████ng used today. The technique can appropriately be used in large organizations and even institutions where large datasets must be fed into their databases.
This is an appropriate and efficient technique for loading data, as it does not interfere with the structure of the data being loaded. It will help solve the many challenges that organizations face when loading data into their databases.
The project was worth doing, since by using Power BI large datasets can be loaded easily into a database. With this technology, high-quality ████████████████████████████████████████████████████████████████████████████████ations will have confidence in what they do, and retrieving their data will be simple. The technology will also reduce the time that office operations spend on data loading exercises.
Based on my observations during the execution of the project, there is a need to invest heavily in database management and system manageme██████████████████████████████████████████████████d as tools for data management are outdated and cannot cope with the large amounts of data that most orga██████████████████████████████████████████████████s need to develop new techniques that will help speed up the rate at which data is loaded into systems by various organizations, and also ensure that their data is kept safe and readily available whenever there is a need to retrieve it.
My project can be used to solve the challenges that we are experiencing now, and even future challenges involving the loading of data. More ████████████████████████████████████████████████████████████████████████████████ can be overcome. More improvements should also be made to the systems██████████████████████████████████████████████████ may be expected to occur regarding the use of database management systems, and immediate measures should be put in place to ensure that no future problems are encountered.
REFERENCES
Rad, R. (2016). Steps Beyond the 10GB Limitation of Power BI. Retrieved from:
Landau, P. (2017). 5 Tips for Creating a Better Project Proposal. Retrieved from: