Big Data Privacy
What is big data all about?
Big data refers to the practice of storing vast amounts of data about a particular subject (people, animals or businesses, for example) and attempting to extract complex patterns from that data. Businesses use it to research and survey their target consumers. This can lead to specific marketing techniques to promote a product, or to the creation of a new product if consumers are researching or enquiring about a certain topic or issue. The privacy and security concerns behind big data are complex, as most consumers don't know where their personal information is being sent online or how it is being kept anonymous. This literature review looks at multiple authors' perspectives on big data, the security and privacy concerns that come with it, the difference between privacy and security in big data, and the privacy preservation techniques currently used in big data.
How do they collect it?
The volume of big data and the number of ways it is collected have increased drastically with the growth of the internet and the amount of information we now store on companies' websites, including social networking sites, healthcare and government applications, email, and many other services. The data within big data is commonly diverse: it can come from a number of sources and may contain audio, text, images or video. So when you join or subscribe to a company's service, or even an email provider, you can become a statistic within big data.
How secure is big data?
To ensure that the privacy and security of big data are maintained on all platforms throughout its life cycle, the life cycle can be split into three phases: data generation, data storage and data processing. In the data generation phase, the techniques used are access restriction and data falsification. In the data storage phase, the techniques are encryption, such as Identity-Based Encryption (IBE) and Attribute-Based Encryption (ABE), and private clouds, where the sensitive information within big data is stored separately. In the data processing phase, anonymization techniques such as generalization and suppression are used to protect the privacy of the data (Priyank Jain, 2016).
Big data needs to be protected through every stage of its life cycle, from both a security and a privacy standpoint. If it is not, the ramifications could be massive: the private information of thousands, if not millions, of consumers could be leaked, which is one of the major concerns consumers have with big data.
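As a rough illustration of the two anonymization techniques named above, the sketch below suppresses a directly identifying field and generalizes two others. The field names (name, age, postcode, diagnosis), the ten-year age bands and the choice to keep two postcode characters are all assumptions made for this example; they are not taken from the cited paper.

```python
from typing import Dict, Iterable, List


def generalize_age(age: int, bucket: int = 10) -> str:
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


def generalize_postcode(postcode: str, keep: int = 2) -> str:
    """Generalization: keep only the first characters and mask the rest."""
    return postcode[:keep] + "*" * (len(postcode) - keep)


def anonymize(records: Iterable[Dict[str, object]],
              suppress: Iterable[str] = ("name",)) -> List[Dict[str, object]]:
    """Suppress identifying fields and generalize age and postcode.

    The field names are hypothetical and chosen only for illustration.
    """
    suppressed = set(suppress)
    result = []
    for record in records:
        # Suppression: drop the listed fields entirely.
        cleaned = {k: v for k, v in record.items() if k not in suppressed}
        cleaned["age"] = generalize_age(int(record["age"]))
        cleaned["postcode"] = generalize_postcode(str(record["postcode"]))
        result.append(cleaned)
    return result


patients = [
    {"name": "Alice", "age": 34, "postcode": "90210", "diagnosis": "flu"},
    {"name": "Bob", "age": 37, "postcode": "90345", "diagnosis": "asthma"},
]
anonymized = anonymize(patients)
```

After this step both records share the age band 30-39 and the postcode prefix 90, so neither can be singled out by those fields alone. Real systems would also need to check how many records share each combination of values before releasing the data.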
Potential Problems with big data
If security and privacy precautions for big data are not correctly put in place, hackers could steal private information and leak it to the public. This has happened in the past, even to large companies like eBay.
“Online commerce giant eBay asked users to change their passwords Wednesday after hackers stole encrypted passwords and other personal information, including names, e-mail addresses, physical addresses, phone numbers and dates of birth.” (Andrea Peterson, 2014). If the data is compromised and falls into the wrong hands, the consequences are damaging: users have to change their login details for that company's website, and the situation is even worse where patients' medical records have been stolen.
Protecting big data from both physical attacks and cyberattacks at the scale that big data requires is a massive amount of work and a daunting task for any IT security professional. To protect the data from attacks, companies need to enforce countermeasures such as access control, backups, encryption, auditing, intrusion detection and penetration testing, and they need to set up corporate policies and procedures to prevent sensitive data falling into the wrong hands.
“At the same time, heightened security can also hurt your privacy: it can provide legitimate excuses to collect more private information such as employees' web surfing history on work computers.” (Jungwoo Ryoo, 2016). There are also other major privacy concerns, including companies pushing for targeted advertising and wanting to track your every online move. This could be good or bad depending on the scenario, as shown below.
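Two of the countermeasures listed above, access control and auditing, can be combined in a very small sketch. The roles, actions and log format below are hypothetical and exist only to show the idea that every access attempt is both checked and recorded; a production system would rely on an established identity and logging platform.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"read_anonymized"},
    "admin": {"read_anonymized", "read_raw"},
}

# In-memory audit trail; a real system would write to tamper-resistant storage.
audit_log: List[Dict[str, object]] = []


def request_access(user: str, role: str, action: str) -> bool:
    """Check whether the role permits the action and record the attempt."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    return allowed


analyst_result = request_access("alice", "analyst", "read_raw")
admin_result = request_access("bob", "admin", "read_raw")
```

Denied attempts are logged as well as successful ones, which is what allows an auditor to notice repeated attempts to reach raw, sensitive data.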
“Fraud detection is an arms race between good guys and bad guys. At the moment, the good guys seem to be gaining ground, with emerging innovations in IT technologies such as chip and pin technologies, combined with encryption capabilities, machine learning, big data and, of course, cloud computing.” (Jungwoo Ryoo, 2015). Jungwoo describes a real-world scenario in which banks use big data to keep fraudulent transactions in check, helping to identify which credit card transactions are fraudulent and which are not.
“Insurers such as Progressive in the US and Prudential in the UK are now pushing for wider acceptance of telematics devices, which feedback real-time data on a driver's behaviour. So young drivers who can demonstrate that they keep within speed limits and do not brake suddenly too often, can pay less.” (Bernard Marr, 2014). On the other hand, Bernard describes a scenario where motor insurers use big data to profile consumers by tracking how they drive, including how often they brake suddenly.
What are they doing with my data?
One major concern people have with big data, or with any sensitive personal data that they enter into a company's database, is what companies are doing with that data. Are they selling it to other marketing companies?
“Since Google changed the way it tracks its users across the internet in June 2016, users' personally identifiable information from Gmail, YouTube and other accounts has been merged with their browsing records from across the web.” (Olivia Solon, 2016). Olivia reports that Google is now able to track users across the web and serve personalised ads. This shows that Google is collecting your data and browsing history and building a profile of you so that it can advertise products suited to you.
On the other hand, consumers should be aware of what they enter on companies' websites and what they search for, as it is most likely stored in a database and then added to a big data store for marketing purposes. “Assuming it takes a minimum of two minutes to read the License Agreement (which itself is fast) we can be 95% confident no more than 8% of users read the License Agreement in full.” (Jeff Sauro, 2011). Jeff's finding is that no more than 8% of users read the licence agreement in full. If the same holds for privacy and consent forms, it is up to the company to explain to consumers, in simple terms, why and how their personal data is being used.
The examples above show differences between the authors' opinions about big data and privacy, but these scenarios lead to the same conclusion: companies need to secure and protect the sensitive data within big data.
Each scenario and example above shows a different viewpoint from a different party within the big data life cycle. Each contributor and consumer has a different opinion of big data and how it affects them.
From a business view, big data needs to be protected because, when the sensitive data within it is compromised, the business's relationship with the public is tarnished. In the eBay case above, the company had to issue an official apology and ask 145 million users to change their login details, costing the business money and most likely a portion of its users.
From an IT security professional's view, the concern is protecting the data from outside threats. This is a difficult task because big data can come in many forms and be stored in multiple locations, which makes it a major risk from both cyberattacks and physical attacks.
Big data could also be helpful in automating the detection of fraudulent credit card transactions. Thousands, if not millions, of transactions go through banks daily, so it would be hard to keep track of every transaction manually, but big data combined with pattern-recognition algorithms can strengthen and automate credit card security.
Big data could also work against consumers when companies profit from the amount of personal information they hold on their clients, for example by increasing or decreasing premiums depending on clients' lifestyles and habits. This also includes online searches and browsing, where companies use ad targeting to advertise a product on the websites you visit if you recently looked up that specific product.
In conclusion, companies that hold and use big data need to secure it using appropriate methods, and they need to be transparent with their consumers about why and how they intend to use it.
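To give a sense of what pattern-based fraud checks can look like, the sketch below flags a transaction whose amount is far from a cardholder's usual spending. It is a deliberately simple stand-in: the z-score rule, the threshold of 3 and the sample amounts are assumptions for illustration, and the fraud systems banks actually run are not described in the sources reviewed here.

```python
from statistics import mean, stdev
from typing import Sequence


def is_suspicious(history: Sequence[float], amount: float,
                  threshold: float = 3.0) -> bool:
    """Flag an amount that lies more than `threshold` standard deviations
    from the mean of the cardholder's past transaction amounts."""
    if len(history) < 2:
        # Not enough history to estimate normal spending.
        return False
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past amount is identical, so any different amount stands out.
        return amount != average
    return abs(amount - average) / spread > threshold


past_amounts = [20.0, 25.0, 22.0, 30.0, 18.0]
typical = is_suspicious(past_amounts, 24.0)
unusual = is_suspicious(past_amounts, 500.0)
```

Here a purchase of 24.0 sits close to the cardholder's usual spending and is not flagged, while a purchase of 500.0 is. Because the check is a single calculation per transaction, it can run automatically on every payment, which is the point made above about volume.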