Big Data Security & Privacy

Challenges for Big Data Security & Privacy

Introduction

As big data expands the range of sources it draws on, the trustworthiness of each data source needs to be verified, and techniques should be explored to identify maliciously inserted data.
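One possible way to verify that a record really comes from a known source is message authentication. The minimal sketch below assumes that each source shares a secret key with the ingestion pipeline and attaches an HMAC-SHA256 tag to every record. The source names and keys are hypothetical. The check only catches records injected or altered by someone without the key; it cannot detect a legitimate but compromised source that signs malicious data itself.

```python
import hashlib
import hmac

# Hypothetical per-source secret keys, assumed to be provisioned out of band.
SOURCE_KEYS = {
    "sensor-a": b"key-for-sensor-a",
    "sensor-b": b"key-for-sensor-b",
}

def sign_record(source_id: str, payload: bytes) -> str:
    """Compute the HMAC-SHA256 tag a trusted source attaches to a record."""
    key = SOURCE_KEYS[source_id]
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_record(source_id: str, payload: bytes, tag: str) -> bool:
    """Accept a record only if it names a known source and its tag matches."""
    key = SOURCE_KEYS.get(source_id)
    if key is None:
        return False  # unknown source: treat as untrusted
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)
```

A record whose payload was modified after signing, or that claims an unknown source, is rejected before it reaches the analytics stage.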

Information security is becoming a big data analytics problem, in which massive amounts of data are correlated, analyzed and mined for meaningful patterns.
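As a small illustration of correlating security data across sources, the sketch below flags IP addresses whose failed logins show up in several independent logs. The event tuple format, the log names and the thresholds are all hypothetical choices for illustration, not a standard schema.

```python
from collections import defaultdict

def correlate_failed_logins(events, min_sources=2, min_failures=5):
    """Flag IPs whose failed logins appear across several independent logs.

    events: iterable of (log_source, ip, outcome) tuples,
            e.g. ("vpn", "203.0.113.7", "fail").
    """
    failures = defaultdict(int)  # ip -> total failed attempts across all logs
    seen_in = defaultdict(set)   # ip -> set of log sources reporting failures
    for log_source, ip, outcome in events:
        if outcome == "fail":
            failures[ip] += 1
            seen_in[ip].add(log_source)
    # Require both volume and breadth, so one noisy log alone does not alert.
    return sorted(
        ip for ip in failures
        if failures[ip] >= min_failures and len(seen_in[ip]) >= min_sources
    )
```

Requiring evidence from more than one source is the "correlation" step: an address failing repeatedly on both the VPN and the web portal is more suspicious than one failing many times in a single log.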

The security of big data can be enhanced with authentication, authorization, encryption and audit trails. Even so, security violations always remain possible, whether through unintended or unauthorized access or through inappropriate access by privileged users.
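To make the idea of an audit trail more concrete, here is a minimal sketch of a tamper-evident log in which each entry stores a SHA-256 hash over its own fields and the previous entry's hash. The class and field names are hypothetical. Editing any past entry breaks the chain; note, however, that a privileged user able to rewrite the entire log could recompute every hash, so in practice the latest hash would also need to be stored somewhere that user cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body: dict) -> str:
    """Hash an entry's fields deterministically (sorted JSON keys)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditTrail:
    """Append-only audit log in which each entry commits to the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, user: str, action: str, resource: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"user": user, "action": action, "resource": resource, "prev": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; return False if any entry was altered."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("user", "action", "resource", "prev")}
            if entry["prev"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```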

To protect privacy, two approaches are commonly used:

One is to restrict access to the data by adding certification or access control to the data entries, so that sensitive information is accessible only to a limited group of users.
The other approach is to anonymize data fields such that sensitive information cannot be pinpointed to an individual record.
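The first approach can be sketched as field-level, role-based access control: each role is granted a set of readable fields, and records are filtered before they are returned. The roles, field names and the `User` type below are hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the fields each role is allowed to read.
ROLE_PERMISSIONS = {
    "analyst": {"age", "zip_code"},
    "physician": {"age", "zip_code", "diagnosis"},
}

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

def readable_fields(user: User) -> set:
    """Union of the fields granted by all of the user's roles."""
    allowed = set()
    for role in user.roles:
        allowed |= ROLE_PERMISSIONS.get(role, set())  # unknown roles grant nothing
    return allowed

def filter_record(user: User, record: dict) -> dict:
    """Return only the fields of a record the user is authorized to see."""
    allowed = readable_fields(user)
    return {k: v for k, v in record.items() if k in allowed}
```

Fields not granted to any role (here, `name`) are never returned, which is a deny-by-default design choice.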

For the first approach, the common challenge is to design secure certification or access control mechanisms, such that no sensitive information can be misused by unauthorized individuals. For data anonymization, the main objective is to inject randomness into the data to ensure a number of privacy goals (Xindong Wu et al. 2014).
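One classic way to inject randomness into individual records is randomized response, used here as an illustrative technique (the source does not name a specific method). Each yes/no answer is kept with probability `p_truth` and otherwise replaced by a fair coin flip, so no single answer can be pinned on a person with certainty, yet the population rate can still be estimated. With `p_truth = 0.75`, a true "yes" is reported as "yes" with probability 0.875 and a true "no" with probability 0.125, which corresponds to local differential privacy with epsilon = ln 7. Function names and parameters are assumptions for this sketch.

```python
import random

def randomized_response(true_answers, p_truth=0.75, seed=None):
    """Perturb each yes/no answer: keep it with probability p_truth,
    otherwise replace it with a fair coin flip."""
    rng = random.Random(seed)
    noisy = []
    for answer in true_answers:
        if rng.random() < p_truth:
            noisy.append(answer)
        else:
            noisy.append(rng.random() < 0.5)
    return noisy

def estimate_true_rate(noisy_answers, p_truth=0.75):
    """Unbiased estimate of the true 'yes' rate from perturbed answers.

    E[observed] = p_truth * pi + (1 - p_truth) * 0.5, solved for pi.
    """
    observed = sum(noisy_answers) / len(noisy_answers)
    return (observed - (1 - p_truth) * 0.5) / p_truth
```

Lowering `p_truth` strengthens privacy but increases the variance of the estimate, which is the privacy/utility trade-off discussed below.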

Background

Today we live in a digital era. With the rapid increase in digitization, the amount of structured, semi-structured and unstructured data being generated and stored is exploding.

Usama Fayyad (2012) presented striking figures on Internet usage, such as “every day 1 billion queries are there in Google, more than 250 million tweets are there in Twitter, more than 800 million updates are there in Facebook, and more than 4 billion views are there in YouTube”. Each day, 2.5 quintillion bytes of data are generated, and 90 percent of the data in the world today were created within the past two years.

The data produced nowadays is estimated to be on the order of zettabytes, and it is growing by around 40% every year. International Data Corporation (IDC) calls this the “Digital Universe” and predicted that it would grow to about 8 zettabytes by 2015.
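As a rough worked example of what 40% annual growth means, assuming the rate stays constant, the data volume doubles roughly every two years and grows about 29-fold over a decade:

```latex
V(t) = V_0 \,(1.4)^{t}, \qquad
2V_0 = V_0 \,(1.4)^{T_2} \;\Rightarrow\;
T_2 = \frac{\ln 2}{\ln 1.4} \approx \frac{0.693}{0.336} \approx 2.06 \text{ years},
\qquad \frac{V(10)}{V_0} = 1.4^{10} \approx 28.9
```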

The above examples demonstrate the rise of big data applications where data collection has grown tremendously and is beyond the ability of commonly used software tools to manage, capture, and process.

From a privacy and security perspective, the challenge is to ensure that data subjects (i.e., individuals) have sustainable control over their data, to prevent misuse and abuse by data controllers (i.e., big data holders and other third parties), while preserving data utility, i.e., the value of big data for knowledge/patterns discovery, innovation and economic growth.

The Cloud Security Alliance (CSA) Big Data Working Group has identified the top security and privacy problems that must be addressed to make big data computing and infrastructure more secure. Most of these issues are linked to big data storage and computation, and several of them concern secure data storage (Cloud Security Alliance White Paper, 2012).

Main Focus

With the proliferation of devices connected to the Internet and to each other, the volume of data collected, stored, and processed is increasing every day, which also brings new challenges for information security.

In fact, currently used perimeter security mechanisms such as firewalls and DMZs cannot simply be applied to Big Data infrastructure, because security mechanisms must extend beyond the perimeter of the organization’s network to meet user/data mobility requirements and BYOD (Bring Your Own Device) policies.

Considering these new scenarios, the pertinent question is which security and privacy policies and technologies are best suited to meet the current top Big Data privacy and security demands (Cloud Security Alliance, 2013).

These challenges may be organized into four Big Data aspects:

Infrastructure security (e.g. securing distributed computations using MapReduce),
Data privacy (e.g. privacy-preserving data mining and granular access control),
Data management (e.g. secure data provenance and storage) and,
Integrity and reactive security (e.g. real time monitoring of anomalies and attacks).
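For the last aspect, real-time monitoring of anomalies, a very simple illustrative detector compares each new per-interval event count (for example, requests per minute) against a sliding window of recent counts and alerts when the z-score exceeds a threshold. The class name, window size and threshold are assumptions; production systems typically use more robust statistics.

```python
import math
from collections import deque

class RateAnomalyDetector:
    """Flags per-interval event counts that deviate strongly from recent history."""

    def __init__(self, window=30, threshold=3.0):
        self.history = deque(maxlen=window)  # most recent per-interval counts
        self.threshold = threshold           # z-score above which we alert

    def observe(self, count):
        """Return True if `count` is anomalous relative to the current window."""
        anomalous = False
        if len(self.history) >= 2:
            n = len(self.history)
            mean = sum(self.history) / n
            std = math.sqrt(sum((x - mean) ** 2 for x in self.history) / (n - 1))
            if std > 0:
                anomalous = abs(count - mean) / std > self.threshold
            else:
                anomalous = count != mean  # flat history: any change is unusual
        # Every value, anomalous or not, joins the window, so a sustained
        # shift eventually becomes the new baseline.
        self.history.append(count)
        return anomalous
```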

Big Data brings a set of risk areas that need to be considered. These include the information lifecycle (provenance, ownership and classification of data), the data creation and collection process, and the lack of security procedures. Ultimately, the Big Data security objectives are no different from those for any other data type: to preserve confidentiality, integrity and availability.

Because Big Data is such an important and complex topic, it is almost inevitable that immense security and privacy challenges will arise (Michael & Miller, 2013; Tankard, 2012).

Big Data has specific characteristics that affect information security: variety, volume, velocity, value, variability, and veracity. These characteristics have a direct impact on the design of security solutions, which must address all of them and their requirements (Demchenko, Ngo, Laat, Membrey, & Gordijenko, 2014).

Currently, no such out-of-the-box security solution exists.
