Data Masking: What is it?
Data breaches worldwide expose millions of people’s sensitive data each year, costing many organizations millions. In 2021, the average cost of a data breach was $4.24 million. Personally Identifiable Information (PII) is the costliest of all the compromised data types. Consequently, data protection has become a top priority for many organizations, and data masking has become an essential technique for protecting their sensitive data.
What is Data Masking?
Data masking is a data security technique that scrambles data to create an inauthentic copy for non-production purposes. It retains the characteristics and integrity of the original production data and helps organizations minimize data security issues while utilizing data in a non-production environment. This masked data can be used for analytics, training, or testing.
A simple example is hiding personally identifiable information. Assume that an organization has an employee table in its database that holds the employee ID and full name of each employee. Through data masking, the organization might create a replica of the original database in which each real name is replaced with a common first and last name.
Why is Data Masking important?
As the amount of data we need increases, so does the risk of a data breach. It’s impossible to create foolproof protection for every copy of your data, especially when not everybody with access has the technological literacy we’d hope for. However, an enterprise can neutralize several of the factors that make data breaches so expensive. A good rule of thumb is that no active database should contain unmasked data.
The goal of masking is to protect the data from abuse while still providing developers with data for testing. This reduces the impact of data breaches and improves data security. Data masking achieves this because the masked values no longer link to any actual sensitive customer information.
How does it work?
The data masking process is simple in outline, although it comes in different techniques and types. In general, organizations start by identifying all the sensitive data they hold. Then they use algorithms to mask that data, replacing it with structurally identical but numerically different values. What do we mean by structurally identical? For instance, passport numbers are 9 digits in the US, and individuals usually have to share their passport information with airline companies. When an airline company builds a model to analyze and test the business environment, it creates a different 9-digit passport ID or replaces some digits with other characters.
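To make "structurally identical" concrete, here is a minimal Python sketch that replaces each digit of an ID with a random digit while leaving the length and any separators untouched. The function name `mask_digits` and the sample value are made up for illustration, and the fixed seed is there only so the example is repeatable.

```python
import random
import string

def mask_digits(value: str, rng: random.Random) -> str:
    """Replace every ASCII digit with a random digit; keep everything else."""
    return "".join(
        rng.choice(string.digits) if ch in string.digits else ch
        for ch in value
    )

rng = random.Random(42)   # fixed seed only so the sketch is repeatable
original = "123456789"    # made-up 9-digit ID, not a real passport number
masked = mask_digits(original, rng)

# The masked value has the same shape: still 9 characters, still all digits.
print(len(masked) == len(original), masked.isdigit())
```

Note that `random` is not a cryptographically secure generator. A real masking job would more likely draw from the `secrets` module or a dedicated masking tool.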
Types of Data Masking
- Static Data Masking (SDM): Data is first masked in the database and then copied to a test environment, so organizations can move the test data into untrusted environments or share it with third-party vendors.
- Dynamic Data Masking (DDM): In DDM, a second data store is not needed. Data remains unmasked in the database and is masked on request before it is sent over. Contents are shuffled in real time, on demand, so unmasked data is never exposed to unauthorized users. DDM is commonly implemented with a reverse proxy that sits in front of the database.
- On-the-fly Data Masking: Modifies sensitive information as it is transferred between environments, ensuring that sensitive information is masked before it reaches the target environment. This technique is ideal for organizations migrating data between systems, or maintaining continuous integration or synchronization of disparate data sets.
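The idea behind dynamic masking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Customer` record, the role names, and the `read_customer` function are hypothetical, and the sketch hides all but the last four characters rather than shuffling contents. The point it shows is that the stored record stays unmasked, and masking happens at read time depending on who is asking.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    name: str
    ssn: str

# Hypothetical set of roles allowed to see unmasked values.
PRIVILEGED_ROLES = {"compliance_officer"}

def mask_tail(value: str, visible: int = 4) -> str:
    """Hide everything except the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]

def read_customer(record: Customer, role: str) -> Customer:
    """Return the stored record for privileged roles, a masked copy otherwise."""
    if role in PRIVILEGED_ROLES:
        return record
    return Customer(name=record.name, ssn=mask_tail(record.ssn))

stored = Customer(name="Jane Doe", ssn="123-45-6789")  # made-up data
print(read_customer(stored, "analyst").ssn)             # masked view
print(read_customer(stored, "compliance_officer").ssn)  # unmasked view
```

In a real deployment this decision would typically live in a proxy or in the database engine itself, not in application code.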
Data Masking Techniques
Encryption
When data is encrypted, it becomes useless unless the viewer has the decryption key. Essentially, data is masked by the encryption algorithm. This is the most secure form of data masking but is also complex to implement because it requires technology to perform ongoing data encryption and mechanisms to manage and share encryption keys.
Scrambling
Scrambling is a basic masking technique that jumbles the characters and numbers into a random order hiding the original content. Although this is a simple technique to implement, you can only apply it to certain types of data, and it does not make sensitive data as secure as you might expect.
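A minimal Python sketch of scrambling, assuming a hypothetical `scramble` helper and a made-up input value:

```python
import random

def scramble(value: str, rng: random.Random) -> str:
    """Return the characters of `value` in a random order."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
print(scramble("4111222233334444", rng))
```

The output contains exactly the same characters as the input, only reordered, which is why scrambling offers limited protection for short or low-variety values.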
Nulling Out
By assigning a null value to a data column, nulling out masks the data so that unauthorized users cannot view the actual data in it. This is another simple strategy, although it has the following drawbacks:
- Data Integrity is compromised.
- It’s more difficult to test and develop with such data.
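As a rough sketch, nulling out a column in Python could look like this. The row layout and the `null_out` helper are assumptions made for illustration.

```python
def null_out(rows, columns):
    """Return copies of `rows` with the given columns set to None."""
    return [
        {key: (None if key in columns else value) for key, value in row.items()}
        for row in rows
    ]

employees = [  # made-up data
    {"id": 1, "name": "Jane Doe", "ssn": "123-45-6789"},
    {"id": 2, "name": "John Roe", "ssn": "987-65-4321"},
]
masked = null_out(employees, {"ssn"})
print(masked[0])
```

Every masked row now carries the same empty value in that column, which illustrates both drawbacks above: the column can no longer be used for joins or validation, and tests that depend on it have nothing to work with.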
Substitution
Data values are substituted with fake, but realistic, alternative values. For example, real customer names are replaced by a random selection of names from a phonebook.
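A small Python sketch of substitution follows. The lookup list `FAKE_NAMES` and the `substitute_names` helper are hypothetical. The sketch also keeps the mapping consistent, so a real name that appears twice receives the same fake name both times. That is a design choice of this example, not a requirement of the technique.

```python
import random

# Hypothetical lookup list standing in for a "phonebook" of fake names.
FAKE_NAMES = ["Alex Morgan", "Sam Taylor", "Jordan Lee", "Casey Brown"]

def substitute_names(names, rng: random.Random):
    """Replace each real name with a fake one, consistently per real name."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = rng.choice(FAKE_NAMES)
    return [mapping[name] for name in names]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
real = ["Ann Lee", "Bob Ray", "Ann Lee"]  # made-up data
print(substitute_names(real, rng))
```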
Shuffling
If you need to retain uniqueness when masking values, you can protect the data by shuffling it, so that the real values remain but are assigned to different elements. In a salary table, for example, the actual salaries will all be listed, but it won’t be revealed which salary belongs to which employee. This method is best suited to larger datasets.
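Shuffling a single column can be sketched as follows. The table layout and the `shuffle_column` helper are assumptions for illustration.

```python
import random

def shuffle_column(rows, column, rng: random.Random):
    """Return copies of `rows` with the values of `column` reassigned at random."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

salaries = [  # made-up data
    {"employee": "E1", "salary": 50000},
    {"employee": "E2", "salary": 72000},
    {"employee": "E3", "salary": 91000},
]
rng = random.Random(3)  # fixed seed only so the sketch is repeatable
for row in shuffle_column(salaries, "salary", rng):
    print(row)
```

The set of salaries is unchanged, so totals and averages stay accurate. With only three rows, however, a value can easily land back on its original owner, which is one reason larger datasets suit this method better.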
Number & Date Variance
The number and date variance methods are useful for masking important financial and transaction date information. For instance, applying a variance to an employee salary column hides each exact salary while still showing roughly where the highest- and lowest-paid employees sit. You can keep the data set meaningful by applying a variance of around +/- 10% to all salaries in the set.
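A minimal sketch of a +/- 10% variance in Python, assuming a hypothetical `apply_variance` helper and made-up salaries:

```python
import random

def apply_variance(amount: float, rng: random.Random, pct: float = 0.10) -> float:
    """Scale `amount` by a random factor between (1 - pct) and (1 + pct)."""
    factor = 1 + rng.uniform(-pct, pct)
    return round(amount * factor, 2)

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
salaries = [50000.0, 72000.0, 91000.0]  # made-up data
masked = [apply_variance(s, rng) for s in salaries]
print(masked)
```

Each masked value stays within 10% of the original, so ranges and rough distributions survive while exact figures do not.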
Date Switching
If the data in question involves dates that you want to keep confidential, you can apply policies to each data field to obfuscate the real date. For example, you can set back the dates of all active contracts by 100 days. The drawback of this method is that because the same policy applies to all values in a field, the compromise of one value results in the compromise of all values.
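The 100-day example can be sketched in Python with the standard `datetime` module. The `shift_dates` helper and the sample dates are made up for illustration.

```python
from datetime import date, timedelta

def shift_dates(dates, days: int = 100):
    """Move every date back by the same fixed number of days."""
    offset = timedelta(days=days)
    return [d - offset for d in dates]

contracts = [date(2024, 4, 10), date(2024, 6, 1)]  # made-up contract dates
shifted = shift_dates(contracts)
print(shifted)

# Every date moves by the same amount, so one known real date reveals the offset.
print({real - masked for real, masked in zip(contracts, shifted)})
```

The last line shows the drawback mentioned above: the difference between real and masked dates is a single constant.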
Redaction
This technique replaces every character of a value with the same character. It is easy to do, but the data loses its business value.
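A minimal sketch of redaction in Python, assuming a hypothetical `redact` helper and "X" as the replacement character:

```python
def redact(value: str, mask_char: str = "X") -> str:
    """Replace every character of `value` with `mask_char`."""
    return mask_char * len(value)

print(redact("4111-2222"))  # made-up value
```

Only the length of the original value survives, which is why redacted data is of little use for testing or analytics.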
Benefits
Data masking helps organizations comply with many regulations, such as HIPAA, under which Personally Identifiable Information (PII) must be protected and never exposed.
- Masked Data also retains integrity and structural format.
- Developers and testers can get access to the data without any data exposure.
- It decreases security risk when running data analytics and displaying results.
- It is a viable defense against threats such as data breaches, data loss, account or service hijacking, insecure interfaces, and malicious use of data by insiders.
What are the challenges of data masking?
Data masking is difficult because the changed data must retain any characteristics of the original data that would require specific processing. Yet it must be sufficiently transformed so that no one viewing the replica would be able to reverse-engineer it. Commercial software solutions are available to automate masking and provide confidence in the obfuscation quality.