Understanding Data Obfuscation: What Developers Need to Know


By Testim

Data obfuscation is a concept that every developer should understand and apply in their projects. Obfuscation refers to the act of making something appear different from its actual form. To a security-aware developer, the term covers any method used to hide the actual value of a data object. In the realm of software testing, data obfuscation is of paramount importance. Testing is awesome and we love it, but it can lead to user data being compromised if your test data management strategy is careless about data protection.

This post will take you through common obfuscation concepts, reasons, and tools. We’ll take a look at the term in a way that leaves you not only refreshingly informed about data obfuscation but capable of carrying it out.

This is a summary of what we’ll cover in the post:

  • Data obfuscation: what’s the point?
  • 8 different methods for data obfuscation
  • Putting data obfuscation to use
  • Introducing Testim for data obfuscation

Let’s get to it.

Data Obfuscation: What’s the Point?

The internet, a place where your personal information (profile) equates to your presence in real life, is full of interesting resources. Sadly, just as a rose carries thorns, that profile is always at risk of being stolen and used to perpetrate crimes.


Image: Coded numbers and letters hiding some true meaning – Source: Giphy

If only there were a way of making that online profile less like your real-world presence. That way, even as you surf the internet, your profile couldn’t fall into the wrong hands. Even if a hacker could still read something from your digital footprint, it would be nothing that could lead back to your actual profile.

Data obfuscation starts with acknowledging that a piece of information is sensitive. These sensitive elements could be passwords, contact details, or full names stored in a test database. In this instance, you might need to maintain the data format while removing any connection to real user profiles. For instance, let’s assume you take a screenshot of your database before testing.


If you have the following row in a database,

Name: David Alex      Age: 32      Cell: 555 444 3210      Email: [email protected]     Loc: Atlanta

applying data obfuscation turns it into this:

Name: John Doe        Age: 23      Cell: 333 666 1234      Email: [email protected]     Loc: Vegas

When used in a test environment, both rows pass the same validations and produce the same test results. The difference between obfuscating David’s data and creating an entirely new database is that obfuscation maintains the schema and any anomalies in the data. This way, we can see how the app handles those anomalies without exposing the real data. Otherwise, we may as well be using a database detached from the application in question. Obfuscation ensures that the data will not expose David’s information (profile) to third parties.

Another crucial word that should come to mind when talking about data obfuscation is compliance. This word by itself could mean a number of different things, but in the context of digital security, it most likely means complying with laws and regulations that protect users’ data. You’ve probably heard of the GDPR (General Data Protection Regulation), a privacy regulation from the European Union. Similar regulations exist around the world, such as Brazil’s LGPD and California’s CCPA, just to name a few.

Why should you care about these types of laws? Simple: failure to comply with them can result in dire legal and financial consequences for your organization, and that’s not to mention the stain on its reputation. Long story short: privacy laws and regulations are a big part of why it’s important to obfuscate users’ data, so that it stays useless should it fall into the wrong hands.

Dev Tip: With data obfuscation, the data subject won’t get notifications whenever you’re running tests, because you’re not using their real contact details. More importantly, you’re not sharing private information, all while maintaining the form of the data you need to run tests on.

Data Obfuscation Methods

By now, you should have a firm understanding of why we’d go out of our way to hide sensitive data. Let’s now turn our attention to the various methods you can use to obfuscate sensitive data. Try mapping each of the methods that follow to some application areas as you read.

1. Encryption

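Encryption is a common data protection method that transforms the data entirely, so that the stored value bears no visible resemblance to the original. Without the encryption key, recovering the original value from the encrypted block is computationally infeasible. A related technique shows up in password storage: you may have noticed that databases save passwords as long blocks of characters. Those are typically salted hashes rather than encrypted values. Hashing is one-way, and the salt (random data mixed in before hashing) makes the original value harder to guess or look up in precomputed tables.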
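Here is a minimal sketch of salted password storage using only Python’s standard library. Note that it shows one-way hashing, not key-based encryption. The function names and the iteration count are illustrative assumptions, not recommendations from this post.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative value; tune for your own environment


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("s3cret")
print(verify_password("s3cret", salt, stored))  # True
print(verify_password("guess", salt, stored))   # False
```

Only the salt and the digest are stored, so there is nothing in the database to decrypt back into the password.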

2. Masking

Masking is the method of data obfuscation we demonstrated above with David’s profile information. That kind of manipulation is specifically known as masking out data. It’s a static method, meaning that the process leaves you with two copies of the data: the original and the masked one. However, the latest test environment management tools now use dynamic data masking to maintain a single version of a database, masking sensitive data only when test tools request access to it.
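As a rough sketch of static masking, the snippet below builds a masked copy of a row while keeping each value’s length, letter case, and separators. The helper names (`mask_value`, `mask_row`) and the choice of which fields count as sensitive are assumptions made for illustration.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Replace every letter and digit with a random one of the same kind."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep spaces and punctuation so the format survives
    return "".join(out)


def mask_row(row: dict, sensitive: set, rng: random.Random) -> dict:
    """Return a masked copy of the row; the original row is left untouched."""
    return {
        key: mask_value(str(value), rng) if key in sensitive else value
        for key, value in row.items()
    }


row = {"name": "David Alex", "age": 32, "cell": "555 444 3210", "loc": "Atlanta"}
masked = mask_row(row, {"name", "cell"}, random.Random())
print(masked)
```

Because the function returns a new dictionary, you end up with two copies, which is exactly the trait that makes this approach static rather than dynamic.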

3. Tokenizing

This method swaps the original data for misleading stand-in values, or tokens. To do this, a tokenizing algorithm can modify the original data by adding or subtracting random characters or numbers, which takes the entire database out of scope for anyone who only sees the tokens. A simple example would have “David” processed to read as “Gravid.” The resulting data is meaningless unless the reader is authorized to map tokens back to the original values, typically through a securely stored lookup. That reversibility is what separates tokenizing from hash functions, which are one-way by design.
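The sketch below swaps in a common variant of the idea: instead of altering characters, it replaces each value with a random token and keeps the mapping in a lookup table, often called a vault. The `TokenVault` class is hypothetical and holds everything in memory; a real vault would live in a separately secured store.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping sensitive values to random tokens."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or mint a new random one."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)  # no mathematical link to the value
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Only callers with access to the vault can recover the original."""
        return self._value_by_token[token]


vault = TokenVault()
token = vault.tokenize("David")
print(token)
print(vault.detokenize(token))  # David
```

Reusing the same token for a repeated value keeps joins and lookups consistent across tables in the test data.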

4. Randomization

With randomization, you rearrange the characters and numbers in our example database row (David’s data) at random. The result doesn’t carry any meaning, but it still satisfies length and validity constraints.

The name could end up as:

Vidad Xela
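A scrambled name like the one above could be produced along these lines. This is a minimal sketch; `scramble_name` is a made-up helper, and it assumes names are separated by single spaces.

```python
import random


def scramble_name(name: str, rng: random.Random) -> str:
    """Shuffle the letters inside each word, keeping word lengths intact."""
    words = []
    for word in name.split(" "):
        letters = list(word.lower())
        rng.shuffle(letters)
        # Re-capitalize so the result still looks like a name
        words.append("".join(letters).capitalize())
    return " ".join(words)


print(scramble_name("David Alex", random.Random()))
```

Each run gives a different arrangement, and a short word can occasionally come back unchanged, so check for that case if it matters in your tests.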

5. Blurring

This technique offsets original values by a known degree in an attempt to anonymize them. For example, the age in all profiles could be moved up by 10 units. It would be hard to match the blurred profile to a real person because the database now says they’re ten years older than they actually are. This obfuscation method applies to numeric values only. An example would be a cash records database.
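The age example can be sketched in a few lines. The `blur_ages` helper and the profile layout are assumptions for illustration.

```python
def blur_ages(profiles, offset=10):
    """Return copies of the profiles with every age shifted by a fixed offset."""
    return [{**profile, "age": profile["age"] + offset} for profile in profiles]


profiles = [{"name": "David Alex", "age": 32}]
print(blur_ages(profiles))  # [{'name': 'David Alex', 'age': 42}]
```

Keep in mind that anyone who learns the offset can undo the blur by subtracting it, so the offset itself needs protecting.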

6. Nulling

Sometimes all it takes to add a layer of obfuscation is replacing parts of the data with null or placeholder values. Think of how your credit card number is sent to vendors, with the first section looking like a string of hash characters: ####-####-####-0000. Confusing, right? Even if other cards end with 0000, the missing first sets of numbers will throw attempts at matching them to a specific credit card out the window. Good luck matching those last four digits to the right card name, expiration date, and CVV!
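A minimal sketch of that card-number treatment follows. The function name and its parameters are illustrative assumptions.

```python
def null_card_number(card_number: str, keep_last: int = 4, mask_char: str = "#") -> str:
    """Replace every digit except the last `keep_last` with a mask character."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_seen = 0
    out = []
    for ch in card_number:
        if ch.isdigit():
            digits_seen += 1
            # Keep a digit only if it is among the final `keep_last` digits
            out.append(ch if digits_seen > total_digits - keep_last else mask_char)
        else:
            out.append(ch)  # separators such as "-" stay in place
    return "".join(out)


print(null_card_number("1234-5678-9012-0000"))  # ####-####-####-0000
```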

Choosing a data obfuscation method from all of the options depends on many factors, which is precisely why there are more methods and algorithms for obfuscating data than the ones covered in this post. For instance, if you’re testing your application for verification and validation, it would make sense to maintain the data’s format after obfuscating it.


7. Substitution

Substitution means exactly what it sounds like: replacing a value with another value in the same “category,” taken from a predefined set of possible values (that is, a dictionary).

For instance, let’s say your database contains a three-part name, such as Eric David Smith. You could then replace the first name with a random value from a dictionary, then do the same with the second name, and finally perform the same with the family name. By doing this, you would end up with a totally different name that, despite looking like a real name, couldn’t be traced back to the original user.
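Here is a rough sketch of that name substitution. The three small dictionaries and the `substitute_name` helper are invented for illustration; in practice the dictionaries would be much larger.

```python
import random

# Hypothetical replacement dictionaries
FIRST_NAMES = ["Alice", "Brian", "Carla", "Derek"]
MIDDLE_NAMES = ["Lee", "Marie", "James", "Rose"]
FAMILY_NAMES = ["Jones", "Nguyen", "Garcia", "Miller"]


def substitute_name(full_name: str, rng: random.Random) -> str:
    """Replace each part of a name with a random entry from the matching dictionary."""
    parts = full_name.split()
    replaced = []
    for index in range(len(parts)):
        if index == 0:
            pool = FIRST_NAMES
        elif index == len(parts) - 1:
            pool = FAMILY_NAMES
        else:
            pool = MIDDLE_NAMES
        replaced.append(rng.choice(pool))
    return " ".join(replaced)


print(substitute_name("Eric David Smith", random.Random()))
```

The output never reuses characters from the input, so nothing in the substituted name points back to the original.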

8. Shuffling

As we just saw, the substitution technique preserves the semantics or form of the data while completely changing its value. You end up with something that’s still obviously a name (or a phone number, or a ZIP code, etc.) but doesn’t refer to real data. As such, the substitution technique is perfect for scenarios that require the obfuscated data to still “work” in the same ways as the original data.

However, you’ll often find yourself in situations where that doesn’t matter. For instance, you might need to take a screenshot from a database and obfuscate the data in order to protect users’ privacy. In that case, you might not need the obfuscated phone number to be a possibly valid phone number, for instance.

In such scenarios, a better technique for you might be shuffling. With shuffling, you move individual digits or characters to different positions. While the result might not be semantically valid, that’s completely fine if you don’t need it to be.

A potential downside of shuffling is that, if the algorithm used to shuffle the values is too predictable, it can be possible to reconstruct the original value from the obfuscated one, undermining the value of the process.
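One way to reduce that predictability is to draw the shuffle from the operating system’s random source rather than a seeded generator. The sketch below does this for the digits of a phone number while leaving separators where they are; `shuffle_digits` is a hypothetical helper.

```python
import secrets


def shuffle_digits(value: str) -> str:
    """Shuffle the digits of a string, leaving non-digit characters in place."""
    rng = secrets.SystemRandom()  # OS-backed randomness, not seedable or replayable
    digits = [ch for ch in value if ch.isdigit()]
    rng.shuffle(digits)
    shuffled = iter(digits)
    return "".join(next(shuffled) if ch.isdigit() else ch for ch in value)


print(shuffle_digits("555 444 3210"))
```

Even with an unpredictable shuffle, the result still reveals which digits were present, so this suits cases like screenshots more than data that must resist a determined attacker.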

Putting Data Obfuscation to Use

After this crash course in data security, it only makes sense to bring everything into perspective. For a web developer, testing is a critical process for polishing applications. When you run data-centric tests, masking out values makes perfect sense as an obfuscation strategy. This way, data passing through team members’ hands doesn’t expose any actual profiles to people with malicious intent.

Knowing how data obfuscation works helps even when testing a simple module like a login form. For example, here’s how your testing process typically flows (with manual testing):

  1. Establish and schedule a test case. In this instance, the login form is the test subject.
  2. The test engineer (which could be you, wearing a different hat) defines the scope of the test process.
  3. Determine a range of inputs, along with the expected outcomes.
  4. Set up a test environment using the same parameters as the production environment.
  5. Take screenshots of the database to test.
  6. The cycle of testing, analysis, and test iteration goes into full swing.

Or you could have a better pipeline. I sure hope so! Maybe even infuse some automated testing while you’re at it. However, the example workflow makes one thing clear: the screenshot of the data kept all the original values. With that, you’ve started a risk exposure that persists, and can spread, for as long as the copy of the data exists.
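One way to limit that exposure is to build the test copy with the sensitive columns already replaced. The sketch below does this with Python’s built-in `sqlite3` module. The `users` and `users_test` tables, their columns, and the sample row (including its email address) are all made up for illustration.

```python
import sqlite3


def make_masked_snapshot(conn: sqlite3.Connection) -> None:
    """Copy `users` into `users_test`, swapping sensitive columns for synthetic values."""
    conn.execute("DROP TABLE IF EXISTS users_test")
    conn.execute(
        """
        CREATE TABLE users_test AS
        SELECT id,
               'User ' || id                     AS name,
               age,
               '555 000 ' || printf('%04d', id)  AS cell,
               'user' || id || '@example.com'    AS email,
               loc
        FROM users
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "cell TEXT, email TEXT, loc TEXT)"
)
# Sample row with placeholder contact details
conn.execute(
    "INSERT INTO users VALUES (1, 'David Alex', 32, '555 444 3210', "
    "'david@example.com', 'Atlanta')"
)
make_masked_snapshot(conn)
print(conn.execute("SELECT name, cell, email FROM users_test").fetchall())
```

Tests then run against `users_test`, so any copy or screenshot taken during testing contains only the synthetic values.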

Introducing Testim for Data Obfuscation

This is where test automation tools like Testim come in handy. With Testim, you can include a custom step that masks sensitive data (blacks it out) before taking a screenshot for testing.

If you’ve read this far, you probably want to take your web application testing workflow to the next level. The various data obfuscation methods we discussed, from encryption to shuffling, add a security layer to your testing phase. An easy way to implement these methods would be to explore the full features available in Testim.
