Data obfuscation is a concept that every developer should understand and apply in their projects. Obfuscation refers to the act of making something appear different from its actual form. To a security-aware developer, the term refers to any method used to hide the actual value of a data object.
This post will take you through common obfuscation concepts, reasons, and tools. We’ll take a look at the term in a way that leaves you not only refreshingly informed about data obfuscation but capable of carrying it out.
To achieve this, let’s first consider why you need data obfuscation. While we alluded to hiding true values when we defined obfuscation, take the short detour below to understand why you need to obfuscate data.
Data Obfuscation: What’s the Point?
The internet, a place where your personal information (profile) equates to your presence in real life, is full of interesting resources. Sadly, just as a rose carries thorns, that profile is always at risk of being stolen and used to perpetrate crimes.
Image: Coded numbers and letters hiding some true meaning – Source: Giphy
If only there were a way of making that online profile less like your real-world presence. That way, even as you surf the internet, your profile couldn’t fall into the wrong hands. Even if a hacker could still read something from your digital footprint, it would be nothing that leads back to your actual profile.
Achieving data obfuscation starts with acknowledging that a piece of information is sensitive. These sensitive elements could be passwords, contact details, and full names that end up in a test database. In this instance, you might need to maintain the data format while removing any connection to real user profiles. For instance, let’s assume you take a snapshot of your database before testing.
If you have the following row in a database,
Name: David Alex Age: 32 Cell: 555 444 3210 Email: firstname.lastname@example.org Loc: Atlanta
applying data obfuscation turns it into this:
Name: John Doe Age: 23 Cell: 333 666 1234 Email: email@example.com Loc: Vegas
When used in a test environment, the two rows can be validated with the same test results. The difference between changing David’s data and creating an entirely new database is that obfuscation maintains the schema and any anomalies in the data. This way, we can see how the app handles those anomalies without exposing the real data. Otherwise, we may as well be using a database detached from the application in question. Obfuscation ensures that the data will not expose David’s information (profile) to third parties.
Dev Tip: Using data obfuscation means the subject won’t get notifications whenever you’re running tests, because you’re not using their real contact details. What’s important is that we’re not sharing private information. All the while, we’re maintaining the form of the data on which we need to run tests.
Data Obfuscation Methods
By now, you should have a firm understanding of why we’d go out of our way to hide sensitive data. Let’s now turn our attention to the various methods you can use to obfuscate sensitive data. Try mapping each of the methods that follow to some application areas as you read.
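Encryption is a common data protection method in which we disfigure the data entirely. You may have noticed that databases save passwords as long blocks of characters. Strictly speaking, those blocks are usually salted hashes, a one-way relative of encryption: the salt is random data mixed in before hashing so that identical passwords produce different stored values. Either way, the stored block makes it far harder to imagine or guess the original value. With encryption proper, reading the obfuscated block back to the original value is practically impossible unless the encryption key is known.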
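The salted password storage mentioned above can be sketched with Python's standard library. This is a minimal illustration, not a production recipe: the function names and the iteration count are assumptions made for the example, and it shows one-way hashing rather than reversible, key-based encryption.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) derived with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse")
print(stored.hex())  # the long block of characters a database would store
```

Because the salt is random, hashing the same password twice gives two different stored values, which is what makes precomputed lookup tables much less useful to an attacker.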
Masking is the method of data obfuscation we demonstrated above with David’s profile information. That kind of manipulation is specifically known as masking out data. In its static form, the process produces a second, masked copy of the database alongside the original. However, the latest test environment management tools use dynamic data masking to maintain a single version of a database, masking sensitive data only when test tools request access to it.
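As a rough sketch of static masking, the snippet below builds a masked copy of the example row while keeping the shape of each field. The dictionary layout, the helper names, and the replacement values are assumptions chosen to mirror the example row, not the behavior of any particular tool.

```python
import random
import string


def mask_digits(value: str, rng: random.Random) -> str:
    """Swap every digit for a random digit; spaces and punctuation stay put."""
    return "".join(rng.choice(string.digits) if ch.isdigit() else ch for ch in value)


def mask_record(record: dict, rng: random.Random) -> dict:
    """Return a masked copy of the record; the original is left untouched."""
    masked = dict(record)
    masked["name"] = "John Doe"                         # fixed placeholder name
    masked["age"] = rng.randint(18, 90)                 # plausible but unrelated age
    masked["cell"] = mask_digits(record["cell"], rng)   # same format, new digits
    masked["email"] = "email@example.com"               # reserved example domain
    masked["loc"] = "Vegas"
    return masked


original = {
    "name": "David Alex",
    "age": 32,
    "cell": "555 444 3210",
    "email": "firstname.lastname@example.org",
    "loc": "Atlanta",
}
masked = mask_record(original, random.Random())
print(masked)
```

The masked row has the same keys and the same phone number layout as the original, so format validation in the application under test should behave the same way.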
Tokenization throws some misleading values into the original data. To do this, a tokenizing algorithm can modify the original data by adding or subtracting random characters or numbers to take the entire database out of scope. A simple example would have “David” processed to read as “Gravid.” This way, the resulting data is meaningless unless the reader is authorized to view original values. Hash functions pursue a similar goal, with the difference that a hash is one-way and cannot be mapped back to the original.
With randomization, you shuffle the characters and numbers in our example database row (David’s data). The result doesn’t carry any meaning, yet it still satisfies length and validity constraints.
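One common way to implement tokenization, which differs a little from the character-tweaking description above, is a lookup "vault": each sensitive value is swapped for a random token, and only code with access to the vault can map it back. The class and method names below are hypothetical, and a real vault would live in a secured store rather than in memory.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a secured token store (illustrative only)."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return the token for a value, creating one on first sight."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)  # random, carries no information
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Map a token back to the original value (authorized callers only)."""
        return self._value_by_token[token]


vault = TokenVault()
token = vault.tokenize("David")
print(token)                    # a random token such as tok_ followed by 16 hex characters
print(vault.detokenize(token))  # David
```

Reusing the same token for a repeated value keeps joins and lookups in the test data consistent, at the cost of revealing that two rows share a value.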
The name could end up as:
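A minimal sketch of this kind of shuffling, assuming we scramble the letters inside each word so that word lengths and the position of the space stay the same. The function name is made up for the example, and the output changes on every run because the shuffle is random.

```python
import random


def shuffle_within_words(value: str, rng: random.Random) -> str:
    """Shuffle the characters of each space-separated word independently."""
    shuffled_words = []
    for word in value.split(" "):
        chars = list(word)
        rng.shuffle(chars)  # in-place random permutation
        shuffled_words.append("".join(chars))
    return " ".join(shuffled_words)


scrambled = shuffle_within_words("David Alex", random.Random())
print(scrambled)  # some rearrangement of the same letters
```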
Blurring offsets original values by a known degree in an attempt to anonymize them. For example, the age in all profiles could be moved up by 10 units. It would be hard to match the blurred profile to a real person because the database now says they’re ten years older than they actually are. This obfuscation method applies to numeric value types only, such as the amounts in a cash records database.
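The age example can be sketched in a few lines. The row layout and the function name are assumptions for illustration; the offset of 10 comes from the example above.

```python
def blur_ages(rows, offset=10):
    """Return copies of the rows with every age shifted by a fixed offset."""
    return [{**row, "age": row["age"] + offset} for row in rows]


rows = [{"name": "David Alex", "age": 32}]
blurred = blur_ages(rows)
print(blurred[0]["age"])  # 42
```

A fixed offset preserves differences between rows, which keeps sorting and range checks meaningful in tests, but anyone who learns the offset can undo it.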
Nulling is the simplest option: sometimes all it takes to add a layer of obfuscation is replacing parts of the data with otherwise null-valued placeholders. Think of how your credit card number is sent to vendors, with the first section looking like a string of hash characters: ####-####-####-0000. Confusing, right? Even if other cards end with 0000, the hidden first sets of numbers will throw attempts at matching them to a specific credit card out the window. Good luck matching those last four digits to the right card name, expiration date, and CVV!
Choosing a data obfuscation method from all of the options depends on many factors. This is precisely the reason why there are more ways and algorithms for obfuscating data than just the six we’ve discussed. For instance, if you’re testing your application for verification and validation, it would make sense to maintain the data’s format after obfuscating it.
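The card number pattern above can be reproduced with a small helper. This is a sketch with a hypothetical function name; it hides every digit except the last four and leaves the separators alone.

```python
def null_out_card(card_number: str, visible: int = 4, filler: str = "#") -> str:
    """Replace all but the last `visible` digits with a filler character."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_hide = max(total_digits - visible, 0)
    result = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            result.append(filler)  # hide this digit
            to_hide -= 1
        else:
            result.append(ch)      # keep separators and the trailing digits
    return "".join(result)


print(null_out_card("1234-5678-9012-0000"))  # ####-####-####-0000
```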
Putting Data Obfuscation to Use
After this crash course in data security, it only makes sense to bring everything into perspective. For a web developer, testing is a critical process for polishing applications. When you run data-centric tests, masking out values makes perfect sense as an obfuscation strategy. This way, data passing through team members’ hands doesn’t expose any actual profiles to malicious intent.
Awareness of how data obfuscation works can benefit you when testing a simple module like a login form. For example, here’s how your testing process typically flows (with manual testing):
- Establish and schedule a test case. This instance will tag the login form as the test subject.
- The test engineer (or you, wearing a different hat) creates a scope for the test process.
- Determine a range of inputs, along with outcome expectations.
- Set a test environment using the same parameters as the production environment.
- Take a snapshot of the database to test.
- The cycle of testing, analysis, and retesting goes into full swing.
Or you could have a better pipeline. I sure hope so! Maybe even infuse some automated testing while you’re at it. However, it’s clear in the example workflow that the snapshot kept all the real values when you copied the data. With that, you will have started a risk exposure process that proliferates as long as the copy of the data exists.
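One way to close that gap is to obfuscate the snapshot before it reaches the test environment. The sketch below assumes the snapshot is a list of row dictionaries and uses a hypothetical table of per-column rules; the column names and replacement values are illustrative only.

```python
import copy

# Hypothetical per-column rules: each maps a real value to a safe stand-in.
RULES = {
    "name": lambda _value: "John Doe",
    "email": lambda _value: "email@example.com",
    "cell": lambda value: "".join("0" if ch.isdigit() else ch for ch in value),
}


def obfuscate_snapshot(rows):
    """Return an obfuscated deep copy of the snapshot; the source rows are not modified."""
    safe_rows = copy.deepcopy(rows)
    for row in safe_rows:
        for column, rule in RULES.items():
            if column in row:
                row[column] = rule(row[column])
    return safe_rows


production_rows = [
    {"name": "David Alex", "age": 32, "cell": "555 444 3210",
     "email": "firstname.lastname@example.org"},
]
test_rows = obfuscate_snapshot(production_rows)
print(test_rows)
```

Columns without a rule, such as the age here, pass through unchanged, so the rule table doubles as an explicit list of what the team considers sensitive.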
Introducing Testim for Data Obfuscation
Reading this far means you want to take your web application testing workflow to the next level. The various methods of data obfuscation we discussed, from encryption to nulling, add a security layer to your testing phase. An easy way to implement these methods would be to explore the full features available in Testim.