Test Data Is Critical: How to Best Generate, Manage, and Use It

By Testim

When it comes to testing, data is king! In an increasingly digitized world, data is becoming more important for many businesses. However, many people aren’t aware that data plays an important role in testing. Why? Because the best way to test your application is with real data.

Of course, you should avoid using production data directly. Instead, use test data. This term refers to data that comes as close as possible to your production data without revealing any sensitive information.

Why do the accuracy and structure of test data matter? Well, it doesn’t make sense to test your software with completely meaningless data. This might sound harsh, but it’s probably better not to test at all with such data, because you might be testing nothing. Meaningless data doesn’t add any value to the quality of your application. Therefore, test data needs to be meaningful so that you can perform worthwhile tests without revealing private information.

In addition, with the rise of automated testing, there’s little room for manual test data creation. Continuous testing has gained a lot of attention within the DevOps and software development communities. This part of the testing strategy involves generating test data on the fly while running test cases. Basically, your testing strategy relies on a script that can generate the ideal testing data for you and your projects.
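To make the idea of a data-generating script concrete, here is a minimal Python sketch. The function name, the record fields, and the example.test domain are assumptions chosen for illustration, not part of any particular tool. A fixed seed keeps the generated data reproducible between test runs.

```python
import random
import string


def generate_users(count, seed=42):
    """Return `count` synthetic user records; the same seed yields the same data."""
    rng = random.Random(seed)  # private generator, so other code's randomness is unaffected
    users = []
    for user_id in range(1, count + 1):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        users.append({
            "id": user_id,
            "name": name,
            "email": f"{name}@example.test",  # placeholder domain, never a real address
            "age": rng.randint(18, 90),       # inclusive on both ends
        })
    return users


if __name__ == "__main__":
    for user in generate_users(3):
        print(user)
```

A test case can call generate_users at setup time instead of relying on a hand-maintained fixture file.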

This article explains how to safely generate test data, how to manage it, and finally, how to use it.

Generating High-Quality Test Data

To perform quality testing, you’ll need data of high quality. The goal of test data generation is to generate meaningful, connected, and interrelated data.

According to the State of DevOps 2019 report by Redgate, 65% of companies copy their production data to be used in testing. It’s quite worrying that only 36% of these companies apply masking techniques to protect the information they’re using from hackers. However, basing your test data on your production data is actually not a bad approach after all.

Production data represents an accurate image of the data that works well with your application. This means that this data is well suited for running test cases. Of course, it’s important to mask or substitute any sensitive data to avoid disclosing any personally identifiable information.

Other approaches to generating test data include:

  • Insert generated data directly into a database.
  • Prepare CSV or JSON files that contain data to be used by scripts or test cases.
  • Generate data manually by interacting with the front end and exploring advanced paths.
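As a rough illustration of the file-based approach from the list above, the following Python sketch serializes the same records to CSV and to JSON. The sample orders and the helper names are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample records; a real project would use its own schema.
SAMPLE_ORDERS = [
    {"order_id": 1, "customer": "customer_001", "total": "19.99"},
    {"order_id": 2, "customer": "customer_002", "total": "0.00"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text, with a header row taken from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same records to JSON text."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_csv(SAMPLE_ORDERS))
    print(to_json(SAMPLE_ORDERS))
```

Writing the returned text to a file gives scripts and test cases a fixture they can load by path.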

The above strategies require more effort than continuous testing does, though. Ideally, you want test data generation to be part of your test automation strategy.

Next, let’s explore three common pitfalls you may encounter when generating test data that’s based on your production data.

3 Common Pitfalls When Using Production Data as Test Data

Using your production data might be a smart approach for your organization to generate test data. However, many organizations forget about the limitations of this data. Here are three common pitfalls companies encounter when basing their test data on production data.

Pitfall 1: Missing Data

When the development team creates new functionality, this might introduce new data that’s being captured. This means that you have new tables in your database for which you don’t have any sample data. When you’re blindly copying production data to be used as test data, you might forget about these new data tables.

Therefore, check whether new functionality has introduced data that your testing engineers need to generate.
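One possible way to spot such gaps automatically is to list the tables that contain no rows after the production copy has been loaded. The sketch below assumes a SQLite test database; the function name and the example tables are made up for illustration.

```python
import sqlite3


def find_empty_tables(connection):
    """Return the names of user tables that contain no rows, sorted alphabetically."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    empty = []
    for (table_name,) in cursor.fetchall():
        # Table names come from the schema itself, not from user input.
        row_count = connection.execute(
            f'SELECT COUNT(*) FROM "{table_name}"'
        ).fetchone()[0]
        if row_count == 0:
            empty.append(table_name)
    return sorted(empty)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE loyalty_points (user_id INTEGER, points INTEGER)")
    conn.execute("INSERT INTO users (name) VALUES ('test_user')")
    print(find_empty_tables(conn))
```

In this example, loyalty_points stands in for a table added by a new feature that has no sample data yet.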

Pitfall 2: Production Data That Follows the Happy Path

For testing engineers, the “happy path” is a common term that refers to testing only the success scenarios. This happy path is also easy to find in production data, as every action that a user completes should be successful. Considering this, your production data might not be the ideal data set to use for testing. You’ll likely have to create data for negative scenarios as well, so you can test failures.
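A simple way to create data for negative scenarios is to start from one valid record and break a single field at a time. The following Python sketch shows the idea; the signup record and the function name are hypothetical.

```python
import copy

# A hypothetical valid record, as it might appear in production data.
VALID_SIGNUP = {"email": "user@example.test", "password": "S3cure-pass!", "age": 30}


def negative_variants(valid_record):
    """Return (description, record) pairs, each breaking exactly one field."""
    variants = []
    for field in valid_record:
        missing = copy.deepcopy(valid_record)
        del missing[field]          # the field is absent entirely
        variants.append((f"missing {field}", missing))

        blank = copy.deepcopy(valid_record)
        blank[field] = ""           # the field is present but empty
        variants.append((f"empty {field}", blank))
    return variants


if __name__ == "__main__":
    for description, record in negative_variants(VALID_SIGNUP):
        print(description, record)
```

Each variant should make the application respond with a failure, which is exactly the behavior that happy-path production data never exercises.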

Pitfall 3: Testing Edge Cases

Your production data often contains few edge cases. Because the production data represents the happy path, you won’t find many edge cases or advanced flows in your data. This might be an issue if you want to test all possible scenarios to reach 100% test coverage. Your production data might test only 70% to 80% of the scenarios.

In short, you won’t reach 100% test coverage solely with production data. Your application requires additional data to represent advanced flows. Furthermore, generating this data might require a more manual approach.
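Some edge-case data can still be derived mechanically. Boundary values are a common example: for a field with an allowed range, test the values just outside, on, and just inside each limit. This is a minimal sketch, and the function name is an assumption.

```python
def boundary_values(minimum, maximum):
    """Return values just outside, on, and just inside an inclusive integer range."""
    candidates = [
        minimum - 1, minimum, minimum + 1,
        maximum - 1, maximum, maximum + 1,
    ]
    # dict.fromkeys removes duplicates (possible for very small ranges) and keeps order.
    return list(dict.fromkeys(candidates))


if __name__ == "__main__":
    # Example: an age field that accepts 18 through 120.
    print(boundary_values(18, 120))  # [17, 18, 19, 119, 120, 121]
```

Advanced flows that span several screens or records usually still need hand-crafted data.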

Test Data Management: How Can You Do It?

Test data management includes many aspects, such as removing personally identifiable information and performing data validity checks. Here are four approaches to follow when managing your test data. Each one is equally important.

Approach 1: Remove Any Personally Identifiable Information

First, check if your data contains any personally identifiable information (PII). If so, apply data masking techniques such as substitution, shuffling, or blurring. These techniques help you to make data non-identifiable.
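The three masking techniques can be sketched in a few lines of Python. The helper names and sample records below are hypothetical, and blurring is interpreted here as coarsening a number into a bucket, which is only one of several ways to blur a value.

```python
import random


def substitute_emails(records):
    """Substitution: replace each real email with a generated placeholder."""
    return [
        {**record, "email": f"user{index}@example.test"}
        for index, record in enumerate(records, start=1)
    ]


def shuffle_column(records, field, seed=0):
    """Shuffling: keep the real values but reassign them across rows."""
    values = [record[field] for record in records]
    random.Random(seed).shuffle(values)
    return [{**record, field: value} for record, value in zip(records, values)]


def blur_number(value, bucket=10):
    """Blurring (one interpretation): round a number down to its bucket."""
    return (value // bucket) * bucket


if __name__ == "__main__":
    people = [
        {"email": "alice@corp.example", "city": "Ghent", "age": 34},
        {"email": "bob@corp.example", "city": "Leuven", "age": 47},
        {"email": "carol@corp.example", "city": "Bruges", "age": 29},
    ]
    masked = shuffle_column(substitute_emails(people), "city")
    masked = [{**row, "age": blur_number(row["age"])} for row in masked]
    print(masked)
```

All three helpers return new records and leave the input untouched, so the original data set can be discarded or locked away after masking.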

Next, check the validity of your test data regularly.

Approach 2: Perform a Data Validity Check

As development moves forward and you or your team members add new features, your data should move forward too. Therefore, perform test data audits regularly to find outdated data. Furthermore, check whether any data needed to support new functionality is missing. As mentioned earlier in this article, you might end up introducing new features, meaning that you also need new data tables.
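Part of such an audit can be scripted. The sketch below flags records that lack fields the current version of the application requires. The function name and the loyalty_tier field are invented for the example.

```python
def audit_records(records, required_fields):
    """Return (index, missing_fields) pairs for records lacking required fields."""
    problems = []
    for index, record in enumerate(records):
        # A field counts as missing when it is absent, None, or an empty string.
        missing = sorted(
            field for field in required_fields
            if record.get(field) in (None, "")
        )
        if missing:
            problems.append((index, missing))
    return problems


if __name__ == "__main__":
    customers = [
        {"id": 1, "name": "customer_001", "loyalty_tier": "gold"},
        {"id": 2, "name": "customer_002"},  # predates the hypothetical loyalty feature
    ]
    print(audit_records(customers, {"id", "name", "loyalty_tier"}))
    # [(1, ['loyalty_tier'])]
```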

Approach 3: Refresh Your Test Data Regularly

Besides checking the validity of your data, it’s important to regularly refresh your data. This process can be easily automated with scripts that help you generate new data.

Refreshing your test data can improve the quality of your application. Different data might expose bugs that your team hasn’t discovered yet with previous test data. Therefore, it’s important to make the time to regularly update your test data.
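One way to automate a refresh, offered here only as a sketch, is to derive the random seed from the calendar week. The data then changes every week but stays reproducible within a week, which makes failures easier to replay. The function names and the price range are assumptions.

```python
import datetime
import random


def seed_for(day):
    """Derive a seed from the ISO year and week, e.g. 202401 for the first week of 2024."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return iso_year * 100 + iso_week


def refreshed_prices(day, count=5):
    """Generate `count` prices that stay stable for every day in the same ISO week."""
    rng = random.Random(seed_for(day))
    return [round(rng.uniform(1, 500), 2) for _ in range(count)]


if __name__ == "__main__":
    today = datetime.date.today()
    print(seed_for(today), refreshed_prices(today))
```

Logging the seed alongside the test results makes it possible to regenerate the exact data set that exposed a bug.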

Approach 4: Manage Data Access

Last of all, manage data access. Your organization needs to know how to access all important data. In addition, to ensure smooth testing, make sure your testing engineers always have access to the required data. You don’t want to slow down a release because of data inaccessibility.

Tip: Consider creating a list of data sources you need for testing and where they are located. This helps the testing engineers to easily find test data.

How Can You Use Test Data?

First of all, always make a copy of your test data before using it. That way, if something goes wrong, you can still access the original test dataset.
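Making that copy can be a one-line step in the test setup. Here is a minimal sketch using Python’s standard library. The working_copy helper and the users.json file name are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def working_copy(original):
    """Copy a test data file into a fresh temporary directory and return the copy's path."""
    scratch_dir = Path(tempfile.mkdtemp(prefix="testdata_"))
    destination = scratch_dir / Path(original).name
    return Path(shutil.copy2(original, destination))


if __name__ == "__main__":
    # Create a stand-in for the original test data file.
    source_dir = Path(tempfile.mkdtemp())
    original = source_dir / "users.json"
    original.write_text('[{"id": 1}]')

    copy_path = working_copy(original)
    copy_path.write_text("[]")       # tests may modify the copy freely
    print(original.read_text())      # the original is unchanged
```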

Next, you can use test data in various ways. Scripts can convert test data into different formats or insert the data into a database. For example, you may want to directly inject the data into a test database in order to test whether the application runs correctly.
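Injecting data into a test database could look like the following sketch, which uses an in-memory SQLite database. The products table, its columns, and the load_products helper are assumptions made for the example.

```python
import sqlite3


def load_products(connection, products):
    """Insert product records into a test database and return the resulting row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "sku TEXT PRIMARY KEY, name TEXT NOT NULL, price_cents INTEGER NOT NULL)"
    )
    with connection:  # commits on success, rolls back if an insert fails
        connection.executemany(
            "INSERT INTO products (sku, name, price_cents) "
            "VALUES (:sku, :name, :price_cents)",
            products,
        )
    return connection.execute("SELECT COUNT(*) FROM products").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a throwaway test database
    count = load_products(conn, [
        {"sku": "SKU-001", "name": "Test mug", "price_cents": 1299},
        {"sku": "SKU-002", "name": "Test shirt", "price_cents": 2499},
    ])
    print(count)  # 2
```

Named placeholders keep the insert statement readable and avoid building SQL from string fragments.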

After you run the test cases, you can do several things with your test data:

  • Store the final state of your database as a reference.
  • Delete all test data to avoid confusion about which is the original test data file. Also, clean up the files your application imported or produced during the tests. Cleaning up everything thoroughly can be tricky. For example, output files may be hidden in several places in your tests, and it’s easy to miss a couple of them.
  • Use the end state of your database as input for further testing.
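One way to reduce the cleanup problem described in the list above is to have tests write their output into a temporary directory that is removed automatically. This is a minimal sketch. The run_export_test function and the report.csv file are hypothetical.

```python
import tempfile
from pathlib import Path


def run_export_test():
    """Write output into a temporary directory that is deleted when the block exits."""
    with tempfile.TemporaryDirectory(prefix="test_output_") as scratch:
        output_file = Path(scratch) / "report.csv"
        output_file.write_text("id,total\n1,19.99\n")  # stands in for the application's export
        line_count = len(output_file.read_text().splitlines())
    # After the with block, the directory and everything in it is gone.
    return line_count, output_file.exists()


if __name__ == "__main__":
    print(run_export_test())  # (2, False)
```

Because every output file lives under one directory, nothing is left behind even if a test creates files you did not anticipate.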


As you can see, test data management includes many elements, from data generation to data management and usage. It can be challenging to get all data aspects right. The most important question to answer is whether you should use production data to generate test data. Using production data as test data saves your organization a lot of time. However, this approach mostly covers the happy path, so it does not offer full test coverage on its own.

This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!
