Test Data Management

Consider the following when managing the test data needed during the QA cycle.

Analysis of data

  • Thoroughly analyze all the kinds of data that testing may require, so that the data can be managed effectively.

Data setup to mirror test and production environments

  • Ensure test data is replicated across all servers, so that test scenario gaps are covered and testing in multiple environments is not blocked.
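One way to check that environments hold the same test data is to fingerprint each data set and flag the environments that differ from a reference. The following is a minimal sketch, assuming each environment's data can be exported as a list of JSON-serializable rows; the function names and the environment names "qa", "staging", and "uat" are hypothetical.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 digest of the rows that ignores row order."""
    # Serialize each row with sorted keys so dict ordering does not matter,
    # then sort the serialized rows so row ordering does not matter either.
    serialized = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(serialized).encode("utf-8")).hexdigest()


def find_drift(environments, reference):
    """List the environments whose data differs from the reference one."""
    expected = fingerprint(environments[reference])
    return sorted(
        name
        for name, rows in environments.items()
        if fingerprint(rows) != expected
    )


# Hypothetical exports of the same table from three environments.
environments = {
    "qa": [{"id": 1, "plan": "basic"}, {"id": 2, "plan": "pro"}],
    "staging": [{"id": 2, "plan": "pro"}, {"id": 1, "plan": "basic"}],
    "uat": [{"id": 1, "plan": "basic"}],
}
drifted = find_drift(environments, reference="qa")
```

Here "staging" holds the same rows in a different order and is treated as matching, while "uat" is missing a row and is reported as drifted.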

Determination of the test data clean-up

  • Define a clear process for deciding when test data needs to be created, altered, or cleaned up, based on the needs of specific scrum activities and test scenarios.
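A create-and-clean-up rule can be enforced in code by tying the lifetime of the data to the test scenario that needs it. The sketch below is one possible shape, assuming the test data store behaves like a dictionary; `temporary_test_data` is a hypothetical helper, not part of any framework.

```python
from contextlib import contextmanager


@contextmanager
def temporary_test_data(store, records):
    """Insert records for one test scenario and remove them afterwards."""
    created_keys = []
    try:
        for key, value in records.items():
            if key in store:
                raise KeyError(f"test data key already exists: {key}")
            store[key] = value
            created_keys.append(key)
        yield store
    finally:
        # Remove only what this scenario created, even if the test failed.
        for key in created_keys:
            store.pop(key, None)


# Usage: a plain dict stands in for the test database.
db = {"user:1": {"name": "baseline"}}
with temporary_test_data(db, {"user:99": {"name": "scenario user"}}) as data:
    scenario_saw_record = "user:99" in data
remaining_keys = sorted(db)
```

After the `with` block ends, only the baseline record remains, so the next scenario starts from a known state.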

Identify sensitive data and protect it

  • Identify which data is sensitive and the mechanism, such as masking or anonymization, that will shield it in the test environments.
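As an illustration of one shielding mechanism, the sketch below replaces sensitive fields with a keyed hash, so the same input always maps to the same masked value and relationships between records survive. The field names and the `mask_record` helper are assumptions made for this example; a real project would choose its mechanism according to its own compliance requirements.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as sensitive.
SENSITIVE_FIELDS = frozenset({"email", "ssn"})


def mask_record(record, secret_key, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record with sensitive values pseudonymized."""
    masked = {}
    for field, value in record.items():
        if field in sensitive_fields and value is not None:
            digest = hmac.new(
                secret_key, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
            # Keep a short, clearly marked token instead of the real value.
            masked[field] = f"masked-{digest[:12]}"
        else:
            masked[field] = value
    return masked


# Usage: the key is a placeholder and should come from a secret store.
key = b"replace-with-a-real-secret"
production_row = {"name": "Ada", "email": "ada@example.com", "ssn": None}
test_row = mask_record(production_row, key)
```

The original record is left unchanged, and non-sensitive fields such as `name` pass through as they are.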


Automation of result comparison

  • Compare the results produced by data sets across repeated test runs, and automate that comparison to expose any test data errors that occur between consecutive runs.
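The comparison of consecutive runs can be automated with a small diff routine. This is a minimal sketch, assuming each run's results are available as a mapping from a test case ID to its output; `diff_runs` and the sample IDs are hypothetical.

```python
def diff_runs(previous, current):
    """Compare two runs keyed by test case ID and report the differences."""
    shared = previous.keys() & current.keys()
    return {
        # Cases present in both runs whose results differ: (old, new).
        "changed": {
            case: (previous[case], current[case])
            for case in sorted(shared)
            if previous[case] != current[case]
        },
        # Cases that produced a result before but not in the current run.
        "missing": sorted(previous.keys() - current.keys()),
        # Cases that appear only in the current run.
        "added": sorted(current.keys() - previous.keys()),
    }


# Usage with two hypothetical runs over the same data set.
run_1 = {"TC-1": 100, "TC-2": "ok", "TC-3": [1, 2]}
run_2 = {"TC-1": 100, "TC-2": "error", "TC-4": None}
report = diff_runs(run_1, run_2)
```

A report with no changed, missing, or added entries suggests the data behaved the same way in both runs; anything else is a candidate test data error to investigate.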

Effective data refresh using a central repository

  • Maintaining a central repository that contains every type of data required for the various kinds of testing saves much of the effort of creating test data. In each test cycle, for a new or modified test case, check whether the data already exists in the repository. If it does not, load the data into the test environment first, and then add it to the repository for future reference. In later release cycles, the test team can use all or a subset of this data. Tracking which data sets are used frequently makes it easy to identify and remove obsolete data, so that the repository always holds correct data. Keeping multiple versions of the repository also helps regression testing identify which change in data causes the code to break.
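The check-then-store workflow and the idea of repository versions can be sketched as follows. This is an in-memory illustration only; the class `CentralDataRepository` and its methods are hypothetical, and a real repository would more likely be backed by a database or by files under version control.

```python
import copy


class CentralDataRepository:
    """In-memory stand-in for a central, versioned test data repository."""

    def __init__(self):
        self._data = {}
        self._snapshots = []  # one frozen copy per saved version

    def get_or_create(self, name, factory):
        """Return the named data set, building and storing it if absent."""
        if name not in self._data:
            self._data[name] = factory()
        # Hand out a copy so tests cannot corrupt the stored data.
        return copy.deepcopy(self._data[name])

    def remove(self, name):
        """Drop an obsolete data set from the current contents."""
        self._data.pop(name, None)

    def snapshot(self):
        """Freeze the current contents and return the version number."""
        self._snapshots.append(copy.deepcopy(self._data))
        return len(self._snapshots) - 1

    def version(self, number):
        """Return a copy of the contents saved under a version number."""
        return copy.deepcopy(self._snapshots[number])


# Usage: the factory runs only the first time the data set is requested.
factory_calls = []


def build_orders():
    factory_calls.append("orders")
    return [{"id": 1, "total": 10}]


repo = CentralDataRepository()
first = repo.get_or_create("orders", build_orders)
second = repo.get_or_create("orders", build_orders)
release_1 = repo.snapshot()
repo.remove("orders")
release_2 = repo.snapshot()
```

Comparing `repo.version(release_1)` with `repo.version(release_2)` shows exactly which data changed between the two versions, which is the starting point for tracing a regression back to a data change.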

Good product = good testing | good testing = good test data | therefore, good product = good test data

  • Learn before executing. Information is data with meaning attached. If testers execute tests using data without understanding the reasoning behind it, the data will most likely be ineffective.