Synthetic data – the new ‘gold’ standard

By Anon Anon posted 02-19-2016 05:11 AM


In QA and development, it has long been accepted that copies of production data are required because production data is the ‘gold copy’. Yes, that creates a need to mask personal data, but it’s the quickest way to source referentially intact data, right?


Well, wrong. With the stricter EU General Data Protection Regulation pending, now is the time to examine current practices and assumptions and introduce another approach – one that frees up time and resources and improves project outcomes.


A ‘gold copy’ database should contain the data needed to satisfy every possible test. In reality, the vast majority of production data is very similar, covering “business as usual” transactions, when it is ‘bad data’ – the unexpected results, outliers and boundary conditions – that normally causes a system to collapse.
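To make the point concrete, here is a minimal, hypothetical sketch (the field and limit are invented for illustration) of the kind of ‘bad data’ values for a single text field that rarely turn up in a production copy but routinely break systems under test:

```python
def boundary_cases(max_len=255):
    """Return edge-case values for a hypothetical text field capped at max_len chars.

    These are the values 'business as usual' production rows rarely contain.
    """
    return [
        "",                   # empty value
        "   ",                # whitespace only
        "a" * max_len,        # exactly at the limit
        "a" * (max_len + 1),  # one past the limit
        "O'Brien",            # embedded quote (escaping bugs)
        "名前",                # non-ASCII characters
        "line1\nline2",       # embedded newline
    ]

if __name__ == "__main__":
    for case in boundary_cases():
        print(repr(case)[:40])
```

A test plan built only from copied production rows would exercise almost none of these.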


Add to that the fact that modern organizations now handle data in petabytes, not terabytes. Running copies of these databases (and it’s not uncommon to find some organizations with as many as 20 copies of a single database [1]) is expensive and slow.


Such copies are unwieldy and lack the rich data sets needed to fully test functionality, and there is the additional pain point of teams creating and recreating test cases manually.


And because such data is made with specific requirements or test cases in mind, it will almost immediately become outdated.


These issues alone are driving more organizations to introduce synthetic test data. The advantages it offers are clear. With the right tools, it is possible to build an accurate picture of what data exists, identify what additional data is needed for test environments and generate any missing data automatically.
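As a rough illustration of automatic generation, the sketch below builds synthetic rows on demand, seeding the generator so requests are reproducible and deliberately injecting a fixed fraction of boundary-condition rows. The schema, field names, and edge values are assumptions for the example, not part of any particular tool:

```python
import random
import string

def synth_customer(rng, *, edge=False):
    """Generate one synthetic customer row; edge=True injects boundary values."""
    name_len = rng.choice([0, 1, 255]) if edge else rng.randint(3, 12)
    return {
        "id": rng.randint(1, 10**9),
        "name": "".join(rng.choices(string.ascii_letters, k=name_len)),
        # Edge rows get a negative sub-cent balance, zero, or an extreme value.
        "balance": rng.choice([-0.01, 0.0, 1e12]) if edge
                   else round(rng.uniform(0, 5000), 2),
    }

def generate(n, seed=0, edge_ratio=0.2):
    """Generate n rows reproducibly, with the first edge_ratio fraction as edge cases."""
    rng = random.Random(seed)  # fixed seed: the same request yields the same data
    return [synth_customer(rng, edge=(i < n * edge_ratio)) for i in range(n)]
```

Because the data is generated rather than copied, no masking is needed and a fresh, referentially consistent set can be produced for each request.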


Testers and developers can request the data they need for the task at hand, removing the need to search for or create data by hand. We have seen this reduce the time taken to fulfil data requests by up to 50%.


Synthetic test data is also fully secure, and it should be thought of as the new gold standard. Because only the data needed is extracted, copied and provided, an organization no longer needs to maintain numerous full-sized copies of production databases, and it can therefore meet any business or regulatory challenges ahead.


[1] Bloor Test Data Management report 2011