
Data Refresh / Data Cloning in a Db2 for z/OS environment

    Posted Jul 01, 2021 03:27 AM
    Edited by Philippe Dubost Jul 01, 2021 09:07 AM

    This article will cover some typical scenarios for copying data from one set of tables to another schema/set of tables. This article will NOT cover the use of Db2 CLONE tables or data subsetting (where only a subset of the table data is copied).

    Most Db2 sites need to copy data, for example from production to non-production environments for purposes such as quality testing, reporting that does not require completely "accurate data", query applications, and so on.

    Db2 provides a "stand-alone" utility named DSN1COPY that is widely used despite its limitations: it struggles with segmented tablespaces that span more than one 2GB pageset, with PBG tablespaces where the source and target do not have the same number of partitions, and so on. On the other hand, it is quite fast, and the source can be either a live Db2 pageset or an image copy. The downside is that the physical tablespace attributes as well as the table definitions have to be identical, and when the internal identifiers differ, you have to pair them manually (the so-called OBIDXLAT).
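
    As an illustration of that manual effort, here is a minimal sketch of a DSN1COPY job using OBIDXLAT. All dataset names, the load library, and the internal identifiers are fictitious placeholders; the real DBID/PSID/OBID pairs have to be looked up in the Db2 catalog on both the source and the target side, and the target objects should be stopped while the copy runs.

    //* Minimal DSN1COPY sketch (names and identifiers are fictitious)
    //DSN1COPY EXEC PGM=DSN1COPY,PARM='OBIDXLAT,RESET'
    //STEPLIB  DD DISP=SHR,DSN=DSNA10.SDSNLOAD
    //SYSPRINT DD SYSOUT=*
    //* SYSUT1 = source pageset (could also be an image copy dataset)
    //SYSUT1   DD DISP=SHR,DSN=PRODCAT.DSNDBC.PRODDB.TSORDERS.I0001.A001
    //* SYSUT2 = pre-allocated target pageset (object stopped)
    //SYSUT2   DD DISP=OLD,DSN=TESTCAT.DSNDBC.TESTDB.TSORDERS.I0001.A001
    //* SYSXLAT: record 1 = source,target DBID; record 2 = source,target PSID;
    //*          remaining records = source,target OBID for each table
    //SYSXLAT  DD *
    260,510
    2,2
    5,12
    /*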

    Another very common method is UNLOAD/LOAD of the data. This method is very flexible and can cope with the limitations mentioned for DSN1COPY. However, it is very resource intensive in terms of execution time, CPU time, and sort processing. The unload could be from a live tablespace or from image copies.
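
    As a rough sketch of this approach, the utility control statements below unload from the source and load into the target using the internal row format. All schema, object, and tablespace names are invented for illustration, and a real job would add templates, unload/punch datasets, inline copies, and sort work datasets.

    -- On the source (production) subsystem
    UNLOAD TABLESPACE PRODDB.TSORDERS
           SHRLEVEL CHANGE ISOLATION UR
           FORMAT INTERNAL
           FROM TABLE PRODSCH.ORDERS

    -- On the target (test) subsystem, reading the transferred unload dataset
    LOAD DATA FORMAT INTERNAL
         REPLACE LOG NO
         INTO TABLE TESTSCH.ORDERS

    Because LOG NO is used, the target tablespace is left in COPY-pending status, so an inline or subsequent image copy is still needed afterwards.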

    A third method is to use a RECOVER tool that can read the source (production) image copy and restore the pages into the target tablespace, and handle the indexes as well (or potentially rebuild them). This process could also include applying log records in order to reach a consistent point.

    One common challenge with all of these methods is data consistency. You will have to ask the consumers of the "cloned data" what their requirements are in order to make the best choice. Maybe the data does not have to be consistent across the tables, and maybe there is no RI (Referential Integrity), which simplifies these tasks. If image copies are used as the source, they are probably created using SHRLEVEL CHANGE, so the data is "fuzzy". This might not be an issue as long as the indexes are rebuilt. If RI exists between tables, you might end up with orphan rows. Nowadays the various image copy suppliers do offer options to create consistent copies, but this is not widely adopted.
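
    As one example of such an option in the IBM COPY utility (the object name is illustrative), a FlashCopy image copy can be made transactionally consistent even though the copy runs with SHRLEVEL CHANGE, provided data set FlashCopy is available in the environment:

    COPY TABLESPACE PRODDB.TSORDERS
         FLASHCOPY CONSISTENT
         SHRLEVEL CHANGE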

    There are solutions available that assist customers in addressing these data refresh challenges while eliminating the disadvantages mentioned above. The Broadcom Administration Suite for Db2 for z/OS delivers a solution named RC/Merger (an integral part of the RC/Migrator component), so no additional license is required. This solution was initially developed when Db2 V4 became available, to assist customers in consolidating stand-alone Db2 subsystems into data sharing groups, and it has been enhanced further over the years.

     

    RC/Merger is basically a sophisticated DSN1COPY utility that eliminates the challenges you face with DSN1COPY by providing:

    • Automatic OBID translation (OBIDXLAT) when needed
    • Automatic handling of segmented tablespaces with many 2GB pagesets
    • Dynamic growing/shrinking of PBG partitions when needed
    • Support for implicitly defined pagesets
    • REPAIR VERSIONS where needed (see the sketch after this list)
    • Copying of statistics (STATS)
    • Optionally "rebuilding" the target objects with the same internal identifiers as the source (when possible), eliminating the overhead of OBID translation
    • Optionally using SNAPSHOTs to create consistent data, eliminating the need to rebuild indexes
    • Parallel execution to shorten elapsed time
    • Operation at the page level rather than the row level
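
    For reference, the version adjustment mentioned in the list corresponds to the IBM REPAIR utility's VERSIONS option, which otherwise has to be run manually against the target after a DSN1COPY when source and target carry different version information (the object name is illustrative):

    REPAIR VERSIONS TABLESPACE TESTDB.TSORDERS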

     

    A recent case study, performed by a customer who had been using the UNLOAD/LOAD method for years, illustrates the effectiveness of RC/Merger. One of their larger objects had the following characteristics:

    • PBR (Partition By Range) tablespace with 100 partitions
    • 3 NPIs (non-partitioned indexes)
    • ~4 billion rows

    The traditional UNLOAD/LOAD process executed in approximately 2½ hours. Using RC/Merger resulted in 97% CPU savings and reduced the execution time for the tablespace and the NPIs to a few minutes.

    Another use case involved a segmented tablespace with 550 million rows spread across 26 2GB pagesets, an average row length of 98 bytes, and large index pieces; the whole copy consumed a total CPU time of only 67 seconds!

    Whether you have access to a solution like RC/Merger or a solution from another vendor, you have choices and decisions to make in order to implement a data refresh process. Think about the consistency requirements of the consuming applications: if the data does not need to be copied from a common point of consistency, that saves time and resources, and often "fuzzy data" is good enough. Another important aspect is schema differences. In today's DevOps and agile environments, where changes happen more frequently, the data refresh challenges will not decrease, so make sure you have a plan in place to mitigate as many of them as possible.

     

    For more information, contact Db2z Experts at db2-experts@broadcom.com

     

    Article originally written by Steen Rasmussen