Test Data Manager

Tech Tip: Publishing Performance Example

  • 1.  Tech Tip: Publishing Performance Example

    Posted 12-04-2017 12:08 PM

    This will also be continually updated in our TDM 4.4+ documentation. 

     

    The development team performed testing to show performance benchmarks for publishing to CSV and XLSX files. You can use these benchmarks to help tune the performance of your system.

     

    • Environment Details
      • Machine specs
      • Table used for the performance publish
    • Test 1 - Hard-coded Data
      • Publish to CSV
      • Publish to XLSX
    • Test 2 - One Expression
      • Publish to CSV
      • Publish to XLSX
      • SQL Server Target publish (default config)
      • SQL Server Target publish (iterationsBeforeCommit=20000)
      • SQL Server Target publish (iterationsBeforeCommit=50000)
    • Test 3 - With Expressions
      • Table DDL Used
      • Expressions Used
      • Publish to CSV

     

    Environment Details

    The following environment was set up to gather performance data for publishing to CSV and XLSX files.

    Machine specs

    The machine has 16 GB physical memory, 4 vCPU, and runs Microsoft Windows Server 2012 R2 DataCenter. The repository was on a local SQL Server 2016.

    Table used for the performance publish

    CREATE TABLE [dbo].[CARD_ACCOUNT](
        [CARD_ID] [int] NOT NULL,
        [CARD_BA_ID] [int] NOT NULL,
        [CARD_RA_ID] [int] NOT NULL,
        [CARD_BCH_ID] [int] NULL,
        [CARD_ICH_ID] [int] NULL,
        [CARD_NO] [varchar](16) NOT NULL,
        [CARD_EXP_DATE] [varchar](4) NOT NULL,
        [CARD_VALID_DATE] [varchar](4) NOT NULL,
        [CARD_NAME] [varchar](30) NOT NULL,
        [CARD_CVV] [decimal](4, 0) NOT NULL,
        [CARD_PRI] [varchar](3) NULL,
        [CARD_SUP] [varchar](3) NULL,
        [CARD_ADD] [varchar](3) NULL
    )

    Test 1 - Hard-coded Data

    No expressions were used. All data were hard-coded in the generator.

    Publish to CSV

    repeater
    performance based on number of rows
    10,0000.5 seconds
    100,0001.2 seconds
    1,000,0007.6 seconds
    10,000,0001 min 08 seconds
    100,000,00011 min 48 seconds

    Publish to XLSX

    Publish to XLSX is memory intensive compared to publish to CSV. Using the default CA TDM Portal configuration, the publish hit a wall around 300,000 counts when the CPU usage went high and stayed high. In fact, performance started degrading around 260,000 counts.

    This behavior is caused by the GC (Garbage collector) going overdrive when trying to clean up some memory to make sure that the Portal application does not crash with an out-of-memory exception.

    The following graph outlines the impact of garbage collection on the Portal:

    Before starting a high-volume publish to XLSX, make sure you increase the memory that is allocated to the Portal.

    Edit the config file called wrapper.conf located under CA\CA Test Data Manager Portal\service\conf. You can set either maxmemory or maxmemory.percent. With maxmemory.percent, the maximum allocated memory is calculated from the number that was set, times the physical memory. For more information about what needs to be done to increase the memory size used by the java process, see https://wrapper.tanukisoftware.com/doc/english/prop-java-maxmemory.html.

    repeater
    performance based on number of rows
    100,00032 seconds
    200,00059 seconds
    300,0001 min 30 seconds
    400,0001 min 49 seconds
    500,0002 min 40 seconds
    600,0003 min 36 seconds

    Test 2 - One Expression

    We publish using one expression ~NEXT~ in the generator.

    Publish to CSV

    repeater
    performance based on number of rows
    1,000,00014 seconds
    10,000,0001 min 29 seconds
    100,000,00012 min 38 seconds

    Publish to XLSX

    repeater
    performance based on number of rows
    100,00024 seconds
    200,00046 seconds
    300,0001 min 18 seconds
    400,0001 min 47 seconds
    500,0002 min 17 seconds
    600,0002 min 41 seconds
    700,0003 min 16 seconds

    SQL Server Target publish (default config)

    repeater
    performance based on number of rows
    100,0004 min 18 seconds
    200,0008 min 16 seconds
    400,00017 min 36 seconds

    (portal restarted)

    800,000

    23 min 12 seconds

    SQL Server Target publish (iterationsBeforeCommit=20000)

    tdmweb.publish.batchCommit=true

    tdmweb.publish.iterationsBeforeCommit=20000

    repeater
    performance based on number of rows
    repeater
    performance based on number of rows
    800,00035 seconds
    10,000,0006 min 33 seconds
    100,000,0001 hour 6 min 46 seconds

    SQL Server Target publish (iterationsBeforeCommit=50000)

    tdmweb.publish.batchCommit=true

    tdmweb.publish.iterationsBeforeCommit=50000

    repeater
    performance based on number of rows
    1,000,00042 seconds
    10,000,0006 min 31 seconds

     

    Test 3 - With Expressions

    We publish using several expressions.

    Table DDL Used

    CREATE TABLE equifax_records (
    "update_period" numeric (38, 0) ,
    "peer" varchar (20) ,
    "state" varchar (20) ,
    "county" varchar (20) ,
    "product" varchar (20) ,
    "vintage" varchar (20) ,
    "originalrisk" varchar (20) ,
    "currentrisk" varchar (20) ,
    "term" varchar (20) ,
    "smallbusinessownerflag" varchar (20) ,
    "mortgageindicator" varchar (20) ,
    "consumer_age" varchar(20) ,
    "edti" varchar (20) ,
    "pim" varchar (20) ,
    "n_cur" numeric (38, 0) ,
    "n_030" numeric (38, 0) ,
    "n_060" numeric (38, 0) ,
    "n_090" numeric (38, 0) ,
    "n_120" numeric (38, 0) ,
    "n_svr" numeric (38, 0) ,
    "n_bkr" numeric (38, 0) ,
    "n_misc" numeric (38, 0) ,
    "n_closed_pos" numeric (38, 0) ,
    "bal_cur" numeric (38, 0) ,
    "bal_030" numeric (38, 0) ,
    "bal_060" numeric (38, 0) ,
    "bal_090" numeric (38, 0) ,
    "bal_120" numeric (38, 0) ,
    "bal_svr" numeric (38, 0) ,
    "bal_misc" numeric (38, 0) ,
    "bal_closed_pos" numeric (38, 0) ,
    "pmt" numeric (38, 0) ,
    "hc" numeric (38, 0) ,
    "bal_bk" numeric (38, 0) ,
    "n_fcs" numeric (38, 0) ,
    "bal_fcs" numeric (38, 0) ,
    "n_pos_bal" numeric (38, 0) );

    Expressions Used

    Table Name Column NameDefinitionData Type

    EQUIFAX_RECORDS

    n_pos_bal

    @randlov(0,@perclist(90%@randrange(1,2)@,5%0,5%@randrange(3,10)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    vintage

    Q@randlov(0,@list(1,2,3,4)@)@@string(@randdate(2005/01/01,~YEAR~/01/01)@,YYYY)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    peer

    @randlov(0,@list(GM,CM,CP,NC,OT)@)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    bal_closed_pos

    @if(^bal_cur^=0,@randrange(0,60000)@,0)@

    NUMBER(38)

    EQUIFAX_RECORDS

    n_bkr

    @randlov(0,@perclist(1%1,99%0)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    state

    @seqlov(0,@seedlist(State_County,S)@,2)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    bal_120

    @randlov(0,@perclist(99%0,1%@randrange(0,30000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    n_120

    @randlov(0,@perclist(1%1,99%0)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    edti

    @randrange(0,9)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    n_svr

    @randlov(0,@perclist(1%1,99%0)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    bal_060

    @randlov(0,@perclist(99%0,1%@randrange(0,60000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    hc

    @if(^bal_cur^=0,0,@addrand(^bal_cur^,0,20000)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    update_period

    ~YEAR~@randlov(0,@list(01,02,03,04,05,06,07,08,09,10,11,12)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    originalrisk

    @randrange(0,14)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    smallbusinessownerflag

    @randlov(0,@perclist(15%1,85%~EMPTY~)@)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    n_misc

    @randlov(0,@perclist(1%1,98%0,1%-1)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    bal_030

    @randlov(0,@perclist(97%0,3%@randrange(0,90000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    bal_cur

    @randlov(0,@perclist(20%@randrange(100000,900000)@,40%@randrange(0,9000)@,35%@randrange(10000,90000)@,5%@randrange(1000000,2000000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    bal_fcs

    0

    NUMBER(38)

    EQUIFAX_RECORDS

    bal_misc

    @randlov(0,@perclist(99%0,1%@randrange(0,20000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    n_060

    @randlov(0,@perclist(25%1,70%0,5%2)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    n_090

    @randlov(0,@perclist(1%1,99%0)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    n_030

    @randlov(0,@perclist(45%1,35%0,15%2,1%3,1%4,1%5,1%10,1%25)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    bal_bk

    0

    NUMBER(38)

    EQUIFAX_RECORDS

    pmt

    @randlov(0,@perclist(80%@randrange(0,1000)@,20%@randrange(1000,10000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    county

    @upper(@seqlov(0,@seedlist(State_County,S)@,5)@)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    pim

    @randrange(0,9)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    n_fcs

    0

    NUMBER(38)

    EQUIFAX_RECORDS

    mortgageindicator

    @randlov(0,@perclist(45%1,55%0)@)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    n_cur

    @randrange(1,6)@

    NUMBER(38)

    EQUIFAX_RECORDS

    term

    @randrange(0,11)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    n_closed_pos

    @randlov(0,@perclist(10%1,89%0,1%2)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    bal_090

    @randlov(0,@perclist(99%0,1%@randrange(0,60000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    currentrisk

    @randrange(0,14)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    product

    @randlov(0,@list(AB1,AF2,AF1,AB2)@)@

    VARCHAR(20)

    EQUIFAX_RECORDS

    bal_svr

    @randlov(0,@perclist(99%0,1%@randrange(0,60000)@)@)@

    NUMBER(38)

    EQUIFAX_RECORDS

    consumer_age

    @randrange(0,7)@

    VARCHAR(20)

    Publish to CSV

    repeater
    performance based on number of rows
    100,00014 min 51 seconds