Test Data Manager


Data Generator and duplicate data


1. Data Generator and duplicate data

JeanFrancois Berube
Posted Apr 24, 2020 02:45 PM

Hi,

I'm using the Generator in the TDM portal version 4.9. I want to randomly generate data without any duplicates. The option "On Generated Duplicate" is set to remove, and it does the job. My problem is that I ask for 20K rows, but after the duplicates are removed I end up with fewer than 20K. To be clear: if I ask for 20K and TDM removes 5K duplicates, I end up with 15K generated values instead of my target of 20K.

Is there a way to get 20K rows without duplicates?

Thanks for your help
2. RE: Data Generator and duplicate data

Broadcom Employee

Billy Keefer
Posted Apr 25, 2020 03:30 PM

What is your primary key and how are you generating data for this key?

3. RE: Data Generator and duplicate data

JeanFrancois Berube
Posted Apr 27, 2020 01:35 PM

Billy,

Thanks for your help.

Here's the Oracle table definition:

CREATE TABLE SCRAMBLE.TEST_DUPLICATE_REPORT(COL1 VARCHAR2(1000 CHAR));
CREATE UNIQUE INDEX SCRAMBLE.TEST_DUPLICATE_REPORT_IDX ON SCRAMBLE.TEST_DUPLICATE_REPORT(COL1);
ALTER TABLE SCRAMBLE.TEST_DUPLICATE_REPORT ADD (
CONSTRAINT TEST_DUPLICATE_REPORT_IDX
PRIMARY KEY (COL1)
USING INDEX SCRAMBLE.TEST_DUPLICATE_REPORT_IDX
ENABLE VALIDATE);

I generate 1 column with the following (quite easy) function:
@randlov(0,@list(Company A,Company B,Company C,Company D,Company E)@)@

Here's the PUBLISH screen snapshot. I hope it is readable; if not, I can repost it.

At the end, I have a CSV file with fewer than 5 rows.

Let me know if you need more detailed information.

4. RE: Data Generator and duplicate data

Broadcom Employee

Billy Keefer
Posted May 01, 2020 04:47 AM

Random doesn't guarantee uniqueness. A lot depends on the number of items in the list you are randomly drawing from, and 5 is a really small number. I randomly selected from a list of 300 and the values were unique.

Maybe we should go back to what you are trying to accomplish. From the above, you want to create 20,000 rows. Let's concentrate on the column where you are getting the duplicates. When you generate these 20,000 rows, where is the data for this particular column coming from?

Are you using a sqllist from data painter and extracting the data from another database table? If yes, how many unique items are in this table?

Are you pulling data from a seedlist?

regards

5. RE: Data Generator and duplicate data

JeanFrancois Berube
Posted May 01, 2020 08:32 AM

I understand that random doesn't guarantee uniqueness, and that's the reason I use the remove duplicates option.
Of course, the list of 5 is only a test to demonstrate what happens when I use the remove duplicates option.

For bigger numbers, I have two cases:

1) I have an Oracle table with 49K unique company names. I ask to generate a text report with 37K companies. I was using a randlov-SQLlist function, which produces some duplicates. So I activated the remove duplicates option and things went well (meaning no more duplicates), but the number of rows generated was lower than 37K. The option removes the duplicates as expected, but I cannot easily reach my target of 37K.

2) Right now, I have a request to generate a list of first names and last names. I will use 2 seedlists. The pair (first name, last name) has to be unique, but a first name may appear multiple times (and a last name too). I'm working on it right now.

Again, like I said, the remove duplicates option works well. My concern is that I cannot reach my target number of generated lines.

So my question is: is there a way to remove the duplicates AND reach the number of lines I need?
Maybe this is the way the remove duplicates option is designed, or maybe I misunderstood something.
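
The shortfall in case 1 is what sampling with replacement would predict. As a rough sketch, assuming each draw picks uniformly and independently from the source list (an assumption about how randlov behaves, not something confirmed in this thread), the expected number of distinct values after k draws from n items is n * (1 - (1 - 1/n)^k). The function name below is hypothetical and this is plain Python, not TDM syntax.

```python
def expected_distinct(n_items: int, n_draws: int) -> float:
    """Expected number of distinct values when drawing n_draws times,
    uniformly and with replacement, from n_items unique values."""
    return n_items * (1.0 - (1.0 - 1.0 / n_items) ** n_draws)


if __name__ == "__main__":
    # Case 1 from the post: 49K unique company names, 37K requested rows.
    print(round(expected_distinct(49_000, 37_000)))
    # The 5-value test list with 20K requested rows.
    print(round(expected_distinct(5, 20_000)))
```

Under that assumption, case 1 would be expected to keep roughly 26K distinct names out of 37K draws, and the 5-value test list can never yield more than 5 rows.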

6. RE: Data Generator and duplicate data

Broadcom Employee

Billy Keefer
Posted May 01, 2020 09:01 AM

I now understand what you are asking. Sorry about the misunderstanding.
You raise a very good question, and I think that is how it works today.

I will double check and come back to you.

7. RE: Data Generator and duplicate data
Best Answer

Broadcom Employee

Billy Keefer
Posted May 05, 2020 04:32 PM

Following up on my last update: this is how publish works. Any duplicates are counted as part of the overall requested row count, so removing them leaves fewer rows than requested.
regards
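
To make the difference concrete, here is a small Python sketch contrasting the behavior described above (duplicates count toward the requested total) with the top-up behavior the original poster was hoping for. It only illustrates the two policies; it is not TDM's implementation, and both function names are hypothetical.

```python
import random


def generate_then_dedupe(values, target, rng):
    """Draw `target` values with replacement, then drop duplicates.
    Duplicates count toward the target, so the result can be shorter."""
    drawn = [rng.choice(values) for _ in range(target)]
    return list(dict.fromkeys(drawn))  # keeps first occurrences, in order


def generate_until_quota(values, target, rng):
    """Keep drawing until `target` distinct values have been collected."""
    if target > len(set(values)):
        raise ValueError("target exceeds the number of distinct source values")
    seen = {}
    while len(seen) < target:
        seen.setdefault(rng.choice(values), None)
    return list(seen)


if __name__ == "__main__":
    rng = random.Random(0)
    companies = [f"Company {i}" for i in range(1000)]
    print(len(generate_then_dedupe(companies, 800, rng)))  # typically well under 800
    print(len(generate_until_quota(companies, 800, rng)))  # always 800
```

The second policy can only succeed when the source holds at least as many distinct values as the target, which is why the sketch checks for that up front.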

8. RE: Data Generator and duplicate data

JeanFrancois Berube
Posted May 06, 2020 01:41 PM

Thank you Billy for your help.

I'll see with the team how we can work with that.

JF

9. RE: Data Generator and duplicate data

Rajkumar Mansuria
Posted Jun 25, 2020 07:51 AM

Hi Jean,

If you use RANDLOV and generate even 50% of the records, you will end up with duplicates. Use SEQLOV instead; if you want random order, there is a parameter that will jumble the records once before the run, so every run gives you a different set.

If you think your source has duplicates, and you also get data for multiple columns from the same source, then I suggest writing a distinct query on the key column, using seqlov on those columns, and then using these values as a reference to generate data for the other columns.

------------------------------
Thanks,
Rajkumar
------------------------------
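
The suggestion above amounts to sampling without replacement: deduplicate the source, shuffle it once, then read it sequentially. A minimal Python sketch of that idea follows. It is not TDM syntax, and the function name and `seed` parameter are hypothetical.

```python
import random


def shuffled_sequential(values, target, seed=None):
    """Deduplicate the source, shuffle it once, then take the first `target` items."""
    # Analogous to running a DISTINCT query on the key column.
    distinct = list(dict.fromkeys(values))
    if target > len(distinct):
        raise ValueError("not enough distinct source values")
    # One shuffle up front, then a plain sequential read.
    random.Random(seed).shuffle(distinct)
    return distinct[:target]


if __name__ == "__main__":
    names = ["Company A", "Company B", "Company A", "Company C", "Company D"]
    print(shuffled_sequential(names, 3))
```

Because every source value is used at most once, the output always contains exactly `target` rows and no duplicates, provided the source has enough distinct values.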

10. RE: Data Generator and duplicate data

JeanFrancois Berube
Posted Jul 10, 2020 08:33 AM

Hi,

Right now I'm using SEQLOV-SQLLIST with the shuffle option (of course, the source table does not have duplicates). That way I get a result similar to randlov, but when I want to avoid duplicates across multiple columns (let's say I want a unique combination of first_name and last_name), it is more complex to implement.

My life would be easier if tdm removed the duplicates before the quota calculation :)

BTW, Sorry for the delay... summer time :)
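
For the multi-column case mentioned above (a unique combination of first_name and last_name, where each name on its own may repeat), one possible approach outside TDM is to sample distinct positions in the Cartesian product of the two lists and decode each position into a pair. This is a sketch only: the two seedlists are assumed to be available as plain Python lists, and the function name is hypothetical.

```python
import random


def unique_name_pairs(first_names, last_names, target, seed=None):
    """Return `target` distinct (first, last) pairs; individual names may repeat."""
    firsts = list(dict.fromkeys(first_names))
    lasts = list(dict.fromkeys(last_names))
    total = len(firsts) * len(lasts)
    if target > total:
        raise ValueError("target exceeds the number of possible pairs")
    # Sample distinct indices into the (virtual) Cartesian product,
    # then decode each index into a (first, last) pair.
    picks = random.Random(seed).sample(range(total), target)
    return [(firsts[i // len(lasts)], lasts[i % len(lasts)]) for i in picks]


if __name__ == "__main__":
    pairs = unique_name_pairs(["Ann", "Bob", "Cy"], ["Roy", "Lee"], 4)
    print(pairs)
```

Sampling indices rather than pairs avoids building the full product in memory, and it hits the requested count exactly as long as the target does not exceed the number of possible combinations.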
