I understand that random doesn't guarantee uniqueness, and that's the reason why I use the remove duplicates option.
Of course, the list of 5 is only a test table to demonstrate what happens when I use the remove duplicates option.
For bigger numbers, I have two cases:
1) I have an Oracle table with 49K unique company names. I asked for a text report with 37K companies. I was using a randlov-SQLlist function. This produced some duplicates, so I activated the remove duplicates option. Things went well (meaning no more duplicates), but the number of rows generated was lower than 37K. So the option removes the duplicates as expected, but I cannot easily reach my target of 37K.
2) Right now, I have a request to generate a list of first names and last names. I will use 2 seedlists. The pair (first name, last name) has to be unique, but a first name may appear multiple times (and a last name too). I'm working on it right now.
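To make case 2 concrete, here is a minimal Python sketch of the result I am after, written outside TDM. It is not TDM functionality; the function name `unique_name_pairs` and the sample names are hypothetical, and it assumes each seedlist contains no repeated entries. It picks distinct positions in the full grid of (first name, last name) combinations, so every pair is unique while individual names can repeat.

```python
import random


def unique_name_pairs(first_names, last_names, target, seed=None):
    """Return `target` distinct (first, last) pairs chosen at random.

    Assumes neither input list contains repeated entries.
    """
    total = len(first_names) * len(last_names)
    if target > total:
        raise ValueError(f"only {total} distinct pairs exist, {target} requested")
    rng = random.Random(seed)
    # Sample distinct positions in the first x last grid (no replacement).
    picks = rng.sample(range(total), target)
    width = len(last_names)
    # Decode each position back into a (first, last) pair.
    return [(first_names[i // width], last_names[i % width]) for i in picks]


# Hypothetical seedlist contents, for illustration only.
firsts = ["Anne", "Bruno", "Chloe"]
lasts = ["Tremblay", "Roy"]
pairs = unique_name_pairs(firsts, lasts, 5, seed=1)
print(len(pairs), len(set(pairs)))
```

The request fails early when the target is larger than the number of possible pairs, which is the one situation where the target can never be reached.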
Again, like I said, the remove duplicates option works well. My concern is that I cannot reach my target number of generated lines.
So my question is: is there a way to remove the duplicates AND reach the number of lines I need?
Maybe this is the way the remove duplicates option is designed, or maybe I misunderstood something.
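For case 1, the behaviour I am looking for is sampling without replacement. The Python sketch below only illustrates that idea outside TDM; I do not know whether the Generator exposes an equivalent. The name `pick_unique` and the generated company names are hypothetical stand-ins for the Oracle table.

```python
import random


def pick_unique(values, target, seed=None):
    """Return `target` distinct items drawn at random from `values`."""
    # Drop any repeats in the source first, keeping the original order.
    pool = list(dict.fromkeys(values))
    if target > len(pool):
        raise ValueError(f"only {len(pool)} distinct values, {target} requested")
    # random.sample draws without replacement, so no duplicates can occur.
    return random.Random(seed).sample(pool, target)


# Stand-in for the 49K unique company names in the Oracle table.
companies = [f"Company {i}" for i in range(49_000)]
report = pick_unique(companies, 37_000, seed=1)
print(len(report), len(set(report)))  # 37000 37000
```

Because nothing is drawn twice, there is nothing to remove afterwards, and the row count always matches the target as long as the source holds enough distinct values.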
Original Message:
Sent: 05-01-2020 04:47 AM
From: Billy Keefer
Subject: Data Generator and duplicate data
Random doesn't guarantee uniqueness. A lot will depend on the number of objects in the list that you are randomly extracting from. 5 is a really small number. I randomly selected from a list of 300 and they were unique.
Maybe we should go back to what you are trying to accomplish. From above, you want to create 20,000 rows. Let's concentrate on the value (column) that you are getting the duplicates on. When you are generating these 20,000 rows, where is the data for this particular column coming from?
Are you using a sqllist from data painter and extracting the data from another database table? If yes, how many unique items are in this table?
Are you pulling data from a seedlist?
regards
Original Message:
Sent: 04-27-2020 01:35 PM
From: Jean-Francois Berube
Subject: Data Generator and duplicate data
Billy,
Thanks for your help.
Here's the Oracle table definition:
CREATE TABLE SCRAMBLE.TEST_DUPLICATE_REPORT(COL1 VARCHAR2(1000 CHAR));
CREATE UNIQUE INDEX SCRAMBLE.TEST_DUPLICATE_REPORT_IDX ON SCRAMBLE.TEST_DUPLICATE_REPORT(COL1);
ALTER TABLE SCRAMBLE.TEST_DUPLICATE_REPORT ADD (
CONSTRAINT TEST_DUPLICATE_REPORT_IDX
PRIMARY KEY (COL1)
USING INDEX SCRAMBLE.TEST_DUPLICATE_REPORT_IDX
ENABLE VALIDATE);
I generate 1 column with the following (quite easy) function:
@randlov(0,@list(Company A,Company B,Company C,Company D,Company E)@)@
Here's the PUBLISH screen snapshot. I hope it is readable; if not, I can repost it.
At the end, I have a CSV file with fewer than 5 rows.
Let me know if you need more detailed information.
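As a rough check on why the file comes out short, the Python sketch below estimates how many distinct values survive duplicate removal. It assumes each generated row is an independent, uniform pick from the list, which is my assumption about how the random pick behaves and not something I have confirmed in the TDM documentation. The function name `expected_distinct` is hypothetical.

```python
def expected_distinct(pool_size: int, draws: int) -> float:
    """Expected number of distinct values after `draws` uniform picks
    with replacement from `pool_size` equally likely values."""
    # A given value is missed by every draw with probability (1 - 1/n)^k.
    return pool_size * (1 - (1 - 1 / pool_size) ** draws)


# 5 rows requested from the 5-company test list.
print(round(expected_distinct(5, 5), 2))  # 3.36

# 37K rows requested from 49K company names.
print(round(expected_distinct(49_000, 37_000)))
```

Under that assumption, asking for as many rows as there are values almost never returns all of them, so a shortfall after duplicate removal is the expected outcome and not a malfunction.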
Original Message:
Sent: 04-25-2020 03:29 PM
From: Billy Keefer
Subject: Data Generator and duplicate data
What is your primary key and how are you generating data for this key?
Original Message:
Sent: 04-24-2020 02:45 PM
From: Jean-Francois Berube
Subject: Data Generator and duplicate data
Hi,
I'm using the Generator in the TDM portal, version 4.9. I want to randomly generate data without any duplicates. The option "On Generated Duplicate" is set to remove and does the job. My problem is that I'm asking to generate 20K rows, but after removing the duplicates I end up with fewer than 20K. To be clear, let's say I ask for 20K and TDM removes 5K duplicates; at the end, I have 15K generated values instead of my target of 20K.
Is there a way to get 20K rows without duplicates?
Thanks for your help
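To illustrate the behaviour the question is asking for, here is a small Python sketch of a "top-up" loop: keep drawing and discarding duplicates until the target row count is reached, with a cap on the number of draws so it cannot run forever. This is only a sketch of the idea outside TDM, not a description of what the Generator does; `generate_until_unique`, the 30K pool size, and the cap are all hypothetical.

```python
import random


def generate_until_unique(draw, target, max_draws):
    """Call `draw()` until `target` distinct values are collected
    or `max_draws` attempts have been made, whichever comes first."""
    seen = set()
    rows = []
    draws = 0
    while len(rows) < target and draws < max_draws:
        value = draw()
        draws += 1
        if value not in seen:  # duplicate draws are skipped, not counted
            seen.add(value)
            rows.append(value)
    return rows


# Hypothetical pool of 30K values, target of 20K unique rows.
rng = random.Random(0)
pool = [f"Company {i}" for i in range(30_000)]
rows = generate_until_unique(lambda: rng.choice(pool), 20_000, max_draws=200_000)
print(len(rows), len(set(rows)))
```

The cap matters when the target is close to, or larger than, the number of distinct values available: the loop then returns what it has instead of spinning indefinitely.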