Page 1 of 1

How can we load only duplicate data to the target table

Posted: Tue Aug 26, 2008 7:30 am
by DWH-M
How can we load only duplicate data using data stage parallel job,

how can we identify by seeing the table, it is a slowly changing table

Posted: Tue Aug 26, 2008 8:00 am
by chulett
Define 'only duplicate data'.

Posted: Tue Aug 26, 2008 4:18 pm
by bcarlson
One option is to use the Create Key Change Column option in the Sort stage. Per the Help:

"This column is set to 1 for the first row in each group where the value of the sort key changes. Subsequent records in the group have the column set to 0."

Now, this gives you a way to identify the KEYS that are duplicated, but not all instances of the duplicates. We have a job that sorts the data with this option in place followd by a filter that splits the out based on the change field. That ends up in 2 datasets. Then we inner join the data back together again to capture all instances of the duplicated data. A left outer join will result in a list of non-duplicated data.

Brad.

Posted: Tue Aug 26, 2008 5:20 pm
by Raamc

Posted: Tue Aug 26, 2008 5:21 pm
by Raamc