Page 1 of 1
How can we load only duplicate data to the target table
Posted: Tue Aug 26, 2008 7:30 am
by DWH-M
How can we load only duplicate data using data stage parallel job,
how can we identify by seeing the table, it is a slowly changing table
Posted: Tue Aug 26, 2008 8:00 am
by chulett
Define 'only duplicate data'.
Posted: Tue Aug 26, 2008 4:18 pm
by bcarlson
One option is to use the Create Key Change Column option in the Sort stage. Per the Help:
"This column is set to 1 for the first row in each group where the value of the sort key changes. Subsequent records in the group have the column set to 0."
Now, this gives you a way to identify the KEYS that are duplicated, but not all instances of the duplicates. We have a job that sorts the data with this option in place followd by a filter that splits the out based on the change field. That ends up in 2 datasets. Then we inner join the data back together again to capture all instances of the duplicated data. A left outer join will result in a list of non-duplicated data.
Brad.
Posted: Tue Aug 26, 2008 5:20 pm
by Raamc
Posted: Tue Aug 26, 2008 5:21 pm
by Raamc