How can we load only duplicate data to the target table

DWH-M · Post by **DWH-M** » Tue Aug 26, 2008 7:30 am

How can we load only duplicate data using data stage parallel job,

how can we identify by seeing the table, it is a slowly changing table

chulett · Post by **chulett** » Tue Aug 26, 2008 8:00 am

Define 'only duplicate data'.

bcarlson · Post by **bcarlson** » Tue Aug 26, 2008 4:18 pm

One option is to use the Create Key Change Column option in the Sort stage. Per the Help:

"This column is set to 1 for the first row in each group where the value of the sort key changes. Subsequent records in the group have the column set to 0."

Now, this gives you a way to identify the KEYS that are duplicated, but not all instances of the duplicates. We have a job that sorts the data with this option in place followd by a filter that splits the out based on the change field. That ends up in 2 datasets. Then we inner join the data back together again to capture all instances of the duplicated data. A left outer join will result in a list of non-duplicated data.

Brad.

Raamc · Post by **Raamc** » Tue Aug 26, 2008 5:20 pm

Refer Also

viewtopic.php?t=121387&highlight=

Raamc · Post by **Raamc** » Tue Aug 26, 2008 5:21 pm

Refer Also

viewtopic.php?t=121387&highlight=