DSXchange

Posted: **Tue Aug 26, 2008 7:30 am**

How can we load only duplicate data using data stage parallel job,

how can we identify by seeing the table, it is a slowly changing table

Posted: **Tue Aug 26, 2008 8:00 am**

Define 'only duplicate data'.

Posted: **Tue Aug 26, 2008 4:18 pm**

One option is to use the Create Key Change Column option in the Sort stage. Per the Help:

"This column is set to 1 for the first row in each group where the value of the sort key changes. Subsequent records in the group have the column set to 0."

Now, this gives you a way to identify the KEYS that are duplicated, but not all instances of the duplicates. We have a job that sorts the data with this option in place followd by a filter that splits the out based on the change field. That ends up in 2 datasets. Then we inner join the data back together again to capture all instances of the duplicated data. A left outer join will result in a list of non-duplicated data.

Brad.

Posted: **Tue Aug 26, 2008 5:20 pm**

Refer Also

viewtopic.php?t=121387&highlight=

Posted: **Tue Aug 26, 2008 5:21 pm**

Refer Also

viewtopic.php?t=121387&highlight=

DSXchange

How can we load only duplicate data to the target table

How can we load only duplicate data to the target table