How can we load only duplicate data to the target table

DWH-M
Premium Member
Posts: 46
Joined: Thu Sep 06, 2007 5:26 am

How can we load only duplicate data to the target table

Post by DWH-M »

How can we load only duplicate data using a DataStage parallel job?

How can we identify the duplicates by looking at the table? It is a slowly changing table.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Define 'only duplicate data'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

One option is to use the Create Key Change Column option in the Sort stage. Per the Help:

"This column is set to 1 for the first row in each group where the value of the sort key changes. Subsequent records in the group have the column set to 0."

Now, this gives you a way to identify the KEYS that are duplicated, but not all instances of the duplicates. We have a job that sorts the data with this option in place, followed by a Filter that splits the output based on the key change column. That ends up in 2 datasets. Then we inner join the data back together again to capture all instances of the duplicated data. A left outer join (keeping the rows that find no match) will result in a list of non-duplicated data.
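To make the logic concrete, here is a rough sketch of the same idea in plain Python rather than DataStage. It is not the actual job: the function names, the `keyChange` column name, and the `cust_id` sample data are all made up for illustration. It sorts the rows, flags the first row of each key group with 1 and the rest with 0 (as the Sort stage option does), collects the keys that show up on a 0 row, and then matches those keys back against the full data.

```python
from itertools import groupby
from operator import itemgetter


def add_key_change(rows, key):
    """Sort rows by key; flag the first row of each key group 1, the rest 0."""
    get_key = itemgetter(key)
    flagged = []
    for _, group in groupby(sorted(rows, key=get_key), key=get_key):
        for i, row in enumerate(group):
            flagged.append({**row, "keyChange": 1 if i == 0 else 0})
    return flagged


def split_duplicates(rows, key):
    """Return (all instances of duplicated keys, rows whose key occurs once)."""
    flagged = add_key_change(rows, key)
    # A key seen on a keyChange == 0 row occurs more than once.
    dup_keys = {r[key] for r in flagged if r["keyChange"] == 0}
    # Equivalent of the inner join back to the full data.
    duplicates = [r for r in flagged if r[key] in dup_keys]
    # Equivalent of the left outer join, keeping only unmatched rows.
    uniques = [r for r in flagged if r[key] not in dup_keys]
    return duplicates, uniques


# Hypothetical sample data.
rows = [
    {"cust_id": 1, "name": "A"},
    {"cust_id": 2, "name": "B"},
    {"cust_id": 1, "name": "A2"},
    {"cust_id": 3, "name": "C"},
    {"cust_id": 2, "name": "B2"},
]
duplicates, uniques = split_duplicates(rows, "cust_id")
```

With this sample, `duplicates` holds both rows for `cust_id` 1 and both rows for `cust_id` 2, and `uniques` holds the single row for `cust_id` 3. In the real job the sort, filter, and joins would each be their own stage.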

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
Raamc
Premium Member
Posts: 87
Joined: Mon Aug 20, 2007 9:08 am

Post by Raamc »

Thanks,
Raamc