Peeking the Duplicate record alone



pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

I have a dataset that contains 372 records. When I add a Remove Duplicates stage, only 371 records are passed through, which shows that there is exactly one duplicate.
I want to peek at that one duplicate record alone.
What is the simplest way?

a) I tried to load the data into a sequential file and planned to use uniq -d.
But I don't have permission to create a sequential file.

b) Even if I peek at all the records, it is very difficult to find the duplicate record.

c) Another way I can think of is to keep the 372 records (original dataset) and the 371 records in another dataset (created after removing duplicates), then use Change Capture to capture the difference.

Is there a simpler way to find that duplicate record in DataStage?

Thanks
pandeeswaran
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Try this: sort the 372 records on your key column and feed them into a Transformer. Use stage variables there to detect the duplicate record and send it to a separate output link.
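The stage-variable logic can be sketched outside DataStage. The Python below is only an illustration of the idea, not Transformer syntax: `find_duplicates`, `prev_key`, `is_duplicate` and the sample records are hypothetical names and data, assuming a single key column called `id`. As in a Transformer, where stage variables are evaluated top to bottom, the flag has to be computed before the "previous key" variable is overwritten.

```python
def find_duplicates(rows, key):
    """Return every row whose key equals the key of the row before it.

    The rows are sorted on the key first, mirroring the sort that must
    precede the Transformer.
    """
    sorted_rows = sorted(rows, key=key)
    prev_key = object()  # sentinel that never equals a real key value
    duplicates = []
    for row in sorted_rows:
        curr_key = key(row)
        # Plays the role of a "flag" stage variable: compare the current
        # key with the key remembered from the previous row.
        is_duplicate = curr_key == prev_key
        if is_duplicate:
            # Plays the role of the constraint on the duplicate output link.
            duplicates.append(row)
        # Updated last, so the comparison above sees the previous row's key.
        prev_key = curr_key
    return duplicates


# Hypothetical sample data: four records, one duplicate key (id = 1).
records = [
    {"id": 3, "name": "c"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
]
dups = find_duplicates(records, key=lambda r: r["id"])
print(dups)
```

In a parallel job the same comparison only works if records with equal keys land in the same partition, so the data would normally be hash-partitioned on the key before the sort.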
Thanx and Regards,
ETL User
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

I have never used this approach. Could you please explain how to find a duplicate using stage variables?
Thanks
pandeeswaran
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Peeking the Duplicate record alone

Post by SURA »

Use a Sort stage with the option Create Key Change Column = True, and take its output into the following Transformer (TFM).

DS User
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

OK, good. So in the Transformer I need to filter out the record whose key column has not changed. For this, I guess there is no need for a stage variable.
A constraint is enough, right?
I'll try this and let you know.
Thanks
pandeeswaran
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Yes, to use this method you do need stage variables, which hold the key values from the previous record and set a "flag" to use in your constraint.

Alternatively, you can generate a Key Change column in your Sort stage and simply check that column for a value of 0 in your Transformer; a 0 indicates a duplicate record (based on the sort key values).
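To illustrate the Key Change approach, here is a rough Python sketch of what the Sort stage and the Transformer constraint do together. It is not DataStage code: `add_key_change`, the column name `keyChange`, and the sample records are assumptions made for the example. The flag is 1 on the first record of each key group and 0 on every later record with the same key.

```python
def add_key_change(rows, key):
    """Sort rows on the key and append a key-change flag to each one.

    The flag is 1 when the key differs from the previous row's key
    (first row of a group) and 0 when it repeats.
    """
    flagged = []
    prev_key = object()  # sentinel that never equals a real key value
    for row in sorted(rows, key=key):
        curr_key = key(row)
        flagged.append({**row, "keyChange": 0 if curr_key == prev_key else 1})
        prev_key = curr_key
    return flagged


# Hypothetical sample data: four records, one duplicate key (id = 1).
sample = [
    {"id": 3, "name": "c"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
]
flagged = add_key_change(sample, key=lambda r: r["id"])

# Equivalent of a Transformer constraint that tests the flag for 0.
duplicates_only = [r for r in flagged if r["keyChange"] == 0]
print(duplicates_only)
```

With this approach the Transformer needs no stage variables, only a constraint on the generated column.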

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Thanks, all! It worked!
pandeeswaran