A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.
pandeesh
Premium Member
Posts: 1399 Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU
by pandeesh » Mon Oct 03, 2011 9:29 am
Hi,
I have the following data in my source sequential file:
I need to remove duplicates using only a Transformer stage.
My target sequential file should contain the following data:
Please help me to achieve this.
It doesn't matter whether the job is a server job or a parallel job.
But I need to use only a Transformer for removing duplicates.
Thanks
pandeeswaran
chulett
Charter Member
Posts: 43085 Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO
by chulett » Mon Oct 03, 2011 11:07 am
In a server job, use a hashed file that stores only the key, record number, or whatever else uniquely identifies a record. Look up each incoming record against the hashed file. If the key does not exist, write it to the hashed file and pass the record out the output link. If you get a hit on the hashed file, do nothing: do not pass the record through and do not update the hashed file.
Note that either the hashed file must not be cached or the cache must be "locked for updates". I prefer the former approach. Also note that the write to the hashed file must be done in the same Transformer that does the lookup, so that the locks (if you take that approach) are handled correctly.
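For readers who want to see the lookup-then-write logic outside DataStage, here is a rough Python analogue. It is only a sketch under stated assumptions: a Python `set` stands in for the uncached hashed file, and the tuple rows and the `key_of` function are hypothetical (the original poster's data is not shown). Because the key is written in the same step that looks it up, a later duplicate in the same run is caught, which is what the uncached (or locked-for-update) setting is there to guarantee.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

Row = TypeVar("Row")


def dedupe_with_lookup(
    rows: Iterable[Row], key_of: Callable[[Row], Hashable]
) -> Iterator[Row]:
    """Yield the first row seen for each key, preserving input order.

    `seen` plays the role of the hashed file and stores only the key.
    """
    seen: set = set()  # stand-in for an uncached hashed file
    for row in rows:
        key = key_of(row)
        if key in seen:
            # Lookup hit: do nothing (no output row, no write).
            continue
        # Lookup miss: write the key, then pass the row to the output link.
        seen.add(key)
        yield row


if __name__ == "__main__":
    # Hypothetical sample rows: (key, payload).
    source = [("1", "a"), ("2", "b"), ("1", "c"), ("3", "d"), ("2", "e")]
    for row in dedupe_with_lookup(source, key_of=lambda r: r[0]):
        print(row)
```

With the made-up rows above, only the first occurrence of keys 1, 2 and 3 is passed through, in the original input order, and no sort is needed.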
-craig
"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Posts: 1399 Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU
by pandeesh » Mon Oct 03, 2011 11:25 am
Thanks, Craig!
Is there any way to achieve the same thing in a parallel job?
Thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607 Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
by ray.wurlod » Mon Oct 03, 2011 12:46 pm
pandeesh wrote: But I need to use only a Transformer for removing duplicates.
Why?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pandeesh
Premium Member
Posts: 1399 Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU
by pandeesh » Mon Oct 03, 2011 3:52 pm
I would just like to know whether it's possible.
pandeeswaran
SURA
Premium Member
Posts: 1229 Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney
by SURA » Mon Oct 03, 2011 5:12 pm
A server job is easy; there's no special work on our side. In PX you need to sort the data and use stages such as Remove Duplicates, Transformer, etc.
DS User
ray.wurlod
Participant
Posts: 54607 Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
by ray.wurlod » Mon Oct 03, 2011 5:26 pm
pandeesh wrote: I would just like to know whether it's possible.
The answer to that is "yes". But why?
The philosophy of parallel jobs is basically one task, one stage type. That's why there are so many more stage types than server jobs have.
chulett
Charter Member
Posts: 43085 Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO
by chulett » Mon Oct 03, 2011 6:06 pm
I'm assuming this would be complicated on the PX side by the (apparent) need to retain the original input order... which leads me to think the dreaded "fork join" design would be appropriate in that case. Somehow.
-craig
ray.wurlod
Participant
Posts: 54607 Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
by ray.wurlod » Mon Oct 03, 2011 10:08 pm
Not at all. Set up a stable, unique sort on the input link to the Transformer stage and map the columns across the stage.
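To show what a stable, unique sort does to the rows before they reach the Transformer, here is a small Python sketch. It approximates the behaviour and is not DataStage's sort implementation: Python's `sorted` is stable, so rows with equal keys keep their input order, and the "unique" step keeps only the first row of each key group. The tuple rows and the `key_of` function are hypothetical. Note that the surviving rows come out in key order, not in the original input order.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, List, TypeVar

Row = TypeVar("Row")


def stable_unique_sort(rows: Iterable[Row], key_of: Callable[[Row], Any]) -> List[Row]:
    """Sort rows by key (stable), then keep the first row of each key group."""
    ordered = sorted(rows, key=key_of)  # stable: ties keep their input order
    # groupby groups consecutive rows with equal keys; next() takes the first.
    return [next(group) for _, group in groupby(ordered, key=key_of)]


if __name__ == "__main__":
    # Hypothetical sample rows: (key, payload).
    source = [("3", "x"), ("1", "a"), ("2", "b"), ("1", "c"), ("3", "d")]
    print(stable_unique_sort(source, key_of=lambda r: r[0]))
```

Because the sort is stable, ("3", "x") survives rather than ("3", "d"): it came first in the input. The Transformer then only has to map the columns across.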
vishal_rastogi
Participant
Posts: 47 Joined: Thu Dec 09, 2010 4:37 am
by vishal_rastogi » Tue Oct 04, 2011 7:50 am
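Hi,
For a parallel job you can use stage variables. They are derived top to bottom for every row, so var3 still holds the previous row's key when var2 is evaluated:
var1 = link1.<key column>
var2 = If var1 = var3 Then 1 Else 0
var3 = link1.<key column>
Then use var2 = 0 as the output link constraint so that only the first row of each key passes.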
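A Python simulation of the stage-variable technique, to make the row-by-row evaluation order concrete. It is a sketch under assumptions: the input is already sorted (or at least grouped) on the key, the tuple rows and `key_of` function are hypothetical, the output constraint is taken to be `var2 = 0`, and var3's initial value is a sentinel that cannot equal any real key.

```python
from typing import Any, Callable, Iterable, Iterator, TypeVar

Row = TypeVar("Row")

_NO_PREVIOUS = object()  # initial value for var3; never equal to a real key


def dedupe_sorted(rows: Iterable[Row], key_of: Callable[[Row], Any]) -> Iterator[Row]:
    """Emit the first row of each run of equal keys (input must be grouped)."""
    var3: Any = _NO_PREVIOUS  # previous row's key, carried between rows
    for row in rows:
        var1 = key_of(row)                # var1 = current row's key
        var2 = 1 if var1 == var3 else 0   # var2 = 1 when key repeats previous row
        var3 = var1                       # var3 = key remembered for next row
        if var2 == 0:                     # output link constraint: var2 = 0
            yield row


if __name__ == "__main__":
    grouped = [("1", "a"), ("1", "c"), ("2", "b"), ("3", "d"), ("3", "e")]
    print(list(dedupe_sorted(grouped, key_of=lambda r: r[0])))
    unsorted = [("1", "a"), ("2", "b"), ("1", "c")]
    print(list(dedupe_sorted(unsorted, key_of=lambda r: r[0])))
```

On the grouped input only the first row per key is emitted. On the unsorted input the second "1" row slips through, because the comparison only looks at the immediately preceding row.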
Vish
chulett
Charter Member
Posts: 43085 Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO
by chulett » Tue Oct 04, 2011 7:54 am
In either version you can use stage variables... as long as the input is sorted in a usable fashion.
-craig