removing duplicates

pradeep_nov18
Participant
Posts: 92
Joined: Wed Mar 05, 2008 4:09 am
Location: chennai
Post by pradeep_nov18 »

Suppose the data contains repeated rows. I will illustrate with an example:

A b
1 q
1 r
1 k
1 k
1 k

Suppose there are 10,000 rows and I do not know the row number at which the duplicate occurs. In the example, the row 1,k is duplicated. Since this is denormalised data, how can I remove the duplicates in a Transformer?

Also, please send me a list of frequently used constraints in DataStage jobs for propagating the right data to a table or file, and explain how to handle nulls and remove leading spaces. If possible, please also share transformation logic that you have written to meet client requirements in your projects. Please do the needful, as I need this badly. I have some experience myself, but I want to match my calibre with yours.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You first need to sort the data on the key fields, and then use either a Remove Duplicates stage, an Aggregator stage, the Sort stage itself, or a Sort stage coupled with a Transformer stage, optionally using stage variables.
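
The sort-then-compare idea can be sketched outside DataStage. The Python below is only an illustration of the logic, not Transformer syntax: the names `remove_duplicates` and `previous_key` are hypothetical, and `previous_key` stands in for the stage variable that remembers the key of the previous row.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, str]  # (A, b) columns from the example in the question


def remove_duplicates(rows: Iterable[Row]) -> List[Row]:
    """Sort on both columns, then keep a row only when it differs
    from the row immediately before it."""
    result: List[Row] = []
    previous_key: Optional[Row] = None  # stands in for a stage variable
    for row in sorted(rows):
        if row != previous_key:
            result.append(row)
        previous_key = row
    return result


rows = [("1", "q"), ("1", "r"), ("1", "k"), ("1", "k"), ("1", "k")]
deduplicated = remove_duplicates(rows)
print(deduplicated)
```

Because the rows are sorted first, identical rows are always adjacent, so a single comparison with the previous row is enough and the position of the duplicates in the input does not matter.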
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

As you asked again:
If you check the documentation, you will find the whole set of information.
There are seven null handling functions available in the Transformer stage:
IsNotNull, IsNull, MakeNull, NullToEmpty, NullToZero, NullToValue, SetNull.
You can use Trim with its different options to remove leading spaces.
If you are more specific, you will get more specific answers.
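
For readers who want to see what these conversions amount to, here is a rough Python analogue of some of them. It is not DataStage syntax: the function names below are hypothetical stand-ins named after the Transformer functions listed above, with `None` playing the role of a null.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def is_null(value: Optional[T]) -> bool:
    """True when the value is null (None in this sketch)."""
    return value is None


def null_to_empty(value: Optional[str]) -> str:
    """Replace a null string with an empty string."""
    return "" if value is None else value


def null_to_zero(value: Optional[int]) -> int:
    """Replace a null number with zero."""
    return 0 if value is None else value


def null_to_value(value: Optional[T], default: T) -> T:
    """Replace a null with a caller-supplied default."""
    return default if value is None else value


def trim_leading(value: str) -> str:
    """Remove leading spaces only, leaving trailing spaces alone."""
    return value.lstrip(" ")


cleaned = trim_leading(null_to_empty("   k "))
print(repr(cleaned))
```

Handling the null before trimming matters in this sketch, because calling a string operation on a null value would fail.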
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

At DSXchange we prefer a more professional standard of written English, the same as you would use when documenting your work. DSXchange is not a mobile telephone. People whose first language is not English have a hard enough time understanding well-written English - they experience added difficulty understanding things like "i don't no wat is ronum" or "u guys". I consider them to be abominations. Finally, there is a Burmese participant on the forum whose name is U - so please use the correct spelling of the second person personal pronoun, which is "you". Take care with spelling of other words too, please.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use a Remove Duplicates stage based on both columns. Ensure that data are partitioned correctly (on both columns).
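
The reason partitioning matters is that a parallel job processes each partition independently, so two identical rows can only be recognised as duplicates if they land in the same partition. The Python below is a minimal sketch of that idea, not DataStage code: `hash_partition`, `dedupe_partition`, the CRC32 hash, and the choice of four partitions are all assumptions made for illustration.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (A, b) columns from the example in the question


def hash_partition(rows: Iterable[Row], partitions: int) -> Dict[int, List[Row]]:
    """Assign each row to a partition using a hash of both columns,
    so identical rows always land in the same partition."""
    buckets: Dict[int, List[Row]] = {i: [] for i in range(partitions)}
    for row in rows:
        key = "\x1f".join(row).encode("utf-8")
        buckets[zlib.crc32(key) % partitions].append(row)
    return buckets


def dedupe_partition(rows: Iterable[Row]) -> List[Row]:
    """Remove duplicates within one partition, with no view of the others."""
    return sorted(set(rows))


rows = [("1", "q"), ("1", "r"), ("1", "k"), ("1", "k"), ("1", "k")]
buckets = hash_partition(rows, 4)
combined = [row for part in buckets.values() for row in dedupe_partition(part)]
print(sorted(combined))
```

If the rows were instead spread across partitions without regard to the key columns, copies of 1,k could end up in different partitions and each copy would survive its own partition's duplicate removal.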