seperating duplicate rows

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
vinodhraj
Participant
Posts: 53
Joined: Mon Sep 12, 2005 6:51 am

seperating duplicate rows

Post by vinodhraj »

hi DS Gurus,

I want to seperate duplicate rows with following conditions,

1. seperate the rows which are duplicates Eg: if the values like this

2
2
2
4
4
5
5
6
7

then values should be thrown into rejection table in the follwoing manner

2
2
2
4
4
5
5

and the other table should have the 6 and 7 which have single counts.(routines should not be used)

2. duplicates should be seperated in which distinct rows should be seperated from the duplicate table. but rowcompprev-routine should not be used. any alternatives without using routines.

thanks

vinod
sonisa
Participant
Posts: 1
Joined: Mon Oct 18, 2004 12:42 am

Re: seperating duplicate rows

Post by sonisa »

Hi Vinodh,

Using the stage variable you can flag duplicate records based on the previous value then you can move into to separate files/tables.

Regards,
-Sanjay
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Re: seperating duplicate rows

Post by PhilHibbs »

sonisa wrote:Hi Vinodh,

Using the stage variable you can flag duplicate records based on the previous value then you can move into to separate files/tables.

Regards,
-Sanjay
How do you get it to throw the first occurrence into the reject link?

I would do this by first aggregating the data to get a count for each key, load the counts into a hash, then read the file again referring to the hash of counts.
Phil Hibbs | Capgemini
Technical Consultant
DeepakCorning
Premium Member
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

Do a count on the respective field with th euse of a self join and then if the count is more than 1 then throw it in the error table and if it is 1 then make it flow it in target table.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

DeepakCorning wrote:Do a count on the respective field with th euse of a self join and then if the count is more than 1 then throw it in the error table and if it is 1 then make it flow it in target table.
Self-join? How do you do that in DataStage?
Phil Hibbs | Capgemini
Technical Consultant
DeepakCorning
Premium Member
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

Many ways ...one can be hash file lookup of the source .
PhilHibbs wrote:
DeepakCorning wrote:Do a count on the respective field with th euse of a self join and then if the count is more than 1 then throw it in the error table and if it is 1 then make it flow it in target table.
Self-join? How do you do that in DataStage?
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Firstly, why do you want distinct?

If the source is from a db, you can use SQL join to get the dups.

Otherwise, write the count key in a ds and reference it in the job.

There are lots of other options depending on your requirement.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Self join? User-defined SQL or, if you're feeling ambitious, generated SQL with one alias in the Table Name field, aliases in the column derivations, and an appropriate WHERE clause with the other alias.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
singhald
Participant
Posts: 180
Joined: Tue Aug 23, 2005 2:50 am
Location: Bangalore
Contact:

Post by singhald »

Hi vinod
u can do this seperation by using StageVar in ur transformer and populate these records into other file as u want.


deepak
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

ray.wurlod wrote:Self join? User-defined SQL or, if you're feeling ambitious, generated SQL with one alias in the Table Name field, aliases in the column derivations, and an appropriate WHERE clause with the other alias.
The OP never said that he had access to an SQL engine, he asked for a DataStage solution.
DeepakCorning wrote:
PhilHibbs wrote:Self-join? How do you do that in DataStage?
Many ways ...one can be hash file lookup of the source .
Like I suggested in my earlier post that you were answering? ;-)
Phil Hibbs | Capgemini
Technical Consultant
vinodhraj
Participant
Posts: 53
Joined: Mon Sep 12, 2005 6:51 am

Post by vinodhraj »

hi deepak,

u have told using stage variables, duplicate can be removed by the above conditions. can u please guide me how to proceed. i have visited forum, but i cant.

Is there any way to use count function in transformer?

thanks

vinod
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

vinodhraj wrote:hi deepak,
u have told using stage variables, duplicate can be removed by the above conditions. can u please guide me how to proceed. i have visited forum, but i cant.
Is there any way to use count function in transformer?
thanks
vinod
You can't remove the first occurrence using just a Transformer. Read the other replies for how to do this.
Phil Hibbs | Capgemini
Technical Consultant
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You will need to parse the file 2 times.
Post Reply