Separating duplicate rows

vinodhraj · Post by **vinodhraj** » Mon Sep 12, 2005 7:29 am

Hi DS Gurus,

I want to separate duplicate rows under the following conditions:

1. Separate the rows which are duplicates. E.g., if the values are like this:

2
2
2
4
4
5
5
6
7

then the values should be sent to the rejection table in the following manner:

2
2
2
4
4
5
5

and the other table should have the 6 and 7, which occur only once. (Routines should not be used.)

2. Duplicates should be separated such that the distinct rows are separated from the duplicate table, but the rowcompprev routine should not be used. Are there any alternatives that avoid routines?

thanks

vinod

sonisa · Post by **sonisa** » Mon Sep 12, 2005 7:45 am

Hi Vinodh,

Using a stage variable, you can flag duplicate records based on the previous value, and then route them to separate files/tables.
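As a rough illustration of the logic only (this is Python, not DataStage, and the function name `flag_by_previous` is made up for the example), comparing each row with the previous one on sorted input could be sketched like this. It assumes the input is already sorted on the key.

```python
def flag_by_previous(sorted_values):
    """Flag a row as a duplicate when it equals the row just before it."""
    prev = object()  # sentinel that compares unequal to every real value
    flagged = []
    for value in sorted_values:
        is_dup = (value == prev)  # plays the role of the stage variable
        flagged.append((value, is_dup))
        prev = value              # remember this row for the next comparison
    return flagged


rows = [2, 2, 2, 4, 4, 5, 5, 6, 7]
flagged = flag_by_previous(rows)
rejects = [v for v, is_dup in flagged if is_dup]
passed = [v for v, is_dup in flagged if not is_dup]
print(rejects)  # [2, 2, 4, 5]
print(passed)   # [2, 4, 5, 6, 7]
```

Note that the first row of each group is not flagged, so on its own this keeps one 2, one 4 and one 5 on the non-reject side.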

Regards,
-Sanjay

PhilHibbs · Post by **PhilHibbs** » Mon Sep 12, 2005 8:46 am

sonisa wrote:Hi Vinodh,

Using a stage variable, you can flag duplicate records based on the previous value, and then route them to separate files/tables.

Regards,
-Sanjay

How do you get it to throw the first occurrence into the reject link?

I would do this by first aggregating the data to get a count for each key, loading the counts into a hashed file, then reading the file again and looking up each key in the hashed file of counts.
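A minimal sketch of that two-pass idea, written in Python rather than as a DataStage job: a `Counter` stands in for the aggregation plus the lookup of counts, and the name `split_by_count` is hypothetical.

```python
from collections import Counter


def split_by_count(values):
    """Two passes: count every key, then route each row by its key's count."""
    counts = Counter(values)          # pass 1: aggregate a count per key
    rejects, singles = [], []
    for value in values:              # pass 2: re-read rows and look up the count
        if counts[value] > 1:
            rejects.append(value)     # every occurrence, including the first
        else:
            singles.append(value)
    return rejects, singles


rejects, singles = split_by_count([2, 2, 2, 4, 4, 5, 5, 6, 7])
print(rejects)  # [2, 2, 2, 4, 4, 5, 5]
print(singles)  # [6, 7]
```

Because the counts are complete before any row is routed, the first occurrence of a duplicated key goes to the reject side too, and the input does not need to be sorted.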

DeepakCorning · Post by **DeepakCorning** » Mon Sep 12, 2005 8:56 am

Do a count on the respective field with the use of a self-join; if the count is more than 1, send the row to the error table, and if it is 1, let it flow to the target table.

PhilHibbs · Post by **PhilHibbs** » Mon Sep 12, 2005 9:29 am

DeepakCorning wrote:Do a count on the respective field with the use of a self-join; if the count is more than 1, send the row to the error table, and if it is 1, let it flow to the target table.

Self-join? How do you do that in DataStage?

DeepakCorning · Post by **DeepakCorning** » Mon Sep 12, 2005 9:34 am

Many ways... one is a hashed file lookup of the source.

PhilHibbs wrote:
DeepakCorning wrote:Do a count on the respective field with the use of a self-join; if the count is more than 1, send the row to the error table, and if it is 1, let it flow to the target table.
Self-join? How do you do that in DataStage?

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Sep 12, 2005 9:41 am

Firstly, why do you want distinct?

If the source is from a db, you can use SQL join to get the dups.

Otherwise, write the count key in a ds and reference it in the job.

There are lots of other options depending on your requirement.

ray.wurlod · Post by **ray.wurlod** » Mon Sep 12, 2005 3:56 pm

Self join? User-defined SQL or, if you're feeling ambitious, generated SQL with one alias in the Table Name field, aliases in the column derivations, and an appropriate WHERE clause with the other alias.
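For illustration only, a self-join of this kind might look like the sketch below. It assumes the source is a database table; the table name `src`, the column `k`, and the use of an in-memory SQLite database (driven from Python) are all stand-ins rather than anything from the original job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (k INTEGER)")
conn.executemany(
    "INSERT INTO src (k) VALUES (?)",
    [(v,) for v in [2, 2, 2, 4, 4, 5, 5, 6, 7]],
)

# Join the table to itself on the key under two aliases; for each row of
# alias a, COUNT(*) is the number of rows in src that share its key.
self_join = """
    SELECT a.k
    FROM src AS a
    JOIN src AS b ON a.k = b.k
    GROUP BY a.rowid, a.k
    HAVING COUNT(*) {op} 1
    ORDER BY a.rowid
"""

dup_rows = [r[0] for r in conn.execute(self_join.format(op=">"))]
single_rows = [r[0] for r in conn.execute(self_join.format(op="="))]
print(dup_rows)     # [2, 2, 2, 4, 4, 5, 5]
print(single_rows)  # [6, 7]
```

The `rowid` used to tell identical rows apart is SQLite-specific; another database would need its own row identifier or a real key column.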

singhald · Post by **singhald** » Mon Sep 12, 2005 4:20 pm

Hi Vinod,
You can do this separation by using a stage variable in your Transformer and write these records to another file as you want.

deepak

PhilHibbs · Post by **PhilHibbs** » Tue Sep 13, 2005 2:46 am

ray.wurlod wrote:Self join? User-defined SQL or, if you're feeling ambitious, generated SQL with one alias in the Table Name field, aliases in the column derivations, and an appropriate WHERE clause with the other alias.

The OP never said that he had access to an SQL engine; he asked for a DataStage solution.

DeepakCorning wrote:
PhilHibbs wrote:Self-join? How do you do that in DataStage?
Many ways ...one can be hash file lookup of the source .

Like I suggested in my earlier post that you were answering?

vinodhraj · Post by **vinodhraj** » Tue Sep 13, 2005 5:36 am

Hi Deepak,

You said that, using stage variables, duplicates can be removed under the above conditions. Can you please guide me on how to proceed? I have searched the forum, but I can't find how.

Is there any way to use a count function in a Transformer?

thanks

vinod

PhilHibbs · Post by **PhilHibbs** » Tue Sep 13, 2005 5:53 am

vinodhraj wrote:Hi Deepak,
You said that, using stage variables, duplicates can be removed under the above conditions. Can you please guide me on how to proceed? I have searched the forum, but I can't find how.
Is there any way to use a count function in a Transformer?
thanks
vinod

You can't remove the first occurrence using just a Transformer. Read the other replies for how to do this.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Sep 13, 2005 7:19 am

You will need to parse the file 2 times.
