Remove duplicates on morethan 1 column

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
arshi
Participant
Posts: 50
Joined: Wed Apr 18, 2007 5:12 am

Remove duplicates on morethan 1 column

Post by arshi »

Hi,

If i need to remove the duplicates on 1 column by using the stage variables by,
SvCurr=SvPre
link1.col1=SvCurr
If SvPre=SvCurr then @FALST else @TRUE

by using this logic I got the solution.

The requirement is to remove duplicates more than 1 column How to implient this remove duplicates by using stage variables.Source is in flat file .Any one have the solution share with me.

Regards,
Arshi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Please explain what "remove duplicates more than 1 column" means.
-craig

"You can never have too many knives" -- Logan Nine Fingers
satya99
Participant
Posts: 104
Joined: Thu Nov 30, 2006 1:22 pm

Post by satya99 »

You should look at record wise but not column wise

I mean compare 1st record with s2nd record..
satya
DS_SUPPORT
Premium Member
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Post by DS_SUPPORT »

Concatenate all the columns in the previous transformer, which you want to compare, and apply the same logic for that concatenated column.

Be aware, that this method requires sorted input.
Yoga_12
Participant
Posts: 18
Joined: Fri Feb 08, 2008 2:14 am
Location: Bangalore

Post by Yoga_12 »

Help me to understand your Requirement. I am assuming that this is how your Flat file is.

Empno monthsal
--------------------------
1 Jan
1 Feb
2 jan
2 feb
2 Mar

based on the Empno you want to remove the duplicates for that you are using the above logic. it will perfectly work. You don't need to do any concatenation only thing you have to do is give the sorted input to the Transformer before applying your logic. put a constraint on the transformer to load the data. but you will get only the First record as output.

Like

1 Jan
2 Jan
then use aggregator

Correct me if i am wrong
Yoga Kathirvelu
Working with interesting people is more interesting than just working
dineshrk
Participant
Posts: 46
Joined: Thu Jun 26, 2008 9:14 pm

Re: Remove duplicates on morethan 1 column

Post by dineshrk »

arshi wrote:Hi,

If i need to remove the duplicates on 1 column by using the stage variables by,
SvCurr=SvPre
link1.col1=SvCurr
If SvPre=SvCurr then @FALST else @TRUE

by using this logic I got the solution.

The requirement is to remove duplicates more than 1 column How to implient this remove duplicates by using stage variables.Source is in flat file .Any one have the solution share with me.

Regards,
Arshi



If your requirement is to compare a combination of columns with the next record to remoe duplictes. use the following function

RowProcCompareWithPreviousValue


Syntax is

RowProcCompareWithPreviousValue(Col1:Col2;Col3)


It will return a value 1 if the combination is same else 0. You can assign incrementing value to some variable initially and then you can remove it using constraint stating that variable with the fist value should nly e passed.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

chulett wrote:Please explain what "remove duplicates more than 1 column" means.
Until Arshi comes back and clarifies things, you are all just guessing. I don't know about you, but I don't like to have to guess what the problem is that we are trying to solve so I'm perfectly content to wait.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dineshrk
Participant
Posts: 46
Joined: Thu Jun 26, 2008 9:14 pm

Post by dineshrk »

chulett wrote:
chulett wrote:Please explain what "remove duplicates more than 1 column" means.
Until Arshi comes back and clarifies things, you are all just guessing. I don't know about you, but I don't like to have to guess what the problem is that we are trying to solve so I'm perfectly content to wait.


Ok then Lets wait... :)
nisaumande
Participant
Posts: 13
Joined: Fri Aug 11, 2006 11:57 am
Location: Toulouse, France

Post by nisaumande »

Apart from what is the problem of Arshi, the easiest way I found to remove key duplicates is to send all the lines in a Hash File with the right key definition.
The input sort order defines which lines are kept.

Nicolas
halpavan2
Participant
Posts: 31
Joined: Fri Apr 18, 2008 5:44 am
Location: Hyderabad

Post by halpavan2 »

I tried with this function
RowProcCompareWithPreviousValue(StageVar3:StageVar2:StageVar1)
with the following data.
Input data

A B C
-- ---------- ----------
1 1 1
1 2 3
2 2 2
2 3 4


I got the Output data

A B C
-- ---------- ----------
1 1 1

But i want the output data as
A B C
-- ---------- ----------
1 2 3
2 3 4

Explain clearly how this function works.
halpavan2
Participant
Posts: 31
Joined: Fri Apr 18, 2008 5:44 am
Location: Hyderabad

Post by halpavan2 »

I tried with this function
RowProcCompareWithPreviousValue(StageVar3:StageVar2:StageVar1)
with the following data.
Input data

A B C
-- ---------- ----------
1 1 1
1 2 3
2 2 2
2 3 4


I got the Output data

A B C
-- ---------- ----------
1 1 1

But i want the output data as
A B C
-- ---------- ----------
1 2 3
2 3 4

Explain clearly how this function works.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Double-click on the function itself in the Manager (Routines\sdk\RowProc) - it is "explained clearly" in the Long Description, as are all supplied routines.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dineshrk
Participant
Posts: 46
Joined: Thu Jun 26, 2008 9:14 pm

Post by dineshrk »

chulett wrote:Double-click on the function itself in the Manager (Routines\sdk\RowProc) - it is "explained clearly" in the Long Description, as are all supplied routines. ...

Actually it sends a boolean return if it is true or false.. 1 for true and 0 for false.What it will do is concatenate all the columns into one variable and compare with the next.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Actually, what is does is documented in the routine. Shocking, I know.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply