Page 1 of 1

Remove duplicates on morethan 1 column

Posted: Fri Aug 01, 2008 6:24 am
by arshi
Hi,

If i need to remove the duplicates on 1 column by using the stage variables by,
SvCurr=SvPre
link1.col1=SvCurr
If SvPre=SvCurr then @FALST else @TRUE

by using this logic I got the solution.

The requirement is to remove duplicates more than 1 column How to implient this remove duplicates by using stage variables.Source is in flat file .Any one have the solution share with me.

Regards,
Arshi

Posted: Fri Aug 01, 2008 7:18 am
by chulett
Please explain what "remove duplicates more than 1 column" means.

Posted: Fri Aug 01, 2008 7:57 am
by satya99
You should look at record wise but not column wise

I mean compare 1st record with s2nd record..

Posted: Mon Aug 04, 2008 1:24 am
by DS_SUPPORT
Concatenate all the columns in the previous transformer, which you want to compare, and apply the same logic for that concatenated column.

Be aware, that this method requires sorted input.

Posted: Mon Aug 04, 2008 5:25 am
by Yoga_12
Help me to understand your Requirement. I am assuming that this is how your Flat file is.

Empno monthsal
--------------------------
1 Jan
1 Feb
2 jan
2 feb
2 Mar

based on the Empno you want to remove the duplicates for that you are using the above logic. it will perfectly work. You don't need to do any concatenation only thing you have to do is give the sorted input to the Transformer before applying your logic. put a constraint on the transformer to load the data. but you will get only the First record as output.

Like

1 Jan
2 Jan
then use aggregator

Correct me if i am wrong

Re: Remove duplicates on morethan 1 column

Posted: Tue Aug 05, 2008 8:59 am
by dineshrk
arshi wrote:Hi,

If i need to remove the duplicates on 1 column by using the stage variables by,
SvCurr=SvPre
link1.col1=SvCurr
If SvPre=SvCurr then @FALST else @TRUE

by using this logic I got the solution.

The requirement is to remove duplicates more than 1 column How to implient this remove duplicates by using stage variables.Source is in flat file .Any one have the solution share with me.

Regards,
Arshi



If your requirement is to compare a combination of columns with the next record to remoe duplictes. use the following function

RowProcCompareWithPreviousValue


Syntax is

RowProcCompareWithPreviousValue(Col1:Col2;Col3)


It will return a value 1 if the combination is same else 0. You can assign incrementing value to some variable initially and then you can remove it using constraint stating that variable with the fist value should nly e passed.

Posted: Tue Aug 05, 2008 9:48 am
by chulett
chulett wrote:Please explain what "remove duplicates more than 1 column" means.
Until Arshi comes back and clarifies things, you are all just guessing. I don't know about you, but I don't like to have to guess what the problem is that we are trying to solve so I'm perfectly content to wait.

Posted: Tue Aug 05, 2008 9:55 am
by dineshrk
chulett wrote:
chulett wrote:Please explain what "remove duplicates more than 1 column" means.
Until Arshi comes back and clarifies things, you are all just guessing. I don't know about you, but I don't like to have to guess what the problem is that we are trying to solve so I'm perfectly content to wait.


Ok then Lets wait... :)

Posted: Wed Aug 06, 2008 5:47 am
by nisaumande
Apart from what is the problem of Arshi, the easiest way I found to remove key duplicates is to send all the lines in a Hash File with the right key definition.
The input sort order defines which lines are kept.

Nicolas

Posted: Wed Aug 06, 2008 7:03 am
by halpavan2
I tried with this function
RowProcCompareWithPreviousValue(StageVar3:StageVar2:StageVar1)
with the following data.
Input data

A B C
-- ---------- ----------
1 1 1
1 2 3
2 2 2
2 3 4


I got the Output data

A B C
-- ---------- ----------
1 1 1

But i want the output data as
A B C
-- ---------- ----------
1 2 3
2 3 4

Explain clearly how this function works.

Posted: Wed Aug 06, 2008 7:06 am
by halpavan2
I tried with this function
RowProcCompareWithPreviousValue(StageVar3:StageVar2:StageVar1)
with the following data.
Input data

A B C
-- ---------- ----------
1 1 1
1 2 3
2 2 2
2 3 4


I got the Output data

A B C
-- ---------- ----------
1 1 1

But i want the output data as
A B C
-- ---------- ----------
1 2 3
2 3 4

Explain clearly how this function works.

Posted: Wed Aug 06, 2008 7:26 am
by chulett
Double-click on the function itself in the Manager (Routines\sdk\RowProc) - it is "explained clearly" in the Long Description, as are all supplied routines.

Posted: Wed Aug 06, 2008 11:39 am
by dineshrk
chulett wrote:Double-click on the function itself in the Manager (Routines\sdk\RowProc) - it is "explained clearly" in the Long Description, as are all supplied routines. ...

Actually it sends a boolean return if it is true or false.. 1 for true and 0 for false.What it will do is concatenate all the columns into one variable and compare with the next.

Posted: Wed Aug 06, 2008 11:57 am
by chulett
Actually, what is does is documented in the routine. Shocking, I know.