
How can we remove duplicates in a Transformer stage?

Posted: Thu Oct 18, 2007 2:59 am
by vamsipx
hi

Posted: Thu Oct 18, 2007 9:16 am
by ray.wurlod
There are no duplicates in a Transformer stage.

Posted: Thu Oct 18, 2007 10:28 am
by vikasjawa
Hi,
You can implement remove-duplicates functionality in a Transformer stage. In the Transformer stage, select Hash partitioning on the key you want to remove duplicates on, and sort on that key as well so that duplicate rows arrive next to each other. Then write some simple logic in the stage variables to check whether the previous row had the same key value as the current one, and use this as a constraint.
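A minimal Python sketch of this approach, assuming rows are dicts and the key is a single column; `hash_partition`, `dedup_partition` and `remove_duplicates` are hypothetical names for illustration, not DataStage functions. Hash partitioning puts every row with a given key into the same partition, so the compare-with-previous check inside each sorted partition removes duplicates across the whole data set.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, key, n_partitions):
    """Send each row to a partition chosen from a hash of its key value,
    so all rows sharing a key land in the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is deterministic across runs (Python's str hash() is salted)
        p = crc32(str(row[key]).encode("utf-8")) % n_partitions
        partitions[p].append(row)
    return [partitions[p] for p in range(n_partitions)]


def dedup_partition(rows, key):
    """Compare-with-previous logic inside one partition: after sorting on key,
    keep a row only when its key differs from the previous row's key."""
    kept = []
    prev = object()  # sentinel: never equal to a real key value
    for row in sorted(rows, key=lambda r: r[key]):  # stable sort; duplicates become adjacent
        if row[key] != prev:  # plays the role of the output constraint
            kept.append(row)
        prev = row[key]
    return kept


def remove_duplicates(rows, key, n_partitions=4):
    """Hash-partition on key, then deduplicate each partition independently."""
    result = []
    for part in hash_partition(rows, key, n_partitions):
        result.extend(dedup_partition(part, key))
    return result
```

For example, `remove_duplicates(rows, "id")` returns one row per distinct `id`; because the sort is stable, the row kept for each key is the first one that reached its partition.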

Posted: Mon Feb 25, 2008 11:11 pm
by sri.cbv
ray.wurlod wrote: There are no duplicates in a Transformer stage. ...

Hi,

You can remove the duplicates in a Transformer stage using stage variables.
Define 3 stage variables called x, y and z. Assign x and z to zero, then use an If Then Else condition for B and compare and c.

Thanks
srinivas

Posted: Mon Feb 25, 2008 11:42 pm
by saikir
Hi,

You can remove the duplicates in the Transformer using the routine RowProcCompareWithPreviousValue. Sort the output and pass the key column as input to the routine. It returns zero if the previous row is the same as the current row.

However, this may be slower than other techniques for finding duplicates.
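The idea behind a compare-with-previous routine can be sketched in Python as a closure that remembers the value from its previous call. This is only an analogy, not the routine's actual source: the names are hypothetical, and the sketch returns a boolean rather than the routine's numeric result. The remembered state lives in one place, much like a BASIC routine keeping it in COMMON, and it only detects duplicates when the input is sorted.

```python
def make_compare_with_previous():
    """Build a stateful checker that remembers the value passed on its
    previous call. The state lives inside this one closure, much as a
    BASIC routine keeps it in COMMON for a single process."""
    state = {"seen": False, "last": None}

    def is_same_as_previous(value):
        # False on the very first call, since there is no previous value yet
        same = state["seen"] and value == state["last"]
        state["seen"] = True
        state["last"] = value
        return same

    return is_same_as_previous


def drop_repeats(sorted_keys):
    """Keep the first occurrence of each key; input must already be sorted."""
    is_same_as_previous = make_compare_with_previous()
    return [k for k in sorted_keys if not is_same_as_previous(k)]
```

With unsorted input, equal keys are not adjacent and slip through, which is why the sort step matters.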

Sai

Posted: Mon Feb 25, 2008 11:45 pm
by ray.wurlod
RowProcCompareWithPreviousValue in a parallel Transformer? Doubt it.

RowProcCompareWithPreviousValue is written in DataStage BASIC and relies on COMMON, which immediately invalidates it from use in a parallel job. See Chapter 2 of Parallel Job Developer's Guide for more information on the rules.

Posted: Mon Feb 25, 2008 11:55 pm
by saikir
Hi Ray,

Thanks for the correction. I missed the part that this is a parallel job, not a server job.

One small clarification: the documentation states that there is a BASIC Transformer stage in which you can use BASIC functions. Can I use the routine in that?

Sai

Posted: Tue Feb 26, 2008 12:22 am
by ray.wurlod
I expect, from reading the rules, that the fact that the routine uses a COMMON variable would preclude it. Why not try it and let us know?

Posted: Wed Mar 19, 2008 3:45 am
by harsha_blm
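FYI:

Declare 2 stage variables in the Transformer, in this order (svPrev must be evaluated before svCurr so that it still holds the previous row's value):

Derivation        Stage Variable
--------------    --------------
svCurr            svPrev
Input.colname     svCurr

Set the constraint svCurr <> svPrev on the output link. The input must be sorted on colname so that duplicates arrive on adjacent rows.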
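A minimal Python sketch of the two stage variables, assuming rows are dicts already sorted on the key column; the function names are hypothetical, and a sentinel object stands in for the stage variable's initial value so that it can never match a real key. The second function swaps the evaluation order to show why svPrev has to come first.

```python
def dedup_sorted(rows, col):
    """Two stage variables evaluated top to bottom:
    svPrev <- svCurr (still the previous row's value), then svCurr <- Input.colname.
    Output constraint: svCurr <> svPrev. Assumes rows are sorted on col."""
    sv_curr = object()  # sentinel start value, so the first row always passes
    kept = []
    for row in rows:
        sv_prev = sv_curr    # derivation of svPrev: svCurr
        sv_curr = row[col]   # derivation of svCurr: Input.colname
        if sv_curr != sv_prev:  # output constraint
            kept.append(row)
    return kept


def dedup_wrong_order(rows, col):
    """Same derivations with the stage variables swapped: svCurr is
    overwritten first, so svPrev always equals it and no row passes."""
    kept = []
    for row in rows:
        sv_curr = row[col]
        sv_prev = sv_curr
        if sv_curr != sv_prev:
            kept.append(row)
    return kept
```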

Posted: Wed Mar 19, 2008 5:50 am
by srimitta
Sort on the column(s) on which you want to identify duplicates,
then call the transform routine RowProcCompareWithPreviousValue.

Or

Create 3 stage variables; they need to be in this order:
StageVariable1 --> svCurr --> <Column Names>
StageVariable2 --> svUni --> If svPrev = svCurr Then 'D' Else 'U'
StageVariable3 --> svPrev --> <Column Names>

svCurr and svPrev must be derived from the SAME columns.
If you want to identify duplicates on more than one column, concatenate all the columns:
svCurr --> COL1 : COL2 : COL3
svUni --> If svPrev = svCurr Then 'D' Else 'U'
svPrev --> COL1 : COL2 : COL3

'D' = Duplicate
'U' = Unique

In your output link constraint, test svUni to define what goes to that link, e.g. svUni = 'U' passes only the first row of each key.
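A minimal Python sketch of the svCurr / svUni / svPrev recipe, assuming rows are dicts already sorted on the key columns; `flag_duplicates` and `unique_rows` are hypothetical names. One change from the recipe above is labeled here and in the code: the key columns are joined with a separator, because plain concatenation can make different keys collide (for example 'AB' : 'C' and 'A' : 'BC' both give 'ABC'). The separator itself must not occur in the data.

```python
def flag_duplicates(rows, cols, sep="|"):
    """Tag each row 'U' (first row of its key) or 'D' (same key as the
    previous row), evaluating svCurr, svUni, svPrev in that order.
    Assumes rows are sorted on cols. The separator is an addition to the
    forum recipe, to stop different column values from running together."""
    sv_prev = None  # no previous key yet, so the first row is always 'U'
    flagged = []
    for row in rows:
        sv_curr = sep.join(str(row[c]) for c in cols)  # COL1 : COL2 : COL3
        sv_uni = "D" if sv_prev == sv_curr else "U"     # If svPrev = svCurr Then 'D' Else 'U'
        sv_prev = sv_curr                                # built from the same columns as svCurr
        flagged.append((sv_uni, row))
    return flagged


def unique_rows(rows, cols):
    """Output-link constraint svUni = 'U': keep the first row of each key."""
    return [row for flag, row in flag_duplicates(rows, cols) if flag == "U"]
```

Passing `sep=""` reproduces plain concatenation, where ('A', 'BC') and ('AB', 'C') are wrongly flagged as duplicates.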

Posted: Tue Mar 25, 2008 9:06 am
by abc123
On the Transformer stage, on the Input tab, on the Partitioning tab, select the Hash partitioning method on your key column(s). Check Sort and Unique. That keeps one row per key value, which is what you want.
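Sort with Unique works partition by partition, so it only removes every duplicate when all rows with the same key share a partition, which is what Hash partitioning on the key guarantees. A minimal Python sketch, assuming dict rows and hypothetical helper names, contrasts hash partitioning with round-robin partitioning (a swapped-in counterexample, not part of the answer above):

```python
from itertools import groupby
from zlib import crc32


def sort_unique(partition, key):
    """Sort one partition on key and keep the first row of each key group."""
    ordered = sorted(partition, key=lambda r: r[key])  # stable sort
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[key])]


def hash_split(rows, key, n):
    """Hash partitioning: rows with equal keys go to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(str(row[key]).encode("utf-8")) % n].append(row)
    return parts


def round_robin_split(rows, n):
    """Round-robin partitioning: rows are dealt out regardless of key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def run(parts, key):
    """Apply sort + unique to each partition and collect the results."""
    return [row for p in parts for row in sort_unique(p, key)]
```

With rows `[{"id": 1}, {"id": 1}, {"id": 2}, {"id": 2}]` and two partitions, the hash split yields two rows, while round robin leaves one copy of each key in each partition and so yields four.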