how we can remove duplicates in transformer stage
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: how we can remove duplicates in transformer stage
Hi,
You can implement remove-duplicates logic in a Transformer stage. On the Transformer's input link, select Hash partitioning on the key you want to de-duplicate on, and sort on that key so that rows with the same key value arrive next to each other. Then write a simple stage-variable expression that checks whether the previous row had the same key value as the current one, and use the result in the output link constraint.
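To illustrate why the partitioning and the previous-row check go together, here is a minimal Python sketch of the idea. It is not DataStage code: the function names, the `crc32`-based hash, and the sample column name are assumptions made purely for illustration.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, key, n_partitions):
    """Send every row with the same key value to the same partition."""
    parts = defaultdict(list)
    for row in rows:
        # Illustrative hash only; DataStage uses its own hash function.
        bucket = crc32(str(row[key]).encode()) % n_partitions
        parts[bucket].append(row)
    return parts


def dedupe_partition(rows, key):
    """Sort one partition on the key, then keep a row only if its key
    differs from the key of the previous row."""
    prev = object()  # sentinel that never equals a real key value
    kept = []
    for row in sorted(rows, key=lambda r: r[key]):
        if row[key] != prev:
            kept.append(row)
        prev = row[key]
    return kept
```

Because all rows sharing a key land in one partition, comparing against the previous row inside each partition is enough; without key-based partitioning, duplicates could sit in different partitions and never be compared.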
Vikas Jawa
Hi,
You can remove duplicates in the Transformer using the routine RowProcCompareWithPreviousValue. Sort the data and pass the key column as input to the routine. It returns zero if the previous row is the same as the current one.
However, this may be slower than other techniques for finding duplicates.
Sai
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
RowProcCompareWithPreviousValue in a parallel Transformer? Doubt it.
RowProcCompareWithPreviousValue is written in DataStage BASIC and relies on COMMON, which immediately rules it out for use in a parallel job. See Chapter 2 of the Parallel Job Developer's Guide for more information on the rules.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 10
- Joined: Tue Jun 19, 2007 1:16 am
- Location: Bangalore
Re: how we can remove duplicates in transformer stage
FYI..
Declare 2 stage variables in the Transformer, in this order:
Derivation       Stage Variable
---------------  --------------
svCurr           svPrev
Input.colname    svCurr
Then specify the constraint svCurr <> svPrev on the output link. The input must be sorted on that column for this to work.
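The order of the two stage variables matters, because svPrev has to pick up the old svCurr before svCurr is overwritten. A rough Python simulation of that evaluation order, assuming already-sorted input (the function name and the `None` starting values are assumptions for illustration, not DataStage behaviour):

```python
def dedupe_sorted(values):
    """Simulate the svPrev / svCurr stage variables over sorted input."""
    sv_curr = None  # assumed initial value of both stage variables
    sv_prev = None
    kept = []
    for value in values:
        sv_prev = sv_curr  # svPrev derivation: svCurr (still the previous row's value)
        sv_curr = value    # svCurr derivation: Input.colname
        if sv_curr != sv_prev:  # output constraint: svCurr <> svPrev
            kept.append(value)
    return kept


print(dedupe_sorted(["A", "A", "B", "C", "C", "C"]))  # prints ['A', 'B', 'C']
```

If the two assignments were swapped, svPrev would always equal svCurr and every row would be rejected. Note also that in this sketch a first row whose value equals the initial value would be dropped, so the initial value should be something that cannot occur in the data.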
Sort on the column or columns on which you want to identify duplicates.
Call the transform routine RowProcCompareWithPreviousValue
Or
Create 3 stage variables; they need to be in this order:
StageVariable1 --> svCurr --> <Column Names>
StageVariable2 --> svUni --> If svPrev = svCurr Then 'D' Else 'U'
StageVariable3 --> svPrev --> <Column Names>
svCurr and svPrev must be derived from the SAME columns.
If you want to identify duplicates on more than one column, concatenate all the columns:
svCurr --> COL1 : COL2 : COL3
svUni --> If svPrev = svCurr Then 'D' Else 'U'
svPrev --> COL1 : COL2 : COL3
'D' = Duplicate
'U' = Unique
In the output link constraint, test svUni to define what you want on your output link (for example, svUni = 'U' to pass only the first row for each key).
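As a sketch of the three-variable approach, the following Python function flags each row of sorted input as 'D' or 'U' using a concatenated key. The function name, the sample data, and the `|` separator are assumptions for illustration; the separator is added because plain concatenation can make different column combinations look identical (for example "ab" : "c" and "a" : "bc").

```python
def flag_rows(rows, key_columns, sep="|"):
    """Return (row, flag) pairs: 'U' for the first row of each key, 'D' for repeats.

    Rows are assumed to be sorted on key_columns already.
    """
    sv_prev = None  # assumed initial value; no real key is None here
    flagged = []
    for row in rows:
        # svCurr: concatenation of the key columns (with a separator)
        sv_curr = sep.join(str(row[c]) for c in key_columns)
        # svUni: compare against the key remembered from the previous row
        sv_uni = "D" if sv_prev == sv_curr else "U"
        # svPrev: remember this row's key for the next row
        sv_prev = sv_curr
        flagged.append((row, sv_uni))
    return flagged


rows = [
    {"COL1": "a", "COL2": "x"},
    {"COL1": "a", "COL2": "x"},
    {"COL1": "a", "COL2": "y"},
]
print([flag for _, flag in flag_rows(rows, ["COL1", "COL2"])])  # prints ['U', 'D', 'U']
```

Keeping the flag rather than filtering directly makes it possible to route 'U' rows to one output link and 'D' rows to a reject link.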
Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives.
By William A.Foster