Hi all,
I'm trying to ignore the duplicated rows from an input. I mean, this is the input:
AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
XXX,YYY,ZZZ
XXX,YYY,ZZZ
XXX,YYY,ZZZ
and this is the output I want to get:
AAAA,BBBB,CCCC
XXX,YYY,ZZZ
It seems easy, but I am a newbie.
Thanks.
Ignore duplicated rows
Re: Ignore duplicated rows
xcasals wrote:Hi all,
I'm trying to ignore the duplicated rows from an input. I mean, this is the input:
AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
XXX,YYY,ZZZ
XXX,YYY,ZZZ
XXX,YYY,ZZZ
and this is the output i want to get
AAAA,BBBB,CCCC
XXX,YYY,ZZZ
seems easy, but am a newbie.
thanks.
You need to sort the data on some key, which can be done with a Sort stage. The sorted data is then fed to a Remove Duplicates stage, which gives you the required output. I am new to DataStage, so experts, please correct me if I am wrong.
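The sort-then-remove-duplicates idea described above can be sketched outside DataStage. This is only an illustration in Python, not DataStage code; the function name `sort_and_remove_duplicates` and the choice of the whole row as the default key are assumptions made for the example.

```python
from itertools import groupby


def sort_and_remove_duplicates(rows, key=lambda row: row):
    """Sort rows on a key, then keep the first row of each group of equal keys.

    Hypothetical helper for illustration; it mimics a sort step followed by
    a remove-duplicates step.
    """
    # Step 1: sort on the key so that duplicates become adjacent.
    ordered = sorted(rows, key=key)
    # Step 2: groupby collapses each run of adjacent equal keys; keep the first row.
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
]

for row in sort_and_remove_duplicates(rows):
    print(row)
```

Sorting first matters because the grouping step only merges rows that sit next to each other.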
-
- Premium Member
- Posts: 39
- Joined: Tue May 20, 2003 3:36 am
- Location: Singapore
Re: Ignore duplicated rows
xcasals wrote: I'm trying to ignore the duplicated rows from an input.
If your input data comes from a relational database, you can add the DISTINCT clause to your DML (i.e. the SELECT query). This way you avoid unnecessary steps in the job.
regards,
vladimir
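As a rough illustration of the DISTINCT suggestion above, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `input_rows` and the column names `col1`, `col2`, `col3` are made up for the example; the real query would depend on your own source table.

```python
import sqlite3


def distinct_rows(rows):
    """Load rows into a throwaway table and read them back with SELECT DISTINCT.

    The table and column names are hypothetical, chosen only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE input_rows (col1 TEXT, col2 TEXT, col3 TEXT)")
        conn.executemany("INSERT INTO input_rows VALUES (?, ?, ?)", rows)
        # DISTINCT makes the database return each combination of values once.
        cursor = conn.execute(
            "SELECT DISTINCT col1, col2, col3 FROM input_rows "
            "ORDER BY col1, col2, col3"
        )
        return cursor.fetchall()
    finally:
        conn.close()


sample = [("AAAA", "BBBB", "CCCC")] * 3 + [("XXX", "YYY", "ZZZ")] * 3
for row in distinct_rows(sample):
    print(",".join(row))
```

The ORDER BY is there only to make the output order predictable; DISTINCT alone does the de-duplication.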
-
- Participant
- Posts: 133
- Joined: Wed Mar 05, 2003 4:19 pm
- Location: Lima - Peru. Sudamerica
Re: Ignore duplicated rows
Hi,
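If your input is a sequential file, you can use stage variables.
Define these in the Transformer stage, in this order (stage variables are evaluated top to bottom, so Dupli must be computed before LastField is updated):
Dupli = If (LastField = ActualField) Then 1 Else 0
LastField = If (LastField <> ActualField) Then ActualField Else LastField
In the output link constraint, define the following condition:
Dupli = 0
The output file will then contain the desired records. Note that this only catches consecutive duplicates, so the input must be sorted on that field.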
Regards,
Miguel Seclén
Lima - Peru
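The stage-variable logic above can be mimicked in Python to see how it behaves row by row. This is a sketch, not DataStage code: the function name `drop_consecutive_duplicates` and the `field_index` parameter are assumptions for the example, while `dupli`, `last_field`, and `actual_field` mirror the variables named in the post.

```python
def drop_consecutive_duplicates(rows, field_index=0):
    """Keep a row only when its key field differs from the previous row's.

    Hypothetical helper that imitates the Dupli / LastField stage variables.
    """
    last_field = None  # plays the role of LastField before the first row
    kept = []
    for row in rows:
        actual_field = row.split(",")[field_index]
        # Dupli is computed first, against the value saved from the previous row.
        dupli = 1 if last_field == actual_field else 0
        # Only then is LastField updated for the next row.
        last_field = actual_field
        # The constraint Dupli = 0 lets the row through.
        if dupli == 0:
            kept.append(row)
    return kept


rows = [
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
]

for row in drop_consecutive_duplicates(rows):
    print(row)
```

Because each row is compared only with the one before it, duplicates that are not adjacent pass through, which is why the input needs to be sorted on the compared field.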
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Re: Ignore duplicated rows
vigneshra wrote: Based on some key you need to sort the data which can be done through a sorter stage. Then the sorted data is fed to a duplicates remover stage which gives you the required output. Since I am new to DataStage, any experts please correct me if I am wrong
The Remove Duplicates stage type is not available for server jobs; it is only available for parallel jobs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.