Ignore duplicated rows

xcasals
Participant
Posts: 12
Joined: Thu May 27, 2004 5:50 am

Post by xcasals »

Hi all,

I'm trying to ignore the duplicated rows from an input. I mean, this is the input:

AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
XXX,YYY,ZZZ
XXX,YYY,ZZZ
XXX,YYY,ZZZ

and this is the output I want to get:

AAAA,BBBB,CCCC
XXX,YYY,ZZZ


It seems easy, but I am a newbie.

Thanks.
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa

Post by denzilsyb »

You could write the input into a Hashed File stage and make all of the columns key columns. Rows with the same key overwrite one another, so that will get rid of the duplicate records.
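
To show why a keyed write removes duplicates, here is a minimal sketch in Python rather than DataStage. It assumes the keyed store behaves like a dictionary in which a later write with the same key replaces the earlier one; `dedupe_by_key` and `rows` are names made up for this illustration.

```python
def dedupe_by_key(rows):
    """Keep one row per distinct key, the way a keyed write overwrites earlier rows."""
    store = {}
    for row in rows:
        key = tuple(row)   # every column is part of the key
        store[key] = row   # a second write with the same key replaces the first
    return list(store.values())


rows = [
    ["AAAA", "BBBB", "CCCC"],
    ["AAAA", "BBBB", "CCCC"],
    ["AAAA", "BBBB", "CCCC"],
    ["XXX", "YYY", "ZZZ"],
    ["XXX", "YYY", "ZZZ"],
    ["XXX", "YYY", "ZZZ"],
]

unique_rows = dedupe_by_key(rows)
for row in unique_rows:
    print(",".join(row))
```

A Python dictionary hands the rows back in insertion order. I would not rely on the output order of a real hashed file in the same way.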

dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Re: Ignore duplicated rows

Post by vigneshra »

xcasals wrote: I'm trying to ignore the duplicated rows from an input.

Sort the data on some key, which can be done with a Sort stage. Then feed the sorted data to a Remove Duplicates stage, which gives you the required output. I am new to DataStage, so experts, please correct me if I am wrong.
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Re: Ignore duplicated rows

Post by degraciavg »

xcasals wrote: I'm trying to ignore the duplicated rows from an input.
If your input data is from a relational database, you can actually add the DISTINCT clause in your DML (i.e. SELECT query). You remove unnecessary steps this way.
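
As a rough illustration of the DISTINCT approach, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `src` and the column names `c1`, `c2`, `c3` are invented for the example; in a real job the query would go against your own source table.

```python
import sqlite3

sample_rows = [
    ("AAAA", "BBBB", "CCCC"),
    ("AAAA", "BBBB", "CCCC"),
    ("AAAA", "BBBB", "CCCC"),
    ("XXX", "YYY", "ZZZ"),
    ("XXX", "YYY", "ZZZ"),
    ("XXX", "YYY", "ZZZ"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical source table standing in for the real input
conn.execute("CREATE TABLE src (c1 TEXT, c2 TEXT, c3 TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)", sample_rows)

# DISTINCT collapses identical rows before they ever reach the job
distinct_rows = conn.execute(
    "SELECT DISTINCT c1, c2, c3 FROM src ORDER BY c1"
).fetchall()
conn.close()

for row in distinct_rows:
    print(",".join(row))
```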
regards,
vladimir
KeithM
Participant
Posts: 61
Joined: Thu Apr 22, 2004 11:34 am

Post by KeithM »

If your input is an ODBC stage, rather than switching to user-defined SQL in order to specify the DISTINCT keyword, you could just go to the Columns tab and group by all of your columns. This has the same effect as DISTINCT and gives you the results that you want.
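
To see that grouping by every column matches DISTINCT, here is a small sketch using Python's built-in `sqlite3` module. It only demonstrates the SQL behaviour, not the ODBC stage itself, and the table `src` and columns `c1`, `c2`, `c3` are invented for the example.

```python
import sqlite3

input_rows = [
    ("AAAA", "BBBB", "CCCC"),
    ("AAAA", "BBBB", "CCCC"),
    ("AAAA", "BBBB", "CCCC"),
    ("XXX", "YYY", "ZZZ"),
    ("XXX", "YYY", "ZZZ"),
    ("XXX", "YYY", "ZZZ"),
]

db = sqlite3.connect(":memory:")
# Hypothetical table standing in for the real source
db.execute("CREATE TABLE src (c1 TEXT, c2 TEXT, c3 TEXT)")
db.executemany("INSERT INTO src VALUES (?, ?, ?)", input_rows)

# Grouping by every selected column yields one row per distinct combination
grouped_rows = db.execute(
    "SELECT c1, c2, c3 FROM src GROUP BY c1, c2, c3 ORDER BY c1"
).fetchall()

# The same rows come back from DISTINCT
with_distinct = db.execute(
    "SELECT DISTINCT c1, c2, c3 FROM src ORDER BY c1"
).fetchall()
db.close()

print(grouped_rows == with_distinct)
```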
jseclen
Participant
Posts: 133
Joined: Wed Mar 05, 2003 4:19 pm
Location: Lima - Peru. Sudamerica

Re: Ignore duplicated rows

Post by jseclen »

Hi,

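If your input is a sequential file, you can use stage variables. This only works if the input is sorted so that duplicate rows arrive one after another.

Define in the Transformer stage, in this order so that Dupli is evaluated before LastField is updated: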

Dupli = If (LastField = ActualField) Then 1 Else 0
LastField = If (LastField <> ActualField) Then ActualField Else LastField

In the constraint, define the following condition:

Dupli = 0

In the output file you will have the desired records.
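
The same logic can be traced outside DataStage. Below is a minimal Python sketch, assuming each row is compared as a whole string and that the initial value of LastField never matches a real row; `drop_adjacent_duplicates` is a name made up for the illustration.

```python
def drop_adjacent_duplicates(rows):
    """Mimic the stage variables: pass a row only if it differs from the previous one."""
    kept = []
    last_field = None  # initial LastField; assumed never equal to a real row
    for actual_field in rows:
        dupli = 1 if last_field == actual_field else 0  # Dupli derivation
        last_field = actual_field                       # LastField derivation
        if dupli == 0:                                  # constraint: Dupli = 0
            kept.append(actual_field)
    return kept


sorted_rows = [
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
]
result_sorted = drop_adjacent_duplicates(sorted_rows)

# Duplicates that are not adjacent slip through, so sort the input first
unsorted_rows = ["XXX,YYY,ZZZ", "AAAA,BBBB,CCCC", "XXX,YYY,ZZZ"]
result_unsorted = drop_adjacent_duplicates(unsorted_rows)

print(result_sorted)
print(result_unsorted)
```

The second call shows the limit of the technique: only neighbouring duplicates are caught.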
Saludos,

Miguel Seclén
Lima - Peru
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Re: Ignore duplicated rows

Post by ray.wurlod »

vigneshra wrote: Sort the data on some key, which can be done with a Sort stage. Then feed the sorted data to a Remove Duplicates stage, which gives you the required output.
The Remove Duplicates stage type is not available for server jobs; it is only available for parallel jobs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.