Ignore duplicated rows

xcasals · Post by **xcasals** » Wed Jun 16, 2004 5:37 am

Hi all,

I'm trying to ignore the duplicated rows from an input. For example, this is the input:

AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
AAAA,BBBB,CCCC
XXX,YYY,ZZZ
XXX,YYY,ZZZ
XXX,YYY,ZZZ

and this is the output I want to get:

AAAA,BBBB,CCCC
XXX,YYY,ZZZ

It seems easy, but I'm a newbie.

Thanks.

denzilsyb · Post by **denzilsyb** » Wed Jun 16, 2004 6:06 am

You could write the input to a Hashed File stage and make all the columns key columns; that will get rid of the duplicate records.
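A hashed file keeps one record per key, so writing a row whose key already exists overwrites the earlier record. The Python sketch below only imitates that behaviour with a dict keyed on every column. The function name `dedupe_by_all_columns` is hypothetical, and unlike a Python dict, a hashed file may not return rows in their input order.

```python
def dedupe_by_all_columns(lines):
    """Imitate a hashed file whose key is every column (illustration only)."""
    hashed = {}
    for line in lines:
        key = tuple(line.split(","))  # all columns together form the key
        hashed[key] = key             # a duplicate key overwrites the existing entry
    return [",".join(row) for row in hashed.values()]


rows = [
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
]
print(dedupe_by_all_columns(rows))  # prints ['AAAA,BBBB,CCCC', 'XXX,YYY,ZZZ']
```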

dnzl

vigneshra · Post by **vigneshra** » Wed Jun 16, 2004 6:08 am

xcasals wrote: I'm trying to ignore the duplicated rows from an input.

Based on some key, you need to sort the data, which can be done with a Sort stage. Then the sorted data is fed to a Remove Duplicates stage, which gives you the required output. Since I am new to DataStage, any experts please correct me if I am wrong.
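The sort-then-remove-duplicates idea can be sketched in Python with `sorted()` and `itertools.groupby`, keeping the first row of each run of equal keys. This only illustrates the technique, not the DataStage stages themselves; the function name and the default key (the whole line) are assumptions.

```python
from itertools import groupby


def sort_and_remove_duplicates(lines, key=lambda line: line):
    """Sort on `key`, then keep only the first row of each group of equal keys."""
    ordered = sorted(lines, key=key)
    # groupby only merges *adjacent* equal keys, which is why the sort comes first
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = ["XXX,YYY,ZZZ", "AAAA,BBBB,CCCC", "XXX,YYY,ZZZ", "AAAA,BBBB,CCCC"]
print(sort_and_remove_duplicates(rows))  # prints ['AAAA,BBBB,CCCC', 'XXX,YYY,ZZZ']
```

Passing a different `key` (for example, only the first column) would keep one row per key value rather than one per identical row.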

degraciavg · Post by **degraciavg** » Wed Jun 16, 2004 8:46 am

xcasals wrote: I'm trying to ignore the duplicated rows from an input.

If your input data comes from a relational database, you can add the DISTINCT clause to your SELECT query. This removes unnecessary steps from the job.
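A minimal sketch of the DISTINCT approach, using Python's built-in `sqlite3` module with an in-memory database. The table name `source_rows` and the columns `col1` to `col3` are hypothetical stand-ins for your source table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_rows (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO source_rows VALUES (?, ?, ?)",
    [("AAAA", "BBBB", "CCCC")] * 3 + [("XXX", "YYY", "ZZZ")] * 3,
)
# DISTINCT collapses rows that are identical across all selected columns
distinct_rows = conn.execute(
    "SELECT DISTINCT col1, col2, col3 FROM source_rows ORDER BY col1"
).fetchall()
conn.close()
print(distinct_rows)  # prints [('AAAA', 'BBBB', 'CCCC'), ('XXX', 'YYY', 'ZZZ')]
```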

KeithM · Post by **KeithM** » Wed Jun 16, 2004 9:48 am

If your input is an ODBC stage, rather than changing the SQL to user-defined in order to specify the DISTINCT keyword, you could just go to the Columns tab and group by all of your columns. This has the same effect as DISTINCT and gives you the results you want.
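The claim that grouping by every column matches DISTINCT can be checked with a small sketch. It uses `sqlite3` rather than an ODBC stage, and the table `source_rows` with columns `col1` to `col3` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_rows (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO source_rows VALUES (?, ?, ?)",
    [("AAAA", "BBBB", "CCCC")] * 3 + [("XXX", "YYY", "ZZZ")] * 3,
)
# GROUP BY on every selected column yields one row per distinct combination
grouped = conn.execute(
    "SELECT col1, col2, col3 FROM source_rows "
    "GROUP BY col1, col2, col3 ORDER BY col1"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT col1, col2, col3 FROM source_rows ORDER BY col1"
).fetchall()
conn.close()
print(grouped)              # prints [('AAAA', 'BBBB', 'CCCC'), ('XXX', 'YYY', 'ZZZ')]
print(grouped == distinct)  # prints True
```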

jseclen · Post by **jseclen** » Wed Jun 16, 2004 11:54 am

Hi,

If your input is a sequential file, you can use stage variables. The input must be sorted first so that duplicate rows are adjacent.

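Define these stage variables in the Transformer stage, in this order:

Dupli = If (LastField = ActualField) Then 1 Else 0
LastField = If (LastField <> ActualField) Then ActualField Else LastField

Here ActualField is the incoming value you want to compare (for example, all the columns concatenated). Stage variables are evaluated top to bottom, so Dupli compares the current row with the previous row's value before LastField is updated.

In the output link constraint, define this condition:

Dupli = 0

The output file will then contain only the desired records.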
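The stage-variable logic can be simulated row by row in Python to see why the input has to be sorted. The function name is hypothetical, and `None` stands in for the stage variable's initial value, which is an assumption; it only needs to differ from every real row.

```python
def filter_with_stage_variables(sorted_lines):
    """Mimic the Transformer stage variables and constraint on pre-sorted input."""
    last_field = None  # assumed initial value; must not equal any real row
    output = []
    for actual_field in sorted_lines:
        # Dupli is evaluated first, so it compares against the previous row's value
        dupli = 1 if last_field == actual_field else 0
        # LastField is evaluated second and remembers the current row
        last_field = actual_field if last_field != actual_field else last_field
        # Output link constraint: Dupli = 0
        if dupli == 0:
            output.append(actual_field)
    return output


rows = [
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "AAAA,BBBB,CCCC",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
    "XXX,YYY,ZZZ",
]
print(filter_with_stage_variables(rows))  # prints ['AAAA,BBBB,CCCC', 'XXX,YYY,ZZZ']
```

With unsorted input such as `["A", "B", "A"]`, the second `"A"` is not adjacent to the first, so all three rows pass the constraint.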

ray.wurlod · Post by **ray.wurlod** » Wed Jun 16, 2004 4:36 pm

vigneshra wrote: Based on some key you need to sort the data which can be done through a sorter stage. Then the sorted data is fed to a duplicates remover stage which gives you the required output. Since I am new to DataStage, any experts please correct me if I am wrong

The Remove Duplicates stage type is not available for server jobs; it is only available for parallel jobs.

DSXchange