Handling Duplicate Data

dsadm_ws · Post by **dsadm_ws** » Wed Jan 12, 2011 2:22 pm

Hi All,

I am trying to handle Duplicates with Sort Stage using Create Key Change Column = True.

However, I have some issues:
1) Say I have 103 Columns out of which 6 are duplicate, and I want to Capture these 6 records.

Instead, with Sort stage, it retains the first column and discards the other.

2) Also, I want to retain only those records with the following condition:

a. I have 7 columns out of which 2 are key columns and 5 are value columns.

Emp_id Dept_id Qtr YTD One_YY Three_YY Five_YY Ten_YY
12 20 123.44 -99.00 -99.00 -99.00 125.55 -99.00
12 20 -99.00 -99.00 145.55 567.88 012.45 -99.00

The above is some sample duplicates data.

Requirement is to retain only the record with less number of -99.00 values and discard the others. How can I attain this.
Any Ideas!!

Regards.

jwiles · Post by **jwiles** » Wed Jan 12, 2011 3:10 pm

Use a transformer to count the number of -99.00 columns and add that count as an output column, then sort using the count as the last sort key column (don't partition on it tho). If you sort in ascending order your desired record will be the first within a key group.

Regards,

dsadm_ws · Post by **dsadm_ws** » Thu Jan 13, 2011 10:22 am

Thank You. I usd the stage variables to get the count and Used the sort Stage to extract the least count value.

I really appreciate your help.

DSXchange

Handling Duplicate Data

Handling Duplicate Data

Re: Handling Duplicate Data