Page 1 of 1

Handling Duplicate Data

Posted: Wed Jan 12, 2011 2:22 pm
by dsadm_ws
Hi All,

I am trying to handle Duplicates with Sort Stage using Create Key Change Column = True.

However, I have some issues:
1) Say I have 103 Columns out of which 6 are duplicate, and I want to Capture these 6 records.

Instead, with Sort stage, it retains the first column and discards the other.

2) Also, I want to retain only those records with the following condition:

a. I have 7 columns out of which 2 are key columns and 5 are value columns.

Emp_id Dept_id Qtr YTD One_YY Three_YY Five_YY Ten_YY
12 20 123.44 -99.00 -99.00 -99.00 125.55 -99.00
12 20 -99.00 -99.00 145.55 567.88 012.45 -99.00

The above is some sample duplicates data.

Requirement is to retain only the record with less number of -99.00 values and discard the others. How can I attain this.
Any Ideas!!

Regards.

Re: Handling Duplicate Data

Posted: Wed Jan 12, 2011 3:10 pm
by jwiles
Use a transformer to count the number of -99.00 columns and add that count as an output column, then sort using the count as the last sort key column (don't partition on it tho). If you sort in ascending order your desired record will be the first within a key group.

Regards,

Posted: Thu Jan 13, 2011 10:22 am
by dsadm_ws
Thank You. I usd the stage variables to get the count and Used the sort Stage to extract the least count value.

I really appreciate your help.