Handling Duplicate Data
Posted: Wed Jan 12, 2011 2:22 pm
Hi All,
I am trying to handle duplicates with the Sort stage, with Create Key Change Column = True.
However, I have some issues:
1) Say I have 103 records, of which 6 are duplicates, and I want to capture all 6 of these records.
Instead, the Sort stage retains the first record and discards the others.
2) Also, I want to retain only those records with the following condition:
a. I have 8 columns, of which 2 are key columns and 6 are value columns.
Emp_id  Dept_id  Qtr     YTD     One_YY  Three_YY  Five_YY  Ten_YY
12      20       123.44  -99.00  -99.00  -99.00    125.55   -99.00
12      20       -99.00  -99.00  145.55  567.88    012.45   -99.00
The above is sample duplicate data.
The requirement is to retain only the record with the fewest -99.00 values and discard the others. How can I achieve this?
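To make the rule concrete, here is a minimal Python sketch of the logic I am after. It is not DataStage code, only an illustration. It assumes the first two columns (Emp_id, Dept_id) are the keys, every remaining column is a value column, and -99.00 is the placeholder to count. The names `SENTINEL`, `sentinel_count` and `split_duplicates` are made up for this example, and breaking ties in favour of the first record is my own assumption.

```python
from collections import defaultdict

SENTINEL = -99.00  # placeholder value seen in the sample data

# Sample rows from the post:
# (Emp_id, Dept_id, Qtr, YTD, One_YY, Three_YY, Five_YY, Ten_YY)
rows = [
    (12, 20, 123.44, -99.00, -99.00, -99.00, 125.55, -99.00),
    (12, 20, -99.00, -99.00, 145.55, 567.88, 12.45, -99.00),
]


def sentinel_count(row, key_len=2):
    """Count how many value columns (everything after the keys) equal SENTINEL."""
    return sum(1 for value in row[key_len:] if value == SENTINEL)


def split_duplicates(rows, key_len=2):
    """Group rows by key; per key, keep the row with the fewest SENTINEL values.

    Returns (kept, rejected). Ties go to the row that appears first in the
    input (an assumption, the requirement does not say how to break ties).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[:key_len]].append(row)

    kept, rejected = [], []
    for group in groups.values():
        # Pick by index so identical rows are still rejected correctly.
        best = min(range(len(group)),
                   key=lambda i: sentinel_count(group[i], key_len))
        kept.append(group[best])
        rejected.extend(group[:best] + group[best + 1:])
    return kept, rejected


kept, rejected = split_duplicates(rows)
print("kept:", kept)
print("rejected:", rejected)
```

For the two sample rows, the first has four -99.00 values and the second has three, so the second row is kept and the first lands in `rejected`. The `rejected` list is also what I mean in 1) by capturing the duplicate records instead of losing them.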
Any ideas?
Regards.