
duplicate and non duplicate data

Posted: Fri Sep 10, 2010 1:34 pm
by agpt
Hi,
A sequential file has 8 records with one column. These are the values in the column, separated by spaces:
1 1 2 2 3 4 5 6
In a parallel job, after reading the sequential file, two more sequential files should be created: one with the duplicate records and the other without duplicates.
File 1 records, separated by spaces: 1 1 2 2
File 2 records, separated by spaces: 3 4 5 6
How can I do it?

Re: duplicate and non duplicate data

Posted: Fri Sep 10, 2010 2:03 pm
by gateleys
1. Sort the input on the column.
2. Pass the output of step 1 into a Transformer in which you create two stage variables with the following derivations:

Code:

svDuplicates = If RowProcCompareWithPreviousValue(InputLink.Column) Then @TRUE Else @FALSE

svNonDuplicates = If svDuplicates Then @FALSE Else @TRUE
3. Use two output links, one for duplicate rows and the other for non-duplicates, with the following constraints.
For duplicates:

Code:

svDuplicates
For non-duplicates:

Code:

svNonDuplicates
I hope it works in a parallel job, and it should if you use a BASIC Transformer.
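The three steps above can be simulated in Python to see what each output link would receive. This is a hedged sketch, not DataStage code: `CompareWithPrevious` is a hypothetical stand-in that assumes RowProcCompareWithPreviousValue returns true when the current value equals the value from the previous call and then remembers the current value. Because that check keeps state between calls, the sketch calls it once per row and derives the second flag from the first.

```python
from typing import List, Tuple

_UNSET = object()  # sentinel: no previous row seen yet


class CompareWithPrevious:
    """Hypothetical stand-in for RowProcCompareWithPreviousValue.

    Assumed behaviour: returns True when the value equals the value passed
    on the previous call, then remembers the current value for the next call.
    """

    def __init__(self) -> None:
        self._last: object = _UNSET

    def __call__(self, value: object) -> bool:
        same = value == self._last
        self._last = value  # state carried over to the next row
        return same


def split_by_previous(values: List[int]) -> Tuple[List[int], List[int]]:
    compare = CompareWithPrevious()
    duplicates: List[int] = []
    non_duplicates: List[int] = []
    for v in sorted(values):                   # step 1: sort on the column
        sv_duplicates = compare(v)             # step 2: one call per row
        sv_non_duplicates = not sv_duplicates  # derived, no second call
        if sv_duplicates:                      # step 3: duplicates link
            duplicates.append(v)
        if sv_non_duplicates:                  # step 3: non-duplicates link
            non_duplicates.append(v)
    return duplicates, non_duplicates


dups, non_dups = split_by_previous([1, 1, 2, 2, 3, 4, 5, 6])
print(dups)      # [1, 2]
print(non_dups)  # [1, 2, 3, 4, 5, 6]
```

Under that assumption, the duplicates link gets only the second and later occurrences of each value (1 2), while the non-duplicates link gets the first occurrence of every value (1 2 3 4 5 6). Producing the requested 1 1 2 2 / 3 4 5 6 split needs a count per value, which is what the Copy/Aggregator/Filter/Join approach later in the thread provides.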

Re: duplicate and non duplicate data

Posted: Fri Sep 10, 2010 2:09 pm
by anbu
Do you have 8 records or 8 columns in a row?

Re: duplicate and non duplicate data

Posted: Sat Sep 11, 2010 12:28 am
by agpt
anbu wrote:
Do you have 8 records or 8 columns in a row?
8 records

Posted: Sun Sep 12, 2010 2:41 am
by agpt
I went through other posts in the forum and found the solution: use Copy, Aggregator and Filter stages, then join back to the original data to get the duplicates out.
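The Copy/Aggregator/Filter/Join design can be sketched in Python to show why it yields the requested split. This is a hedged illustration of the data flow, not DataStage job code; the mapping of each step to a stage is an assumption about how the stages were wired, since the post does not give the details. `split_by_count` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def split_by_count(values: List[int]) -> Tuple[List[int], List[int]]:
    # Copy: the same input rows feed both the aggregation and the join.
    rows = list(values)
    # Aggregator: count the rows for each key value.
    counts = Counter(rows)
    # Filter: keys seen more than once vs. keys seen exactly once.
    dup_keys = {k for k, n in counts.items() if n > 1}
    unique_keys = {k for k, n in counts.items() if n == 1}
    # Join back: inner-join the original rows to each key set.
    duplicates = [v for v in rows if v in dup_keys]
    non_duplicates = [v for v in rows if v in unique_keys]
    return duplicates, non_duplicates


dups, non_dups = split_by_count([1, 1, 2, 2, 3, 4, 5, 6])
print(dups)      # [1, 1, 2, 2]
print(non_dups)  # [3, 4, 5, 6]
```

Because the count is computed over the whole column before rows are routed, every occurrence of a repeated value, including the first, lands in the duplicates output. This sketch does not require sorted input.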

Thanks to all of you!