Remove Duplicates stage issue
Posted: Tue Aug 07, 2012 11:31 pm
We are having a problem with the Remove Duplicates stage: different runs of the same job produce different output sets. Our requirement is to sort the source data on three fields (A, B, C) and then, preserving that sort order, remove duplicates on field D.
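To pin down what the requirement means, here is a minimal Python sketch of the intended result (not DataStage code). It assumes rows are `(A, B, C, D)` tuples and that "preserving the sort order" means keeping, for each D value, the row that comes first in (A, B, C) order. The function name `first_per_d` and the sample rows are hypothetical.

```python
def first_per_d(rows):
    """For each D value, keep the row that comes first in (A, B, C) order.

    Rows are assumed to be (A, B, C, D) tuples (a hypothetical layout).
    """
    best = {}
    for row in rows:
        a, b, c, d = row
        # Replace the kept row only if this one sorts strictly earlier on (A, B, C).
        if d not in best or (a, b, c) < best[d][:3]:
            best[d] = row
    # Return the survivors in (A, B, C) order, as the requirement asks.
    return sorted(best.values(), key=lambda r: r[:3])


rows = [(3, 0, 0, "x"), (2, 0, 0, "y"), (1, 0, 0, "x")]
print(first_per_d(rows))
```

Note that when two rows share both the same D and the same (A, B, C), this sketch keeps whichever arrives first, so the result then depends on input order. In a parallel job, where arrival order can vary between runs, that alone could make the output differ from run to run.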
We tried two designs:
Design 1.
Source -> Copy stage (partition on D) -> Sort stage (sort by A, B, C; SAME partitioning) -> Remove Duplicates stage (key: D; SAME partitioning)
Design 2.
Source -> Sort stage (sort by A, B, C; partition on D) -> Remove Duplicates stage (key: D; SAME partitioning)
In both cases we get a different output set from each run of the job.
Has anyone faced a similar issue before? Any suggestions are welcome.
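One plausible cause, stated as an assumption rather than a diagnosis: the Remove Duplicates stage is documented as expecting input sorted on its key, and a dedupe of that kind only compares neighbouring rows. If the data is sorted on A, B, C but not on D, rows with the same D need not be adjacent. The Python sketch below models that adjacent-only behaviour with `itertools.groupby`. The helper `adjacent_dedupe` and the sample rows are hypothetical, and this is an illustration of the logic, not of DataStage internals.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows as (A, B, C, D) tuples.
rows = [(3, 0, 0, "x"), (2, 0, 0, "y"), (1, 0, 0, "x")]

d_key = itemgetter(3)          # field D
abc_key = itemgetter(0, 1, 2)  # fields A, B, C


def adjacent_dedupe(sorted_rows, key):
    """Keep the first row of each run of *consecutive* rows sharing the same key."""
    return [next(group) for _, group in groupby(sorted_rows, key=key)]


# Sorted only on A, B, C: the two "x" rows are not neighbours, so both survive.
by_abc = sorted(rows, key=abc_key)
print(adjacent_dedupe(by_abc, d_key))

# Sorted on D first, then A, B, C: equal D values are grouped together, and the
# row kept for each D is the one with the smallest (A, B, C).
by_d_then_abc = sorted(rows, key=lambda r: (d_key(r), abc_key(r)))
print(adjacent_dedupe(by_d_then_abc, d_key))
```

Under this model, sorting on D followed by A, B, C before the dedupe gives one row per D. The survivors then come out in D order, so a further sort on A, B, C would be needed if the output must follow that order.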
Thanks
Abhik.