I have a requirement where a CSV file is the source, and I need to get unique records based on a key column and load them into a fixed-width target file. I have two approaches in mind. Which of them works better in terms of logic and performance? Can anyone help me list the pros and cons of each?
Approach 1:
In the target Sequential File stage, I have set the collection method to Sort Merge on the key column, with Perform sort enabled and the Unique property set.
SEQ File stage -> TRANSFORMER -> SEQ file (Fixed width file)
Here I use the Transformer to implement a few business rules, such as concatenation and hardcoding of constants.
Approach 2:
I hash partition on the key column in the Remove Duplicates stage, with Perform sort enabled on the Input tab.
SEQ File stage -> REMOVE DUPS -> TRANSFORMER -> SEQ file (Fixed width file)
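
To make the intended result concrete, here is a rough sketch, outside DataStage, of the logic both approaches are meant to produce: sort on the key, keep one record per key, and write fixed-width lines. It is only an illustration in Python. The column names, the widths, and the choice to keep the first record per key are all assumptions for the example, not part of my actual job design.

```python
import csv
import io


def unique_by_key(rows, key):
    """Sort rows on the key and keep the first record seen for each key value."""
    kept = {}
    # sorted() is stable, so among equal keys the earliest input row comes first.
    for row in sorted(rows, key=lambda r: r[key]):
        kept.setdefault(row[key], row)
    return list(kept.values())


def to_fixed_width(row, widths):
    """Pad (or truncate) each column to its width and concatenate the fields."""
    return "".join(str(row[col])[:w].ljust(w) for col, w in widths.items())


def csv_to_fixed_width(csv_text, key, widths):
    """Read CSV text, de-duplicate on the key, and return fixed-width lines."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [to_fixed_width(r, widths) for r in unique_by_key(rows, key)]


# Hypothetical sample data: "id" is the key column.
sample = "id,name\n2,bob\n1,ann\n2,bobby\n"
lines = csv_to_fixed_width(sample, key="id", widths={"id": 3, "name": 5})
```

Here the second record with id 2 is dropped and each output line is 8 characters wide.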
To get unique records from a CSV file