I would like some clarification on the SortMerge collector method. Here is what I intend to do in my job: I want to merge four sorted input files that share the same column definition into a single output file that is fully sorted. I am sure there are multiple ways to achieve this, but here is what I am doing:
PX job: using a multiple-node configuration file
SeqFl Stage1 ---> SeqFl Stage2
Properties (Tab) of 'SeqFl Stage1':
Source:
File=inputfile1
File=inputfile2
File=inputfile3
File=inputfile4
ReadMethod=Specific File(s)
Properties (Tab) of 'SeqFl Stage2':
Target:
File=sortedoutputfile
File Update Mode=Overwrite
Partitioning (Tab) of 'SeqFl Stage2':
Collector type: SortMerge
Keys: The same keys on which the input files are sorted.
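To make the expected result concrete: what I am after is equivalent to a k-way merge of the four presorted inputs. Below is a minimal Python sketch of that behavior, not DataStage code. The sample rows, the `sort_merge` helper, and the choice of the first column as the sort key are all hypothetical and only stand in for my real files and keys.

```python
import heapq
from operator import itemgetter


def sort_merge(sorted_inputs, key_index=0):
    """Merge already-sorted row iterables into one sorted stream.

    Hypothetical helper: each input must already be sorted on the
    column at key_index, otherwise the output is not fully sorted.
    """
    return heapq.merge(*sorted_inputs, key=itemgetter(key_index))


# Hypothetical stand-ins for inputfile1..inputfile4, each presorted on column 0.
file1 = [(1, "a"), (5, "e")]
file2 = [(2, "b"), (6, "f")]
file3 = [(3, "c"), (7, "g")]
file4 = [(4, "d"), (8, "h")]

merged = list(sort_merge([file1, file2, file3, file4]))
for row in merged:
    print(row)
```

The merge only compares the current head row of each input, so it relies entirely on every input already being sorted on the same keys.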
Can someone please comment on or clarify this design?