Hi,
What is the baseline for implementing a partition method in a DataStage job, e.g. to process more than 2 million records?
Thanks.
Design DataStage job
Functionality: with stages like Join, Lookup, and Remove Duplicates, the reference data and the main link data that share a key must be on the same node to get the required output. Consider this input:
1,asd
1,asd
2,erf
3,saw
1,asd
Now suppose you use a Remove Duplicates stage, and the first and last records land on node 1 while the second record lands on node 2. Each node removes duplicates only among its own rows, so you will still get 2 copies of that record in the output. If you hash partition the data on both fields, the 1st, 2nd, and last records will all be on the same node, and the output will be a single record.
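To make that concrete, here is a minimal sketch in plain Python (not DataStage) of why per-node duplicate removal only works when duplicates are co-located. The node count and helper names are illustrative, not part of any DataStage API; hash partitioning is simulated with Python's built-in `hash`.

```python
# Illustrative simulation: 2-node parallel dedup under two partitioning schemes.
NODES = 2

records = [("1", "asd"), ("1", "asd"), ("2", "erf"), ("3", "saw"), ("1", "asd")]

def round_robin(rows):
    """Deal rows to nodes in turn; duplicates can end up on different nodes."""
    nodes = [[] for _ in range(NODES)]
    for i, row in enumerate(rows):
        nodes[i % NODES].append(row)
    return nodes

def hash_partition(rows):
    """Route each row by a hash of both fields; identical rows always
    hash the same, so all copies land on the same node."""
    nodes = [[] for _ in range(NODES)]
    for row in rows:
        nodes[hash(row) % NODES].append(row)
    return nodes

def remove_duplicates_per_node(nodes):
    """Each node deduplicates only the rows it holds, like a parallel
    Remove Duplicates stage."""
    out = []
    for node_rows in nodes:
        seen = set()
        for row in node_rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out

# Round-robin: the three ("1", "asd") copies are split across nodes,
# so more than one copy survives the per-node dedup.
rr = remove_duplicates_per_node(round_robin(records))

# Hash partition on both fields: all ("1", "asd") copies are on one node,
# so exactly one copy survives.
hp = remove_duplicates_per_node(hash_partition(records))
```

Running this, `rr` keeps a duplicate `("1", "asd")` while `hp` contains each distinct row exactly once, which is the behavior the example above describes.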