Hi,
What is the baseline for implementing a partition method in a DataStage job, e.g. to process more than 2 million records?
Thanks.
Design DataStage job
Functionality: with stages like Join, Lookup, and Remove Duplicates, the reference data and the main link data that share a key must be on the same node to get the required output. Consider this input:
1,asd
1,asd
2,erf
3,saw
1,asd
Now suppose you use a Remove Duplicates stage, and the first and last records land on node 1 while the second record lands on node 2. Each node removes duplicates only among its own rows, so you will still get 2 copies of that record in the output. If you hash partition the data on both fields, the 1st, 2nd, and last records will all be on the same node, and the output will be a single record.
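To make that concrete, here is a minimal sketch in plain Python (not DataStage) of why per-node duplicate removal only works when duplicates are co-located. The node count and helper names are illustrative, not part of any DataStage API; hash partitioning is simulated with Python's built-in `hash`.

```python
# Illustrative simulation: 2-node parallel dedup under two partitioning schemes.
NODES = 2

records = [("1", "asd"), ("1", "asd"), ("2", "erf"), ("3", "saw"), ("1", "asd")]

def round_robin(rows):
    """Deal rows to nodes in turn; duplicates can end up on different nodes."""
    nodes = [[] for _ in range(NODES)]
    for i, row in enumerate(rows):
        nodes[i % NODES].append(row)
    return nodes

def hash_partition(rows):
    """Route each row by a hash of both fields; identical rows always
    hash the same, so all copies land on the same node."""
    nodes = [[] for _ in range(NODES)]
    for row in rows:
        nodes[hash(row) % NODES].append(row)
    return nodes

def remove_duplicates_per_node(nodes):
    """Each node deduplicates only the rows it holds, like a parallel
    Remove Duplicates stage."""
    out = []
    for node_rows in nodes:
        seen = set()
        for row in node_rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out

# Round-robin: the three ("1", "asd") copies are split across nodes,
# so more than one copy survives the per-node dedup.
rr = remove_duplicates_per_node(round_robin(records))

# Hash partition on both fields: all ("1", "asd") copies are on one node,
# so exactly one copy survives.
hp = remove_duplicates_per_node(hash_partition(records))
```

Running this, `rr` keeps a duplicate `("1", "asd")` while `hp` contains each distinct row exactly once, which is the behavior the example above describes.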