problem with parallel job

kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Hi,

I have an input sequential file that passes through a Transformer, without any changes, to an output sequential file.
When I do this with a server job, the output is the same as the input.
But when I do it with a parallel job, the output file is a scrambled version of the input file. The output doesn't contain the data in the same order as the input; it gets shuffled, so the output file first contains the even columns and then the odd columns. I can't figure out the problem. Is it a problem with the compiler? Need help!
velagapudi_k
Premium Member
Posts: 142
Joined: Mon Jun 27, 2005 5:31 pm
Location: Atlanta GA

Post by velagapudi_k »

Are you saying the data is shuffled, or that the column order changed? If the data is shuffled, that is entirely expected, because the data is distributed across the nodes your DataStage job runs on and then collected back at the output. If the column order changed, then you have to check your mapping.
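To illustrate, here is a rough Python sketch (not DataStage code) of how distributing rows across nodes and collecting them back can change the row order. It assumes a two-node setup, round-robin distribution, and a collector that reads one partition after the other; the function names are made up for this example. If "even and odd" in the original post refers to rows, this would produce that kind of pattern.

```python
def round_robin_partition(rows, num_nodes):
    """Deal rows out to nodes one at a time (hypothetical helper)."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def collect_partition_by_partition(partitions):
    """Read everything from node 0, then node 1, and so on (hypothetical helper)."""
    collected = []
    for partition in partitions:
        collected.extend(partition)
    return collected


rows = [1, 2, 3, 4, 5, 6]                      # input order
parts = round_robin_partition(rows, 2)         # assumed: 2 nodes
shuffled = collect_partition_by_partition(parts)
print(shuffled)  # [1, 3, 5, 2, 4, 6]
```

No rows are lost or altered; only their order differs from the input.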
Venkat Velagapudi
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

The column order has not changed. Only the data is shuffled!
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

And that is expected in a parallel job. ;)
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

This is because the data is partitioned and processed on different nodes in parallel. One node could finish sooner than another, hence the shuffled behaviour. If you want the output in order, go to the 'Partitioning' tab of your output Sequential File stage and change the Collector Type to 'Sort Merge'. Choose the key on which the input is sorted. That will make sure the order of your data is preserved on output.
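The idea behind a sort-merge collector can be sketched in Python (this is only an illustration, not how DataStage implements it). It assumes each partition is already sorted on the key, which holds when sorted input is split across nodes without reordering. The sample records and the `sort_merge_collect` name are made up for the example.

```python
import heapq

# Assumed sample data: two partitions, each already sorted on the key (first field).
partitions = [
    [(1, "a"), (3, "c"), (5, "e")],   # rows that landed on node 0
    [(2, "b"), (4, "d"), (6, "f")],   # rows that landed on node 1
]


def sort_merge_collect(partitions, key_index=0):
    """Merge pre-sorted partitions into one stream ordered by the key column."""
    return list(heapq.merge(*partitions, key=lambda row: row[key_index]))


merged = sort_merge_collect(partitions)
print(merged)
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
```

Note that the merge only restores the original order if the input was sorted on the chosen key to begin with; if the input file has no such key, a merge on a key cannot reproduce its order.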