DataStage File Partition

s_avneet
Participant
Posts: 22
Joined: Wed Aug 31, 2016 8:28 am

Post by s_avneet »

Hi All.

I am running a parallel job with the following stages:

SeqFileIn -> Xfrm -> SeqFileOut

I want to preserve the order of the records as they appear in the source file. Our APT_CONFIG_FILE defines two nodes.

Is there a way I can maintain the order?
Avneet
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What about that processing needs two nodes, let alone a Parallel job? Either run it on a single node, force the target stage to run sequentially, or just use a dang Server job. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

While making your job run sequentially is probably the easiest and best solution, there are other things you can play with if you want.

You can probably find a way to add a column holding a row number to the sequential file, either where the file is generated (if possible), before your job is called (a simple C program, for example), or in the job itself. Then you can sort on that value at the end and drop the column when you are done with it. If you want to add it in your job you can do it any number of ways, including some appropriate function of the DataStage system variables (@INROWNUM, @PARTITIONNUM, @NUMPARTITIONS, etc.).
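
To make the row-number idea concrete, here is a rough Python simulation, not DataStage code. It assumes the file is read sequentially and the rows are dealt round robin to the Transformer's partitions; under that assumption the commonly used derivation (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1 reproduces the original line number. The function names are made up for illustration, and with any other partitioning method the value is still unique but no longer matches the source order.

```python
def round_robin(rows, num_partitions):
    """Deal rows to partitions one at a time, as round-robin partitioning would."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def tag_rows(partitions):
    """Attach a sequence number to every row, computed per partition.

    Mirrors the assumed Transformer derivation:
        (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1
    where @INROWNUM starts at 1 within each partition and
    @PARTITIONNUM starts at 0.
    """
    num_partitions = len(partitions)
    tagged = []
    for partition_num, part in enumerate(partitions):
        for in_row_num, row in enumerate(part, start=1):
            seq = (in_row_num - 1) * num_partitions + partition_num + 1
            tagged.append((seq, row))
    return tagged


def restore_order(tagged):
    """Sort on the sequence number, then drop it (the final sort step)."""
    return [row for _, row in sorted(tagged, key=lambda pair: pair[0])]


source = [f"rec{i}" for i in range(1, 8)]   # seven records in file order
partitions = round_robin(source, 2)          # two nodes, as in the question
tagged = tag_rows(partitions)                # rows now arrive partition by partition
restored = restore_order(tagged)             # back in source order
```

In the real job the last step would be a sort on the added column (or a sort merge collector on it) in front of the sequentially running output stage, followed by dropping the column.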

You can also study your data to see if it is already sorted by some means, and just re-sort off the same values at the end.

Most of those are just not worth the trouble most of the time... it's one of those cases of "just because you can do something"...