DataStage File Partition

Post by **s_avneet** » Thu Dec 15, 2016 5:36 am

Hi All.

I am running a parallel job with the stages as below:

SeqFileIn -> Xfrm -> SeqFileOut

I want to preserve the order of the records as they appear in the source file. Our APT_CONFIG_FILE has two nodes.

Is there a way I can maintain the order?

Post by **chulett** » Thu Dec 15, 2016 7:51 am

What about that processing needs two nodes, let alone a Parallel job? Either run it on a single node, force the target stage to run sequentially, or just use a dang Server job.

Post by **UCDI** » Thu Dec 15, 2016 8:10 am

While making your job run sequentially is probably the easiest and best solution, there are other things you can play with if you want.

You can probably find a way to add a column holding a row number to the sequential file: where the file is generated (if possible), before your job is called (a simple C program, for example), or in the job itself. Then you can sort on that value and drop the column when you are done with it. If you want to add it in the job, you can do it any number of ways, including working out an appropriate expression in a Transformer from the DataStage system variables such as @INROWNUM, @PARTITIONNUM and @NUMPARTITIONS.
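A minimal sketch of the "simple C program" option, assuming a delimited sequential file with newline-terminated records: it copies the file and prefixes every record with a 1-based row number and the field delimiter, so the number becomes a new first column. The function name `prepend_row_numbers` and the 4096-byte read buffer are illustrative choices, not anything DataStage provides. Records longer than the buffer are still numbered only once, because a number is written only at the start of a record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies every record of `in` to `out`, prefixing it with a 1-based row
 * number followed by `delim`, so the number becomes a new first column.
 * Assumes newline-terminated records. Returns the number of records. */
unsigned long prepend_row_numbers(FILE *in, FILE *out, char delim)
{
    char buf[4096];            /* read chunk; records may be longer */
    unsigned long rownum = 0;
    int at_record_start = 1;   /* the next chunk begins a new record */

    assert(in != NULL && out != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (at_record_start) {
            ++rownum;
            fprintf(out, "%lu%c", rownum, delim);
        }
        fputs(buf, out);
        /* fgets stops early on long records, so only a newline ends one */
        at_record_start = (strchr(buf, '\n') != NULL);
    }
    return rownum;
}
```

Called from a small `main()` as `prepend_row_numbers(stdin, stdout, ',')`, it could run from whatever script launches the job. The job would then sort on the new first column before writing the sequential target and drop that column on the way out.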

You can also study your data to see whether it is already sorted on some key, and just re-sort on the same key at the end.
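For the "already sorted" route, what matters is whether the source is strictly ascending on the candidate key. If two rows share a key value, re-sorting on that key at the end cannot tell which of them came first. The C sketch below checks that on an in-memory sample and shows a re-sort restoring the source order after the rows have been interleaved. The `struct record` layout, the function names and the two-partition interleaving in the example are hypothetical; in the job itself the final sort would be done by DataStage, not by this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: `key` is the column the source file is suspected
 * to be sorted on (an ID, a timestamp, ...); `payload` is the rest. */
struct record {
    long key;
    const char *payload;
};

/* Returns 1 if rows[0..n-1] is strictly ascending on key, else 0. */
int strictly_ascending(const struct record *rows, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        if (rows[i - 1].key >= rows[i].key)
            return 0; /* out of order, or a duplicate key */
    }
    return 1;
}

/* qsort comparator on key; avoids subtraction so it cannot overflow. */
static int cmp_key(const void *a, const void *b)
{
    long ka = ((const struct record *)a)->key;
    long kb = ((const struct record *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Sorts rows gathered back from several partitions into key order.
 * Only reproduces the source order if the keys were strictly ascending. */
void resort_on_key(struct record *rows, size_t n)
{
    if (n < 2)
        return; /* nothing to reorder */
    assert(rows != NULL);
    qsort(rows, n, sizeof rows[0], cmp_key);
}
```

For example, rows keyed 10, 20, 30, 40 that come back as 10, 30, 20, 40 after a two-way split pass `strictly_ascending` beforehand and fail it afterwards; `resort_on_key` puts them back as 10, 20, 30, 40.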

Most of those are just not worth the trouble most of the time... it's one of those cases of "just because you can do something"...