Is Sorting preserved across multiple stages in parallel jobs

ssunda6
Participant
Posts: 91
Joined: Tue Sep 19, 2006 9:32 pm

Post by ssunda6 »

Hi All,

My job requirement is to check whether employees' job activities are valid.
For example, an employee's first task should be a particular job code and not 'OUT' or 'Lunch'.
The last task is always 'OUT'. There are many more conditions.

To implement this: after reading from the file in a parallel job, if I sort on the key columns (employee number, business date, activity timestamp) using an explicit Sort stage and then propagate the data to other stages, is it guaranteed that the sort order will be preserved across multiple stages?

I am applying these conditions in a Transformer after some intermediate stages. In all the intermediate stages, sort is preserved and partitioning is left at the default (propagate). My whole logic depends on the sorted data, so I want to make sure this is guaranteed to work.

One more doubt: I can implement the job in two ways.
First, copy the whole data set (1 million rows) to three output links from a Transformer, apply some conditions on the first link, others on the second link and the remainder on the third link, then funnel the data to the output.
Alternatively, I can handle all the conditions in a single Transformer instead of routing the data to three output links, but the conditions will become a bit complex. Will there be any significant difference in performance (time) between the two cases?

Please let me know your inputs.

Regards,
Ssunda.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sorting is guaranteed to be preserved unless you repartition the data.
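
As an illustration only, here is a plain-Python simulation of that rule (it is not DataStage code; the rows, the two-partition layout and the helper names are all made up for the sketch). Partitions are modelled as lists: a row-wise stage that keeps the partitioning leaves each partition sorted, while a round-robin repartition scatters each employee's rows.

```python
# Hypothetical rows: (employee_number, activity_timestamp, job_code)
rows = [
    (101, "08:00", "J10"), (101, "12:00", "LUNCH"), (101, "17:00", "OUT"),
    (102, "09:00", "J20"), (102, "13:00", "LUNCH"), (102, "18:00", "OUT"),
    (103, "07:30", "J30"), (103, "16:00", "OUT"),
]


def hash_partition(data, n):
    """Key-based partitioning: all rows of one employee land together."""
    parts = [[] for _ in range(n)]
    for row in data:
        parts[row[0] % n].append(row)
    return parts


def round_robin(data, n):
    """Deal rows out one at a time, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(data):
        parts[i % n].append(row)
    return parts


def is_sorted_and_grouped(parts):
    """True if every partition is sorted and no employee spans partitions."""
    home = {}
    for idx, part in enumerate(parts):
        if part != sorted(part):
            return False
        for row in part:
            if home.setdefault(row[0], idx) != idx:
                return False
    return True


# Sort inside key-based partitions, then apply a row-wise "stage"
# that keeps the existing partitioning.
sorted_parts = [sorted(p) for p in hash_partition(rows, 2)]
after_stage = [[(e, t, c.lower()) for (e, t, c) in p] for p in sorted_parts]

# Repartition the already-sorted data without regard to the key.
flat = [row for part in sorted_parts for row in part]
repartitioned = round_robin(flat, 2)

print(is_sorted_and_grouped(after_stage))
print(is_sorted_and_grouped(repartitioned))
```

In this sketch the first check passes and the second fails, which is the situation to avoid between the Sort stage and the Transformer that relies on row order.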

Your first option would require a Join stage rather than a Funnel stage, otherwise you'll get three copies of each source row. Your second approach (all in one Transformer) may be quite efficient - make your derivation expressions as efficient as possible.
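
A rough plain-Python sketch of the difference in row counts (again a simulation, not DataStage code; the three checks are hypothetical stand-ins for the real conditions):

```python
# Hypothetical rows: (employee_number, job_code)
rows = [(101, "J10"), (101, "OUT"), (102, "LUNCH"), (102, "OUT")]


# Hypothetical checks, one per output link in the first design.
def check_a(row):
    return row[1] != "LUNCH"


def check_b(row):
    return row[1] != "OUT"


def check_c(row):
    return row[0] > 0


checks = [check_a, check_b, check_c]

# Option 1: every row goes down three links, one check per link,
# and a funnel simply concatenates the three outputs.
funnelled = [(row, check(row)) for check in checks for row in rows]

# Option 2: one pass in which all checks are evaluated per row.
single_pass = [(row, all(check(row) for check in checks)) for row in rows]

print(len(funnelled), len(single_pass))
```

The funnelled output carries one row per link per source row, so something downstream has to recombine them; the single pass emits exactly one row per source row.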
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post by ssunda6 »

Hi Ray,

Thanks for the reply.
I was worried since it is a parallel job but your answer has cleared my doubt now.
And I forgot to mention that when using three Transformers and a Funnel, I am also using a Remove Duplicates stage, so I will not get three copies of the data.

Thanks again.
Ssunda.
Post by ray.wurlod »

You may still get multiple copies if your data are not partitioned on the keys specified in the Remove Duplicates stage.
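
To see why, here is a small plain-Python simulation (not DataStage code; the keys, the two partitions and the helper names are assumptions for the sketch). Duplicate removal runs separately inside each partition, so copies that sit in different partitions never meet.

```python
def remove_duplicates(partition):
    """Keep the first row per key, looking only inside this partition."""
    seen = set()
    kept = []
    for row in partition:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


def round_robin(data, n):
    """Deal rows out one at a time, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(data):
        parts[i % n].append(row)
    return parts


def hash_partition(data, n):
    """Send every row with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for row in data:
        parts[row[0] % n].append(row)
    return parts


# Three funnelled copies of each hypothetical key 1, 2 and 3.
funnelled = [(key, link) for link in ("a", "b", "c") for key in (1, 2, 3)]

rr_total = sum(len(remove_duplicates(p)) for p in round_robin(funnelled, 2))
hash_total = sum(len(remove_duplicates(p)) for p in hash_partition(funnelled, 2))

print(rr_total, hash_total)
```

With round-robin partitioning the sketch leaves 6 rows for 3 keys; with key-based partitioning it leaves 3.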