Hi All,
My job requirement is to check whether the job activities of employees are valid.
For example, an employee's first task should be a particular job code and not 'OUT' or 'Lunch'.
The last task should always be 'OUT'. There are many more conditions like these.
To implement this:
After reading from the file in a parallel job, if I sort on the key columns (employee number, business date, activity timestamp) using an explicit Sort stage and then pass the data on to other stages, is it guaranteed that the sort order will be preserved across multiple stages?
I am applying these conditions in a Transformer a few stages later. In all the intermediate stages, sort is preserved and partitioning is left at the default (propagate). My whole logic depends on the data being sorted, so I want to make sure this is guaranteed to work.
One more question: I can implement the job in two ways.
First, copy the whole data set (1 million rows) to three output links from a Transformer, apply some conditions on the first link, others on the second link and the remainder on the third link, and then funnel the data to the output.
Alternatively, I can handle all the conditions in a single Transformer instead of routing all the data to three output links, but the conditions will become a bit more complex. Will there be any significant difference in performance (run time) between these two approaches?
Please let me know your inputs.
Regards,
Ssunda.
Is Sorting preserved across multiple stages in parallel jobs
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Sorting is guaranteed to be preserved unless you repartition the data.
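To illustrate the point, here is a toy model in Python (not DataStage code). It assumes hypothetical rows of (employee number, business date, activity timestamp, activity code), hash partitioning on employee number, and a made-up interleaved arrival order when rows are redistributed. The function names are invented for this sketch.

```python
from itertools import zip_longest

# Hypothetical sample rows: (employee, business_date, timestamp, activity_code)
rows = [
    (2, "2010-01-04", "09:00", "J200"),
    (1, "2010-01-04", "08:00", "J100"),
    (1, "2010-01-04", "12:00", "LUNCH"),
    (2, "2010-01-04", "13:00", "LUNCH"),
    (1, "2010-01-04", "17:00", "OUT"),
    (2, "2010-01-04", "18:00", "OUT"),
]

def sort_key(row):
    # Key columns: employee number, business date, activity timestamp
    return row[:3]

def hash_partition(data, n):
    # Keeps every row of one employee in the same partition
    parts = [[] for _ in range(n)]
    for row in data:
        parts[row[0] % n].append(row)
    return parts

def repartition_interleaved(parts, n):
    # Assumption: rows arrive interleaved from the upstream partitions
    # and are dealt out round robin to n new partitions
    arrivals = [r for grp in zip_longest(*parts) for r in grp if r is not None]
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(arrivals):
        new_parts[i % n].append(row)
    return new_parts

def is_sorted(part):
    return all(sort_key(a) <= sort_key(b) for a, b in zip(part, part[1:]))

sorted_rows = sorted(rows, key=sort_key)
parts = hash_partition(sorted_rows, 2)          # partitioning left unchanged downstream
repartitioned = repartition_interleaved(parts, 3)

print(all(is_sorted(p) for p in parts))          # order kept within each partition
print(all(is_sorted(p) for p in repartitioned))  # order lost after repartitioning
```

In this model each partition stays sorted as long as the rows are not redistributed. After the repartition, rows of different employees are mixed within a partition and the order is gone.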
Your first option would require a Join stage rather than a Funnel stage, otherwise you'll get three copies of each source row. Your second approach (all in one Transformer) may be quite efficient - make your derivation expressions as efficient as possible.
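As a rough sketch of the single-pass idea, written in Python rather than as Transformer derivations: with the rows already sorted on the key columns, the two conditions mentioned in the question can be checked in one pass per employee and business date. The activity codes "OUT" and "LUNCH", the row layout and the function name are assumptions for illustration only.

```python
from itertools import groupby

# Assumed codes that may not appear as the first activity of the day
NOT_ALLOWED_FIRST = {"OUT", "LUNCH"}

def validate_activities(sorted_rows):
    """Check each (employee, business_date) group in a single pass.

    sorted_rows must already be sorted on
    (employee, business_date, timestamp); each row is
    (employee, business_date, timestamp, activity_code).
    """
    errors = []
    for (emp, day), group in groupby(sorted_rows, key=lambda r: (r[0], r[1])):
        codes = [row[3] for row in group]
        if codes[0] in NOT_ALLOWED_FIRST:
            errors.append((emp, day, "first activity must be a job code"))
        if codes[-1] != "OUT":
            errors.append((emp, day, "last activity must be OUT"))
    return errors

# Hypothetical sample data, already sorted on the key columns
sample = [
    (1, "2010-01-04", "08:00", "J100"),
    (1, "2010-01-04", "12:00", "LUNCH"),
    (1, "2010-01-04", "17:00", "OUT"),
    (2, "2010-01-04", "09:00", "LUNCH"),
    (2, "2010-01-04", "18:00", "J200"),
]

for error in validate_activities(sample):
    print(error)
```

Each input row is read once and produces no extra copies, which is the main difference from sending every row down three links and funnelling them back together.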
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.