Partitioning Problem
Moderators: chulett, rschirm, roy
Because of partitioning, I often run into a problem where an operation such as a max value calculation or the Funnel sequencing option works per partition. When I look at the result, the value or the sequence is not correct: it is true for each partition but not for the data as a whole.
How do I overcome that? Is forcing the stage to run in sequential mode the only solution?
Please advise.
If you want one record for the whole input file/data then yes, you can force the stage to run in sequential mode.
But if you want one result per key, then the partitioning that was done earlier is incorrect.
The key-based partitioning should be done on the same key that you use in operations such as Aggregator or Remove Duplicates.
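To illustrate the point about partitioning on the operation key, here is a minimal Python sketch. It is not DataStage code; the function names (`round_robin`, `hash_by_key`, `max_per_key`) and the sample rows are made up for illustration. It only shows why a per-key maximum comes out wrong when rows with the same key land on different partitions, and right when the data is partitioned on that key.

```python
def round_robin(rows, n):
    """Spread rows over n partitions without looking at the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_by_key(rows, n):
    """Send every row with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        # Simple deterministic hash, chosen only for this illustration.
        parts[sum(key.encode()) % n].append((key, value))
    return parts


def max_per_key(part):
    """Per-key maximum as computed independently inside one partition."""
    result = {}
    for key, value in part:
        result[key] = max(value, result.get(key, value))
    return result


rows = [("A", 10), ("A", 30), ("B", 7), ("B", 9), ("A", 20), ("B", 2)]

# Key "A" and key "B" each end up on both partitions, so each partition
# reports its own (partial) maximum per key.
rr_result = [max_per_key(p) for p in round_robin(rows, 2)]

# Each key lives on exactly one partition, so each maximum is complete.
hash_result = [max_per_key(p) for p in hash_by_key(rows, 2)]

print(rr_result)    # [{'A': 20, 'B': 7}, {'A': 30, 'B': 9}]
print(hash_result)  # [{'B': 9}, {'A': 30}]
```

With the key-blind split each key gets two competing answers; with the key-based split each key gets exactly one, and it is the correct one for the whole data.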
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
ParallelStuff ----> [Collector] Aggregator ----> Target
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 21
- Joined: Sat Sep 08, 2007 12:04 am
- Location: bangalore
If you're aggregating only a small volume of data, running the Aggregator in sequential mode is going to be fine.
If you have a lot of data, use an Aggregator that runs in parallel mode, then feed its output into a second Aggregator that runs in sequential mode to get the desired result, i.e.
Partitioned Data -> Aggregator (Parallel: gives aggregations per node, large number of records) -> Aggregator (Sequential: aggregates across the nodes, small number of records)
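The two-step approach above can be sketched in plain Python. This is not DataStage code; `local_aggregate`, `global_aggregate` and the sample partitions are hypothetical names and data used only to show the idea: each partition reduces its own rows to one small partial result, and a single sequential step combines those partials.

```python
def local_aggregate(partition):
    """Parallel step: one partial result per partition (node)."""
    if not partition:
        return None  # an empty partition contributes nothing
    return {"max": max(partition), "sum": sum(partition), "count": len(partition)}


def global_aggregate(partials):
    """Sequential step: combine the few partial results into one answer."""
    partials = [p for p in partials if p is not None]
    if not partials:
        return None
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return {
        "max": max(p["max"] for p in partials),
        "sum": total,
        "count": count,
        # The mean is rebuilt from sum and count, not by averaging the
        # per-partition means, which would be wrong for uneven partitions.
        "mean": total / count,
    }


partitions = [[3, 41, 7], [18, 2], [], [29, 40]]

partials = [local_aggregate(p) for p in partitions]  # large input, small output
result = global_aggregate(partials)                  # small input, one record

print(result)  # {'max': 41, 'sum': 140, 'count': 7, 'mean': 20.0}
```

This works directly for max, min, sum and count. For an average, carry the sum and the count through the first step and divide only in the second step.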