
Partitioning Problem

Posted: Fri Apr 25, 2008 11:47 am
by mydsworld
Because of partitioning, I very often run into a problem where, say, a max value or the Funnel sequencing option works on each partition, but the result is not the correct value or sequence (it is correct for the partition, but not for the whole data).

How do I overcome that? Is forcing the stage to run in sequential mode the only solution?

Please advise.

Posted: Fri Apr 25, 2008 12:04 pm
by kumar_s
If you want one record for the whole input file/data, then yes, you can force it to sequential mode.
But if you want one result per key, then the partitioning that was done earlier is incorrect.
Key-based partitioning should use the same key on which you perform operations such as Aggregation or Remove Duplicates.
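
To illustrate the point about key-based partitioning, here is a rough Python simulation (not DataStage code). The function names and the use of `zlib.crc32` as the hash are assumptions made for the sketch; the only thing it tries to show is that a per-partition aggregation returns one row per key only when all rows for a key land in the same partition.

```python
import zlib

def partition_round_robin(rows, num_nodes):
    """Deal rows out to nodes in turn, ignoring the key."""
    parts = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        parts[i % num_nodes].append(row)
    return parts

def partition_by_key(rows, num_nodes):
    """Send every row with the same key to the same node (hash partitioning)."""
    parts = [[] for _ in range(num_nodes)]
    for key, value in rows:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        parts[node].append((key, value))
    return parts

def max_per_key_per_partition(parts):
    """Mimic an aggregator running independently on each partition."""
    results = []
    for part in parts:
        best = {}
        for key, value in part:
            best[key] = max(value, best.get(key, value))
        results.extend(best.items())
    return sorted(results)

rows = [("A", 1), ("B", 5), ("A", 9), ("B", 2), ("A", 4), ("B", 7)]

# Keys are scattered over the nodes, so each node reports its own max per key.
scattered = max_per_key_per_partition(partition_round_robin(rows, 3))

# Each key lives on exactly one node, so there is one correct max per key.
keyed = max_per_key_per_partition(partition_by_key(rows, 3))
```

With round-robin partitioning the same key shows up once per node in the output; with key-based partitioning each key appears once with its true maximum.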

Posted: Fri Apr 25, 2008 3:07 pm
by ray.wurlod
You could partition the single-row maximum value using the Entire partitioning algorithm. That way it would be the same on every node.

Posted: Wed Apr 30, 2008 9:41 pm
by abc123
Ray, wouldn't that be a lot of extra load on resources, that is, sending all rows through every node?

Posted: Wed Apr 30, 2008 10:28 pm
by ray.wurlod
Of course. But your original request sought "any other solution".

Posted: Thu May 01, 2008 8:17 am
by abc123
So what would be an ideal solution for, say, computing a max with an Aggregator in a multi-node scenario? Let's say you have 3 nodes, with 100 rows going through each node. How would you get a proper max value across all nodes? Or does DataStage handle it automatically, which I think it does?

Posted: Thu May 01, 2008 9:51 am
by mydsworld
Good question.

But I doubt whether DS does it automatically.

Any thoughts?

Posted: Thu May 01, 2008 2:41 pm
by ray.wurlod
If you have an Aggregator after collection to sequential mode you will be able to derive the maximum value from all partitions on its output.

Posted: Thu May 01, 2008 10:13 pm
by mydsworld
Even if you have sequential mode input like

Seq File -> Aggregator-> ...

the Aggregator will still run on 3 nodes and compute the aggregation for each partition.

Ray, I am not sure whether I understood your point correctly.

Posted: Thu May 01, 2008 10:59 pm
by ray.wurlod

Code:

ParallelStuff  ----> [Collector] Aggregator  ----> Target
Set the Aggregator stage to run in sequential mode and to use Sort/Merge collector.

Posted: Wed May 07, 2008 3:02 am
by shiva_reddys447
Take a stage variable in the Transformer, let's say Cnt.

Initial value of Cnt=0.

Use the derivation below for the sequence-generating column.

If Cnt=0 then @PARTITIONNUM+1 Else Cnt+@NUMPARTITIONS
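
As a rough check of what that derivation produces, here is a small Python simulation (not Transformer code). It assumes the derivation is evaluated once per row and that Cnt carries its value from row to row within a partition; `sequence_for_partition` and its arguments are hypothetical names standing in for `@PARTITIONNUM`, `@NUMPARTITIONS` and the rows seen by one partition.

```python
def sequence_for_partition(partition_num, num_partitions, row_count):
    """Apply: If Cnt=0 then @PARTITIONNUM+1 Else Cnt+@NUMPARTITIONS."""
    cnt = 0  # initial value of the stage variable
    values = []
    for _ in range(row_count):
        if cnt == 0:
            cnt = partition_num + 1      # first row on this partition
        else:
            cnt = cnt + num_partitions   # every later row steps by the node count
        values.append(cnt)
    return values

# Three partitions (numbered 0, 1, 2) with four rows each.
per_partition = [sequence_for_partition(p, 3, 4) for p in range(3)]
all_values = sorted(v for part in per_partition for v in part)
```

Each partition steps through its own interleaved series, so the values never collide across partitions. If the partitions hold different numbers of rows, the combined sequence is still unique but has gaps.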

Posted: Wed May 07, 2008 4:32 am
by OddJob
If you're aggregating only a small data size then using the Aggregator in Sequential mode is going to be fine.

If you have a lot of data, use an aggregator that runs in parallel mode, then to achieve the desired result feed this output into another aggregator that is running in sequential mode i.e.

Partitioned Data -> Aggregator (Parallel - gives aggregations per node, large number of input records) -> Aggregator (Sequential - aggregates across the nodes, small number of input records)
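
The two-step layout can be sketched in Python as a simulation only (it is not DataStage code, and `local_max` / `global_max` are hypothetical names). The first function stands in for the parallel Aggregator, producing one row per partition; the second stands in for the sequential Aggregator, which only has to look at those few rows.

```python
def local_max(partitions):
    """Parallel step: one max per non-empty partition."""
    return [max(part) for part in partitions if part]

def global_max(partitions):
    """Sequential step: max over the per-partition results."""
    partial = local_max(partitions)
    if not partial:
        raise ValueError("no rows in any partition")
    return max(partial)

# Three nodes, each holding its own slice of the data.
partitions = [[3, 8, 1], [10, 2], [7, 7, 4]]
partial = local_max(partitions)   # one value per node
overall = global_max(partitions)  # the true max for the whole data set
```

This works for max because the max of the per-partition maxima equals the max of all rows. The same holds for min, sum and count, but an average would need the per-partition sum and count carried through to the second step rather than the per-partition averages.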