Some partition doubts


ashish10mca
Participant
Posts: 9
Joined: Sat Oct 10, 2009 6:13 am

Post by ashish10mca »

Doubt 1: If Entire partitioning is used and every record is passed as-is to each partition, do we end up with 4 times the source records (with a 4-node config file)? Right or wrong?

Doubt 2: If Same is used in the first stage, what would the default partitioning be? (Same follows the partitioning strategy of the upcoming stage, so if it is applied in the first stage, which partitioning would DataStage go for?)

Doubt 3: What is the difference between Round Robin and Random partitioning? (Random partitioning is known to distribute records randomly among the nodes, so how does it manage load balancing?)
srinivas.g
Participant
Posts: 251
Joined: Mon Jun 09, 2008 5:52 am

Post by srinivas.g »

1. Yes. If the reference link has 100 records, then each partition has 100 records.

2. Same partitioning means it keeps the partitioning of the previous stage.

3. Yes.
Srinu Gadipudi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

3. It doesn't. It's random. But, for a large enough number of rows, random distribution will be close enough to 1/N rows per node.
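
To illustrate the "close enough to 1/N" point, here is a minimal simulation. It is only a sketch: it assumes each row is sent to a uniformly chosen node, and `random_partition` is a hypothetical helper, not DataStage's actual partitioner.

```python
import random

def random_partition(rows, num_nodes, seed=0):
    """Send each row to a pseudo-randomly chosen node (assumed uniform)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[rng.randrange(num_nodes)].append(row)
    return parts

# Compare a small input with a large one on 4 nodes.
for n in (20, 100_000):
    counts = [len(p) for p in random_partition(range(n), 4)]
    print(n, counts)
```

With only 20 rows the per-node counts can be visibly uneven; with 100,000 rows each node should end up close to 25,000.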
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

1. I think your question is: do the incoming rows of a stage with Entire partitioning set get quadrupled in a 4-node configuration? That is, do all rows go into each output partition?

The answer is yes.
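
A rough way to picture this, as a sketch only (`entire_partition` is a made-up helper that models the behaviour described above, not a DataStage API):

```python
def entire_partition(rows, num_nodes):
    """Give every node its own full copy of the input rows."""
    return [list(rows) for _ in range(num_nodes)]

source = list(range(100))          # 100 source records
partitions = entire_partition(source, 4)

# Each of the 4 partitions holds all 100 rows, so 400 rows in total.
total = sum(len(p) for p in partitions)
print(len(partitions), total)
```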
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

2. Selecting 'Same' partitioning on a source/output link will try to reuse the partitioning of a source dataset created by a prior job, which avoids repartitioning the data. Though 'Same' can be used with source database stages, we need to be careful when data is read in parallel from a partitioned DB2 table (and, I guess, an Oracle table as well), e.g. when using the 'current node' clause.

Thanks,
Senthil
datastagesandeep
Participant
Posts: 1
Joined: Thu Feb 03, 2011 9:44 am

Post by datastagesandeep »

ANSWER: Round Robin and Random differ in how they distribute the rows.
Let's take an example.
If my data is (5,8,3,9,4,6,7,5,8,12,45,98,36,14)

Round Robin for three nodes will give:
First: (5,9,7,12,36)
Second: (8,4,5,45,14)
Third: (3,6,8,98)

But Random could give (this is one possible outcome):
First: (5,14,5,4,98)
Second: (8,12,7,45,9)
Third: (3,6,8,36)

Hope this helps.
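
The Round Robin split above can be reproduced with a few lines of Python. This is a sketch of the dealing-out rule only (row i goes to node i mod N); `round_robin` is a hypothetical helper, not DataStage code.

```python
def round_robin(rows, num_nodes):
    """Deal rows out to nodes in turn: row i goes to node i % num_nodes."""
    parts = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        parts[i % num_nodes].append(row)
    return parts

data = [5, 8, 3, 9, 4, 6, 7, 5, 8, 12, 45, 98, 36, 14]
first, second, third = round_robin(data, 3)
print(first)
print(second)
print(third)
```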
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

datastagesandeep wrote: But Random could be: (this is one of the possible way)
First: (5,14,5,4,98)
Second:(8,12,7,45,9)
Third:(3,6,8,36)
Sorry to reply to such an old post but this needs to be corrected. That is NOT possible - since 14 is the last input value, it can only ever be the last value out of whichever node it goes to. It can't turn up as the second value in a node, unless that node only received 2 values.

Unless there is also a sort happening on some other unseen value.
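
One way to check this, as a sketch: assuming the partitioner keeps rows in arrival order within each node and nothing sorts them afterwards, every node's output must be a subsequence of the input. `is_subsequence` is a hypothetical helper written for this check.

```python
def is_subsequence(part, source):
    """True if `part` can be obtained from `source` by dropping rows
    without reordering the ones that remain."""
    it = iter(source)
    # `x in it` consumes the iterator up to the first match, so each
    # later value has to be found after the previous one.
    return all(x in it for x in part)

data = [5, 8, 3, 9, 4, 6, 7, 5, 8, 12, 45, 98, 36, 14]
claimed = {
    "First": [5, 14, 5, 4, 98],
    "Second": [8, 12, 7, 45, 9],
    "Third": [3, 6, 8, 36],
}
results = {node: is_subsequence(part, data) for node, part in claimed.items()}
print(results)
```

Under that assumption only the third list is a possible output; the first two have values appearing after rows that arrived later in the input.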
Phil Hibbs | Capgemini
Technical Consultant