Partitioning

samba
Premium Member
Posts: 62
Joined: Wed Dec 07, 2005 11:44 am

Post by samba »

Is explicit partitioning needed in every job?

Suppose I have a job like this:

source ------(partitioning)------> transformation ---(partitioning)----> target

Do we need to set the partitioning explicitly on each link as shown above, or is it better to keep Auto partitioning? Which is the best practice?



Thanks
samba
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use (Auto) here. You only need explicit partitioning if rows need to be grouped (for example by key for join/lookup).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Auto will do, unless you do a key-based transformation (e.g. comparing current and previous records) in a Transformer.
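To make the "current versus previous record" case concrete, here is a minimal Python sketch. It is not DataStage code; the `cust` and `amt` columns, the two-partition setup, and the CRC-based key partitioner are all assumptions made for illustration.

```python
import zlib

def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def key_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts

def deltas(partition):
    """Within one partition, compare each row with the previous row of the same customer."""
    last_amt = {}
    out = []
    for row in partition:
        cust = row["cust"]
        if cust in last_amt:
            out.append((cust, row["amt"] - last_amt[cust]))
        last_amt[cust] = row["amt"]
    return out

def run(parts):
    """Apply deltas() to each partition independently and collect the results."""
    return sorted(d for part in parts for d in deltas(part))

rows = [
    {"cust": "A", "amt": 10},
    {"cust": "A", "amt": 15},
    {"cust": "B", "amt": 5},
    {"cust": "A", "amt": 30},
    {"cust": "B", "amt": 7},
]

serial = run([rows])                            # single partition, the reference answer
by_key = run(key_partition(rows, 2, "cust"))    # same answer as serial
by_turn = run(round_robin(rows, 2))             # loses the A: 10 -> 15 comparison
```

With key partitioning each customer's rows stay together and in order, so the per-partition result matches the serial one. With round robin, customer A's first row lands in a different partition from its second, so that comparison is silently dropped.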
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

Hi,
Do we always have to key-partition data before a lookup/join, as Ray suggested, or is it only needed in special cases? My join/lookup is working fine without it. Am I missing something?
Om Prakash


"There are things that are known, and there are things that are unknown, and in between there are doors"
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Auto partitioning may have worked in your case, but don't count on it in every case if you expect precise results.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

You don't have to partition data before every join or lookup.

In the case of the join you have to ensure that the data with the same keys will be in the same partitions on each side of the join. This usually means partitioning by one or more of the join keys and sorting by all of them. However if the data is already partitioned in this way by a previous stage you can leave it be.
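The requirement that matching keys sit in the same partition on both sides can be sketched in plain Python. This is only an illustration, not how the Join stage is implemented: the `cust` key, the sample rows, the CRC-based hash, and the dictionary-based per-partition join (standing in for a sort-merge) are all assumptions.

```python
import zlib

def hash_partition(rows, n, key):
    """Place each row in the partition chosen by a hash of its key value."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts

def join_partition(left, right, key):
    """Inner join of the rows that happen to be in one partition."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def parallel_join(left_parts, right_parts, key):
    """Join partition i of the left input only with partition i of the right input."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(join_partition(lp, rp, key))
    return sorted(out, key=lambda row: row["order"])

orders = [{"order": i, "cust": c} for i, c in enumerate([1, 2, 3, 1, 2, 3, 1])]
customers = [{"cust": 1, "name": "ann"}, {"cust": 2, "name": "bob"}, {"cust": 3, "name": "cy"}]

serial = parallel_join([orders], [customers], "cust")

left = hash_partition(orders, 2, "cust")
right = hash_partition(customers, 2, "cust")
aligned = parallel_join(left, right, "cust")

# Deliberately pair each left partition with the wrong right partition.
misaligned = parallel_join(left, right[1:] + right[:1], "cust")
```

When both inputs are partitioned by the same hash of the join key, the partition-wise join returns the same rows as the serial join. In the deliberately misaligned two-partition case, no key ever meets its partner, so the join returns nothing even though every order has a customer.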

In the case of a lookup you only need to partition on the keys if the reference data is not partitioned using Entire. If the reference data is partitioned Entire then all of it is available to every node, so there is no need to change the partitioning on the source input.
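A small Python sketch of why Entire removes the need to key-partition the source. The function names, the `code` key, and the three-partition layout are hypothetical; the point is only that each partition sees a full copy of the reference rows.

```python
def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def entire(reference, n):
    """Give each of the n partitions its own full copy of the reference rows."""
    return [list(reference) for _ in range(n)]

def lookup(source_parts, ref_parts, key):
    """Per partition, match source rows against the reference rows in that partition only."""
    out = []
    for sp, rp in zip(source_parts, ref_parts):
        table = {r[key]: r for r in rp}
        for row in sp:
            match = table.get(row[key])
            if match is not None:
                out.append({**row, **match})
    return out

source = [{"id": i, "code": c} for i, c in enumerate(["b", "c", "a", "a", "b"])]
reference = [{"code": "a", "label": "alpha"},
             {"code": "b", "label": "beta"},
             {"code": "c", "label": "gamma"}]

src_parts = round_robin(source, 3)

with_entire = lookup(src_parts, entire(reference, 3), "code")
with_split_ref = lookup(src_parts, round_robin(reference, 3), "code")
```

With the reference replicated to every partition, all five source rows find a match no matter how the source was distributed. When the reference is split across partitions without regard to the key, only the rows that happen to share a partition with their reference row are matched.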

The usual guidelines apply - aim to repartition as little as possible and ensure that data is evenly distributed across the nodes.
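One rough way to reason about "evenly distributed" is to compare the largest partition with the average. The sketch below is an assumption-laden illustration in Python: the `skew` measure and the CRC-based hash are hypothetical choices, not something DataStage reports in this form.

```python
import zlib

def hash_counts(keys, n):
    """Count how many rows a hash of the key would send to each of n partitions."""
    counts = [0] * n
    for k in keys:
        counts[zlib.crc32(str(k).encode()) % n] += 1
    return counts

def skew(counts):
    """Largest partition relative to the mean; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))

# A key column with a single value sends every row to one partition.
hot_counts = hash_counts(["IN"] * 100, 4)

# 100 rows dealt evenly over 4 partitions, as round robin would do.
even_counts = [25, 25, 25, 25]
```

A skew of 1.0 means every node does the same amount of work; a skew equal to the number of partitions means one node does all of it. Hash partitioning on a low-cardinality or heavily repeated key tends toward the second case.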
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Entire partitioning on the reference link is not mandatory, though. Hash partitioning on the key for both the input and the reference link will also do.
dsusr
Premium Member
Posts: 104
Joined: Sat Sep 03, 2005 11:30 pm

Post by dsusr »

Partitioning helps in certain respects, but unnecessary partitioning degrades job performance. In addition to what Ray, thompsonp and others have said, you also need proper partitioning for the Remove Duplicates and Sort stages.

In the case of a lookup, leave the partitioning as Auto on both links, as DataStage will automatically insert Entire partitioning on the reference link.

Also, try to do the partitioning as early as possible. If the join keys are not changing, don't partition again; keep using the same partitioning method.

---dsusr