For each job, is there any need for partitioning?
Suppose I have a job like this:
source ------(partitioning)------> transformation ---(partitioning)----> target
Do we need to partition explicitly as shown above, or is it better to keep Auto partitioning?
Which is the best practice?
Thanks
Partitioning
Hi,
Do we always have to key-partition the data before a lookup/join, as Ray suggested, or only in special cases? My join/lookup works fine without it. Am I missing something?
Om Prakash
"There are things that are known, and there are things that are unknown, and in between there are doors"
You don't have to partition data before every join or lookup.
In the case of a join, you have to ensure that rows with the same keys end up in the same partitions on each side of the join. This usually means partitioning by one or more of the join keys and sorting by all of them. However, if the data is already partitioned this way by a previous stage, you can leave it be.
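To see why co-location matters, here is a small plain-Python simulation (not DataStage code). Each partition is joined only with the matching partition of the other input, as each node would be. `crc32` is just a stand-in for the real Hash partitioner's function, and `customers`, `orders` and `cust_id` are hypothetical sample data:

```python
from zlib import crc32

def hash_partition(rows, key, n_parts):
    """Send every row with the same key value to the same partition.
    crc32 is only a stand-in for the engine's own hash function."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n_parts].append(row)
    return parts

def round_robin_partition(rows, n_parts):
    """Deal rows out in turn, ignoring their key values."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts

def partitioned_inner_join(left_parts, right_parts, key):
    """Join partition i of the left input only with partition i of the right,
    the way each node of a parallel join sees only its own slice of data."""
    out = []
    for left, right in zip(left_parts, right_parts):
        index = {}
        for rrow in right:
            index.setdefault(rrow[key], []).append(rrow)
        for lrow in left:
            for rrow in index.get(lrow[key], []):
                out.append({**lrow, **rrow})
    return out

customers = [{"cust_id": c, "name": f"cust{c}"} for c in range(4)]
orders = [{"cust_id": c, "order_no": n}
          for n, c in enumerate([0, 0, 1, 1, 2, 2, 3, 3])]

by_key = partitioned_inner_join(hash_partition(customers, "cust_id", 2),
                                hash_partition(orders, "cust_id", 2), "cust_id")
by_turn = partitioned_inner_join(round_robin_partition(customers, 2),
                                 round_robin_partition(orders, 2), "cust_id")
print(len(by_key), len(by_turn))  # 8 4
```

With key partitioning on both inputs all 8 matches are found. With round robin, half the matching rows land on different nodes and are silently lost, which is why a join can appear to work on small test data and still be wrong.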
In the case of a lookup, you only need to partition on the keys if the reference data is not partitioned using Entire. If the reference data is partitioned Entire, all of it is available to every node processing the source input, so there is no need to change the partitioning on that input.
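A minimal plain-Python sketch of the Entire case (a simulation only, not DataStage's implementation). The source keeps an arbitrary round-robin partitioning, and each node builds an in-memory table from whatever reference rows it received. The sample data and the `lookup_failed` flag are hypothetical:

```python
def round_robin_partition(rows, n_parts):
    """Deal rows out in turn, ignoring their key values."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts

def entire_partition(rows, n_parts):
    """Entire: every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(n_parts)]

def partitioned_lookup(source_parts, ref_parts, key):
    """Each node looks up its own source rows in a table built from its own
    reference slice; rows with no match are flagged."""
    out = []
    for source, ref in zip(source_parts, ref_parts):
        table = {r[key]: r for r in ref}
        for s in source:
            match = table.get(s[key])
            out.append({**s, **match} if match else {**s, "lookup_failed": True})
    return out

customers = [{"cust_id": c, "name": f"cust{c}"} for c in range(4)]
orders = [{"cust_id": c, "order_no": n}
          for n, c in enumerate([0, 0, 1, 1, 2, 2, 3, 3])]

src = round_robin_partition(orders, 3)  # source partitioning left as-is
with_entire = partitioned_lookup(src, entire_partition(customers, 3), "cust_id")
with_rr_ref = partitioned_lookup(src, round_robin_partition(customers, 3), "cust_id")
failed_entire = sum(r.get("lookup_failed", False) for r in with_entire)
failed_rr = sum(r.get("lookup_failed", False) for r in with_rr_ref)
print(failed_entire, failed_rr)  # 0 5
```

The trade-off is memory: Entire copies the whole reference set to every node, which is fine for the small reference tables lookups are meant for.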
The usual guidelines apply: aim to repartition as little as possible, and ensure that data is evenly distributed across the nodes.
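Even distribution matters because the other nodes finish early and wait, so elapsed time is roughly set by the largest partition. A quick plain-Python check of skew (largest partition divided by the average), assuming a hypothetical key column where 90 of 100 rows share one value; `crc32` again stands in for a hash partitioner:

```python
from zlib import crc32

def hash_sizes(keys, n_parts):
    """Rows per partition when partitioning on the key (crc32 as stand-in hash)."""
    sizes = [0] * n_parts
    for k in keys:
        sizes[crc32(str(k).encode()) % n_parts] += 1
    return sizes

def round_robin_sizes(n_rows, n_parts):
    """Rows per partition when rows are dealt out in turn."""
    sizes = [0] * n_parts
    for i in range(n_rows):
        sizes[i % n_parts] += 1
    return sizes

def skew(sizes):
    """Largest partition divided by the average partition size (1.0 = even)."""
    return max(sizes) / (sum(sizes) / len(sizes))

# Hypothetical data: 90 of 100 rows share the key value "US"
countries = ["US"] * 90 + [f"C{i}" for i in range(10)]
by_key = hash_sizes(countries, 4)
by_turn = round_robin_sizes(len(countries), 4)
print(by_key, skew(by_key))    # one partition holds at least 90 rows
print(by_turn, skew(by_turn))  # [25, 25, 25, 25] 1.0
```

Key partitioning on a low-cardinality or dominant value concentrates almost all the work on one node; if the stage does not need key grouping, a keyless method spreads the rows evenly.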
Partitioning helps in certain cases, but unnecessary partitioning degrades job performance. In addition to what Ray, thomsonp and others have said, you also need to partition correctly for Remove Duplicates and Sort stages.
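The Remove Duplicates case fails the same way as the join: each node only compares rows within its own partition. A plain-Python simulation (not DataStage code, with hypothetical sample rows) that sorts each partition and keeps the first row of each run of equal keys:

```python
from zlib import crc32

def hash_partition(rows, key, n_parts):
    """Same key value -> same partition (crc32 as a stand-in hash)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n_parts].append(row)
    return parts

def round_robin_partition(rows, n_parts):
    """Deal rows out in turn, ignoring their key values."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts

def remove_duplicates_per_partition(parts, key):
    """Sort each partition on the key, then keep the first row of each run of
    equal keys. A node never sees rows held by other partitions."""
    out = []
    for part in parts:
        previous = object()  # sentinel that never equals a real key
        for row in sorted(part, key=lambda r: r[key]):
            if row[key] != previous:
                out.append(row)
            previous = row[key]
    return out

rows = [{"cust_id": c} for c in [1, 1, 2, 2, 3, 3]]
dedup_rr = remove_duplicates_per_partition(round_robin_partition(rows, 2), "cust_id")
dedup_hash = remove_duplicates_per_partition(hash_partition(rows, "cust_id", 2), "cust_id")
print(len(dedup_rr), len(dedup_hash))  # 6 3
```

With round robin, each copy of a key sits on a different node, so no duplicates are removed at all; key partitioning brings them together and leaves the expected 3 rows.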
In the case of a lookup, leave the partitioning as Auto on both links, as DataStage will automatically insert Entire partitioning on the reference link.
Also, try to partition as early as possible, and if the join keys do not change, don't partition again; just keep the same partitioning method.
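Why partitioning once is enough when keys don't change can be checked directly: if re-partitioning on the key would leave every row in the partition it already occupies, the repartition is pure overhead. A plain-Python sketch, using a simple modulus on a hypothetical integer key `cust_id` as the key-based partitioner:

```python
N_PARTS = 4

def modulus_partition_of(key_value):
    """Partition number for an integer key: key modulo partition count."""
    return key_value % N_PARTS

def key_partition(rows, key):
    parts = [[] for _ in range(N_PARTS)]
    for row in rows:
        parts[modulus_partition_of(row[key])].append(row)
    return parts

def already_partitioned_on(parts, key):
    """True if partitioning again on `key` would leave every row where it is."""
    return all(modulus_partition_of(row[key]) == i
               for i, part in enumerate(parts) for row in part)

orders = [{"cust_id": c, "amount": 10.0 * c} for c in range(20)]
parts = key_partition(orders, "cust_id")  # partition once, early

# Transformation that only changes non-key columns: partitioning still holds
taxed = [[{**r, "amount": r["amount"] * 1.1} for r in p] for p in parts]
# Transformation that derives a new key value: partitioning no longer holds
rekeyed = [[{**r, "cust_id": r["cust_id"] + 1} for r in p] for p in parts]

print(already_partitioned_on(taxed, "cust_id"))    # True
print(already_partitioned_on(rekeyed, "cust_id"))  # False
```

Only when a stage changes the key values (or you join on different keys) does the data need to be partitioned again.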
---dsusr