Hi,
I have a question about partitioning in the Join stage. My job joins two datasets. Should I hash partition the inputs to the Join stage? When we join two database stages, the Join stage waits until all the records have been read from the tables, so we get correct results. Is this also the case when joining two datasets?
Thanks
Join stage partition
-
Quote: "... when we join two database stages the join stage will wait till all the records are read from the table ..."
This isn't true, unless you've got a sort somewhere in your job. The join will operate in a 'pipeline' fashion, regardless of whether its source data are provided by a database or a dataset stage.
John McKeever
Data Migrators
MettleCI - DevOps for DataStage (https://www.mettleci.com)
-
Quote: "This isn't true, unless you've got a sort somewhere in your job. The join will operate in a 'pipeline' fashion, regardless of whether its source data are provided by a database or a dataset stage."
Joining data streams that are not pre-sorted on the join key will cause a tsort operator to be inserted on the input links if the auto partitioning method is used. In fact, the Sort stage is sometimes used in a "Don't sort" mode simply to tell the engine that the data is already sorted and so avoid re-sorting.
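To illustrate why sorted input matters, here is a minimal Python sketch of a merge-style join over two streams that are already sorted on the join key. It is not DataStage's implementation; the function name `merge_join` and the sample rows are made up for the example. The point is that the join itself can emit rows as it reads them (pipeline fashion), but only because the sort has already been done upstream, and a sort has to see all of its input before it can release the first row.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, both sorted by key.

    Rows are yielded as soon as a key group has been read from each side,
    so neither input has to be fully loaded first.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lk, lrows = next(left_groups, (None, None))
    rk, rrows = next(right_groups, (None, None))
    while lrows is not None and rrows is not None:
        if lk < rk:
            # Left key has no partner yet: advance the left stream.
            lk, lrows = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no partner yet: advance the right stream.
            rk, rrows = next(right_groups, (None, None))
        else:
            # Keys match: buffer only this key's rows from the right side.
            right_vals = [v for _, v in rrows]
            for _, lv in lrows:
                for rv in right_vals:
                    yield (lk, lv, rv)
            lk, lrows = next(left_groups, (None, None))
            rk, rrows = next(right_groups, (None, None))


# Hypothetical sample data, sorted on the key (first field).
left_sorted = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_sorted = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = list(merge_join(left_sorted, right_sorted))

# The same logic silently drops matches when an input is not sorted.
left_unsorted = [(2, "b"), (1, "a"), (2, "c")]
missed = list(merge_join(left_unsorted, [(2, "x")]))
```

With the unsorted left input, the second row for key 2 is never matched, which is the kind of wrong result the inserted sort protects against.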
As to whether you need to partition, if you leave the partitioning method as auto, it should take care of itself.
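As a rough illustration of why the partitioning has to be key-based for a parallel join, here is a small Python sketch. It only models the idea, it is not how the parallel engine is implemented, and all names (`hash_partition`, `round_robin_partition`, `parallel_join`) are invented for the example. Each partition is joined independently, so rows with the same key from both inputs must land in the same partition.

```python
from collections import defaultdict


def hash_partition(rows, num_partitions):
    """Send every row to the partition chosen by hashing its key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def round_robin_partition(rows, num_partitions):
    """Deal rows out in turn, ignoring the key (not key-based)."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def join_partition(left_part, right_part):
    """Inner-join the rows of one partition; it sees no other partition."""
    index = defaultdict(list)
    for key, value in right_part:
        index[key].append(value)
    return [(k, lv, rv) for k, lv in left_part for rv in index.get(k, [])]


def parallel_join(left, right, partitioner, num_partitions=2):
    """Partition both inputs the same way, then join partition by partition."""
    left_parts = partitioner(left, num_partitions)
    right_parts = partitioner(right, num_partitions)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(join_partition(lp, rp))
    return out


# Hypothetical sample data: every key exists on both sides.
left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right = [(4, "w"), (3, "x"), (2, "y"), (1, "z")]

hashed = sorted(parallel_join(left, right, hash_partition))
dealt = parallel_join(left, right, round_robin_partition)
```

In this sample the hash-partitioned join finds all four matches, while the round-robin version finds none, because matching keys end up in different partitions.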
Jim Paradies