How to preventing the join stage from sorting the records

ThilSe · Post by **ThilSe** » Wed Aug 02, 2006 8:34 am

Hi,

I have a job that does the sorting in Database itself.

Is there any way by which one can prevent the join stage from sorting the input data again.

Thanks
Senthil

deployDS · Post by **deployDS** » Wed Aug 02, 2006 11:08 am

[quote="ThilSe"]Hi,

I have a job that does the sorting in Database itself.

Is there any way by which one can prevent the join stage from sorting the input data again.

Thanks
Senthil[/quote]

Join stage does not sort incoming data unless you specify it to.

clickart · Post by **clickart** » Wed Aug 02, 2006 4:35 pm

You can check the partitioning type specified on the link.

meena · Post by **meena** » Wed Aug 02, 2006 6:12 pm

Hi Senthil,
I agree with 'deployDS '

Code: Select all

Join stage does not sort incoming data unless you specify it to.

There is option in the stage to sort. You can enable it if you want the sorting option else not...

Thank you

kumar_s · Post by **kumar_s** » Wed Aug 02, 2006 10:25 pm

It might the obvious myth that gets in to mind that, Partion makes the data sorted. No.

balajisr · Post by **balajisr** » Wed Aug 02, 2006 11:02 pm

deployDS wrote: Join stage does not sort incoming data unless you specify it to.

Can you clarify this.

I think join stage does sorting implicitly.

SriKara · Post by **SriKara** » Wed Aug 02, 2006 11:48 pm

I think join stage does sorting implicitly

Join stage does require both of its input datasets to be in sorted order for a correct output dataset.

However the default partitioning method which is "AUTO" ensures that sorting is done on both the input links to the stage. It is worthwile to note that "AUTO" partitioning method causes the data to be sorted. if somebody has specified another method of partition, then we need to take care that sorted data is input to join stage - either by specifying perform sort on input link partitioning method or by deploying a sort stage.

Kirtikumar · Post by **Kirtikumar** » Thu Aug 03, 2006 12:54 am

SriKara wrote:Join stage does require both of its input datasets to be in sorted order for a correct output dataset.

Is it the case - that if the data in the join is not sorted and just hash partitioned, then it will not give correct result?

As per whatever exp I have on PX with join stage, it seems this is not true.
Even if the data on which you are doing join is just hash partitioned with no sorting on keys, still you should get the correctly joined data.

So why sorting is required - may be to make search or join process faster. Its normal rule, if data is sorted then search will be faster.

So I think as long as data is hash partioned on join keys, sorting would make no diff to the output data.

SriKara · Post by **SriKara** » Thu Aug 03, 2006 1:13 am

Agree with Kirtikumar completely.
I should have said "Join stage requires both of its input datasets to be key partitioned and sorted".

Hash key partition ensures, the same keys are in one partition and are not scattered across. This would ensure a correct join. Sorted data would make things faster.

The point i was trying to make is, join stage may not sort implicitly as mentioned by balajisr.

kumar_s · Post by **kumar_s** » Thu Aug 03, 2006 1:46 am

Code: Select all

The data sets input to the Join stage must be key partitioned and sorted.

is the recomendation given by Ascential, for the best results. Threre is auto insertion of tsort and psort if the data is found not in sorted order. And this can be controlled by the parameter available in DS Admnistrator. Turn this off and test with the unsorted data, you may get the undesired result.
As thumb rule, the data should be properly partitioned and sorted based on the joining key.

balajisr · Post by **balajisr** » Thu Aug 03, 2006 2:19 am

SriKara wrote:Agree with Kirtikumar completely.
I should have said "Join stage requires both of its input datasets to be key partitioned and sorted".

Hash key partition ensures, the same keys are in one partition and are not scattered across. This would ensure a correct join. Sorted data would make things faster.

The point i was trying to make is, join stage may not sort implicitly as mentioned by balajisr.

As far as i know Join stage sorts data implicitly even when you specify hash partitioning with no sorting options. Please correct me if i am wrong.
As kumar mentioned there is a parameter available which when set does not sort implicitly.

splayer · Post by **splayer** » Thu Aug 03, 2006 10:05 am

kumar_s, unless you check the Perform Sort checkbox, it should not sort, right? Well, if sorting is implicit regardless of whether you have checked Perform Sort or not, then setting this check box is useless.

Can we set "same" partitioning method to prevent sorting? Of course that would mean that it'll take the previous stage's partitioning method.

Please shed some light.

kumar_s · Post by **kumar_s** » Thu Aug 03, 2006 7:43 pm

Ok after going through the Operators Guide, though it is given in osh terms, upto my understanding, the auto insertion of partiton and sort happens only if we have auto partition (Datastage). And tsort and psort are the two integrated options of the available Sort stages(Operators). The auto insertion of the sort operators and partition can be controlled by the two environment variables APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION.
To avoid the implicit interference, we can use our explict sort and partition.
Auto partition is not as such avaiable in the osh, but it was implicitly used. Fortunately we were given the explict option of seleting Auto partiton in Datastage.

IHTH

dsdesigner · Post by **dsdesigner** » Thu Aug 03, 2006 8:31 pm

Another way you could check the effects of enabling the various options (explicit sort stage, sort on partition, auto partition, hash partition, etc.) would be create jobs with the various alternatives, compile the job and then check the "Generated OSH". Of course, the ability to view the "generated osh" must have been enabled by your administrtaor".

Thanks,
Shekar

felixyong · Post by **felixyong** » Thu Aug 03, 2006 10:12 pm

ThilSe wrote:Hi,

I have a job that does the sorting in Database itself.

Is there any way by which one can prevent the join stage from sorting the input data again.

Thanks
Senthil

Hi

By default sort is added as you can see it in the score, as this is how DS EE ensure correct results in join.

The method to prevent auto insertion is the following -
Set $APT_SORT_INSERTION_CHECK_ONLY to change behavior of automatically inserted sorts

Regards
Felix

DSXchange

How to preventing the join stage from sorting the records

How to preventing the join stage from sorting the records

Re: How to preventing the join stage from sorting the record

Re: How to preventing the join stage from sorting the record

Re: How to preventing the join stage from sorting the record

Re: How to preventing the join stage from sorting the record