Page 1 of 2

How to preventing the join stage from sorting the records

Posted: Wed Aug 02, 2006 8:34 am
by ThilSe
Hi,

I have a job that does the sorting in Database itself.

Is there any way by which one can prevent the join stage from sorting the input data again.

Thanks
Senthil

Re: How to preventing the join stage from sorting the record

Posted: Wed Aug 02, 2006 11:08 am
by deployDS
[quote="ThilSe"]Hi,

I have a job that does the sorting in Database itself.

Is there any way by which one can prevent the join stage from sorting the input data again.

Thanks
Senthil[/quote]

Join stage does not sort incoming data unless you specify it to.

Re: How to preventing the join stage from sorting the record

Posted: Wed Aug 02, 2006 4:35 pm
by clickart
You can check the partitioning type specified on the link.

Posted: Wed Aug 02, 2006 6:12 pm
by meena
Hi Senthil,
I agree with 'deployDS '

Code: Select all

Join stage does not sort incoming data unless you specify it to. 
There is option in the stage to sort. You can enable it if you want the sorting option else not...

Thank you

Posted: Wed Aug 02, 2006 10:25 pm
by kumar_s
It might the obvious myth that gets in to mind that, Partion makes the data sorted. No.

Re: How to preventing the join stage from sorting the record

Posted: Wed Aug 02, 2006 11:02 pm
by balajisr
deployDS wrote: Join stage does not sort incoming data unless you specify it to.
Can you clarify this.

I think join stage does sorting implicitly.

Posted: Wed Aug 02, 2006 11:48 pm
by SriKara
I think join stage does sorting implicitly
Join stage does require both of its input datasets to be in sorted order for a correct output dataset.

However the default partitioning method which is "AUTO" ensures that sorting is done on both the input links to the stage. It is worthwile to note that "AUTO" partitioning method causes the data to be sorted. if somebody has specified another method of partition, then we need to take care that sorted data is input to join stage - either by specifying perform sort on input link partitioning method or by deploying a sort stage.

Posted: Thu Aug 03, 2006 12:54 am
by Kirtikumar
SriKara wrote:Join stage does require both of its input datasets to be in sorted order for a correct output dataset.
Is it the case - that if the data in the join is not sorted and just hash partitioned, then it will not give correct result?

As per whatever exp I have on PX with join stage, it seems this is not true.
Even if the data on which you are doing join is just hash partitioned with no sorting on keys, still you should get the correctly joined data.

So why sorting is required - may be to make search or join process faster. Its normal rule, if data is sorted then search will be faster.

So I think as long as data is hash partioned on join keys, sorting would make no diff to the output data.

Posted: Thu Aug 03, 2006 1:13 am
by SriKara
Agree with Kirtikumar completely.
I should have said "Join stage requires both of its input datasets to be key partitioned and sorted".

Hash key partition ensures, the same keys are in one partition and are not scattered across. This would ensure a correct join. Sorted data would make things faster.

The point i was trying to make is, join stage may not sort implicitly as mentioned by balajisr.

Posted: Thu Aug 03, 2006 1:46 am
by kumar_s

Code: Select all

The data sets input to the Join stage must be key partitioned and sorted.
is the recomendation given by Ascential, for the best results. Threre is auto insertion of tsort and psort if the data is found not in sorted order. And this can be controlled by the parameter available in DS Admnistrator. Turn this off and test with the unsorted data, you may get the undesired result.
As thumb rule, the data should be properly partitioned and sorted based on the joining key.

Posted: Thu Aug 03, 2006 2:19 am
by balajisr
SriKara wrote:Agree with Kirtikumar completely.
I should have said "Join stage requires both of its input datasets to be key partitioned and sorted".

Hash key partition ensures, the same keys are in one partition and are not scattered across. This would ensure a correct join. Sorted data would make things faster.

The point i was trying to make is, join stage may not sort implicitly as mentioned by balajisr.
As far as i know Join stage sorts data implicitly even when you specify hash partitioning with no sorting options. Please correct me if i am wrong.
As kumar mentioned there is a parameter available which when set does not sort implicitly.

Posted: Thu Aug 03, 2006 10:05 am
by splayer
kumar_s, unless you check the Perform Sort checkbox, it should not sort, right? Well, if sorting is implicit regardless of whether you have checked Perform Sort or not, then setting this check box is useless.

Can we set "same" partitioning method to prevent sorting? Of course that would mean that it'll take the previous stage's partitioning method.

Please shed some light.

Posted: Thu Aug 03, 2006 7:43 pm
by kumar_s
Ok after going through the Operators Guide, though it is given in osh terms, upto my understanding, the auto insertion of partiton and sort happens only if we have auto partition (Datastage). And tsort and psort are the two integrated options of the available Sort stages(Operators). The auto insertion of the sort operators and partition can be controlled by the two environment variables APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION.
To avoid the implicit interference, we can use our explict sort and partition.
Auto partition is not as such avaiable in the osh, but it was implicitly used. Fortunately we were given the explict option of seleting Auto partiton in Datastage.

IHTH

Posted: Thu Aug 03, 2006 8:31 pm
by dsdesigner
Another way you could check the effects of enabling the various options (explicit sort stage, sort on partition, auto partition, hash partition, etc.) would be create jobs with the various alternatives, compile the job and then check the "Generated OSH". Of course, the ability to view the "generated osh" must have been enabled by your administrtaor".

Thanks,
Shekar

Re: How to preventing the join stage from sorting the record

Posted: Thu Aug 03, 2006 10:12 pm
by felixyong
ThilSe wrote:Hi,

I have a job that does the sorting in Database itself.

Is there any way by which one can prevent the join stage from sorting the input data again.

Thanks
Senthil
Hi

By default sort is added as you can see it in the score, as this is how DS EE ensure correct results in join.

The method to prevent auto insertion is the following -
Set $APT_SORT_INSERTION_CHECK_ONLY to change behavior of automatically inserted sorts

Regards
Felix