How to preventing the join stage from sorting the records
Moderators: chulett, rschirm, roy
How to preventing the join stage from sorting the records
Hi,
I have a job that does the sorting in Database itself.
Is there any way by which one can prevent the join stage from sorting the input data again.
Thanks
Senthil
I have a job that does the sorting in Database itself.
Is there any way by which one can prevent the join stage from sorting the input data again.
Thanks
Senthil
Re: How to preventing the join stage from sorting the record
[quote="ThilSe"]Hi,
I have a job that does the sorting in Database itself.
Is there any way by which one can prevent the join stage from sorting the input data again.
Thanks
Senthil[/quote]
Join stage does not sort incoming data unless you specify it to.
I have a job that does the sorting in Database itself.
Is there any way by which one can prevent the join stage from sorting the input data again.
Thanks
Senthil[/quote]
Join stage does not sort incoming data unless you specify it to.
Re: How to preventing the join stage from sorting the record
You can check the partitioning type specified on the link.
Hi Senthil,
I agree with 'deployDS '
There is option in the stage to sort. You can enable it if you want the sorting option else not...
Thank you
I agree with 'deployDS '
Code: Select all
Join stage does not sort incoming data unless you specify it to.
Thank you
Re: How to preventing the join stage from sorting the record
Can you clarify this.deployDS wrote: Join stage does not sort incoming data unless you specify it to.
I think join stage does sorting implicitly.
Join stage does require both of its input datasets to be in sorted order for a correct output dataset.I think join stage does sorting implicitly
However the default partitioning method which is "AUTO" ensures that sorting is done on both the input links to the stage. It is worthwile to note that "AUTO" partitioning method causes the data to be sorted. if somebody has specified another method of partition, then we need to take care that sorted data is input to join stage - either by specifying perform sort on input link partitioning method or by deploying a sort stage.
Regards,
SriKara
SriKara
-
- Participant
- Posts: 437
- Joined: Fri Oct 15, 2004 6:13 am
- Location: Pune, India
Is it the case - that if the data in the join is not sorted and just hash partitioned, then it will not give correct result?SriKara wrote:Join stage does require both of its input datasets to be in sorted order for a correct output dataset.
As per whatever exp I have on PX with join stage, it seems this is not true.
Even if the data on which you are doing join is just hash partitioned with no sorting on keys, still you should get the correctly joined data.
So why sorting is required - may be to make search or join process faster. Its normal rule, if data is sorted then search will be faster.
So I think as long as data is hash partioned on join keys, sorting would make no diff to the output data.
Regards,
S. Kirtikumar.
S. Kirtikumar.
Agree with Kirtikumar completely.
I should have said "Join stage requires both of its input datasets to be key partitioned and sorted".
Hash key partition ensures, the same keys are in one partition and are not scattered across. This would ensure a correct join. Sorted data would make things faster.
The point i was trying to make is, join stage may not sort implicitly as mentioned by balajisr.
I should have said "Join stage requires both of its input datasets to be key partitioned and sorted".
Hash key partition ensures, the same keys are in one partition and are not scattered across. This would ensure a correct join. Sorted data would make things faster.
The point i was trying to make is, join stage may not sort implicitly as mentioned by balajisr.
Regards,
SriKara
SriKara
Code: Select all
The data sets input to the Join stage must be key partitioned and sorted.
As thumb rule, the data should be properly partitioned and sorted based on the joining key.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
As far as i know Join stage sorts data implicitly even when you specify hash partitioning with no sorting options. Please correct me if i am wrong.SriKara wrote:Agree with Kirtikumar completely.
I should have said "Join stage requires both of its input datasets to be key partitioned and sorted".
Hash key partition ensures, the same keys are in one partition and are not scattered across. This would ensure a correct join. Sorted data would make things faster.
The point i was trying to make is, join stage may not sort implicitly as mentioned by balajisr.
As kumar mentioned there is a parameter available which when set does not sort implicitly.
kumar_s, unless you check the Perform Sort checkbox, it should not sort, right? Well, if sorting is implicit regardless of whether you have checked Perform Sort or not, then setting this check box is useless.
Can we set "same" partitioning method to prevent sorting? Of course that would mean that it'll take the previous stage's partitioning method.
Please shed some light.
Can we set "same" partitioning method to prevent sorting? Of course that would mean that it'll take the previous stage's partitioning method.
Please shed some light.
Ok after going through the Operators Guide, though it is given in osh terms, upto my understanding, the auto insertion of partiton and sort happens only if we have auto partition (Datastage). And tsort and psort are the two integrated options of the available Sort stages(Operators). The auto insertion of the sort operators and partition can be controlled by the two environment variables APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION.
To avoid the implicit interference, we can use our explict sort and partition.
Auto partition is not as such avaiable in the osh, but it was implicitly used. Fortunately we were given the explict option of seleting Auto partiton in Datastage.
IHTH
To avoid the implicit interference, we can use our explict sort and partition.
Auto partition is not as such avaiable in the osh, but it was implicitly used. Fortunately we were given the explict option of seleting Auto partiton in Datastage.
IHTH
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 34
- Joined: Thu Jul 29, 2004 1:03 pm
Another way you could check the effects of enabling the various options (explicit sort stage, sort on partition, auto partition, hash partition, etc.) would be create jobs with the various alternatives, compile the job and then check the "Generated OSH". Of course, the ability to view the "generated osh" must have been enabled by your administrtaor".
Thanks,
Shekar
Thanks,
Shekar
Re: How to preventing the join stage from sorting the record
HiThilSe wrote:Hi,
I have a job that does the sorting in Database itself.
Is there any way by which one can prevent the join stage from sorting the input data again.
Thanks
Senthil
By default sort is added as you can see it in the score, as this is how DS EE ensure correct results in join.
The method to prevent auto insertion is the following -
Set $APT_SORT_INSERTION_CHECK_ONLY to change behavior of automatically inserted sorts
Regards
Felix