In line sort before join

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

In line sort before join

Post by soumya5891 »

I am joining two table in the datastage join stage. I did hash partition on the join key in both the link. And by mistake I forgot made in line sort in one of the link. In this case for the matched records also the join did not happen properly. But when I made the in line sort then it worked fine. So is it mandetory for join that if we sort in one link then we must sort in the other link also.
Soumya
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

For join to work, the sort order of records should match in both input links.

Regards,
Senthil
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

For join to work, the sort order of records should match in both input links.

Regards,
Senthil
bhasds
Participant
Posts: 79
Joined: Thu May 27, 2010 1:49 am

Post by bhasds »

Join stage by default will insert tsort operator in both the dataset prior to join if we do not sort the datasets explicitly.This can be seen in the Job score.Please refer to the below post for more information.
viewtopic.php?p=191759
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891 »

When I 'm partitioning in the input then same key combination records will be in same node,it doesn't depend on the sort. So is it something like that it will join not by the value directly but with some other means.That's why sorted order is mandatory in both the links.

I will also try one thing. I will make the sorted order different in the two links with same partitions.
Soumya
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

soumya5891,

when you experiment by changing the sort order, make sure that look at the score, since DataStage will automatically insert a tsort operator at runtime should the links be sorted differently. You can disable this behaviour using the $APT_NO_SORT_INSERTION and setting to to "true" - then you will get incorrect results.
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891 »

Thanks a lot ArndW for your suggestion. Surely I will keep in mind your suggestion when I will do the checking.
Soumya
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

For your reference, the documentation for the Join stage does specifically state that all inputs must be key partitioned and sorted in ascending order.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply