sorted input to Join

Posted: Wed Feb 08, 2006 9:55 am
by djoni
Must the Join stage have sorted inputs (all of them, or just any one)?

Posted: Wed Feb 08, 2006 10:01 am
by roy
Hi,
To quote the help:
The data sets input to the Join stage must be key partitioned and sorted.

Re: sorted input to Join

Posted: Wed Feb 08, 2006 10:01 am
by felixyong
djoni wrote:Must the Join stage have sorted inputs (all of them, or just any one)?
It is recommended to sort before the join so that the join will be more efficient. If you're sorting, then all sources must be sorted the same way before the join.

It is even better to sort in the RDBMS if the join key is already indexed there, so that you save processing time and resources in DataStage Server.
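
To illustrate why identically sorted inputs help, here is a minimal sketch of a generic sort-merge join in Python. It is not DataStage code, and it only assumes that a join can walk both inputs in key order; `merge_join` and the sample rows are hypothetical names made up for this illustration.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs.

    Assumption: BOTH lists are already sorted ascending by key.
    Each input is walked once, so no lookup table is needed.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key has no partner yet, advance left
        elif lk > rk:
            j += 1          # right key has no partner yet, advance right
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row of this key with every right row of the run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_rows = [(2, "x"), (3, "y"), (4, "z")]
joined = merge_join(left_rows, right_rows)

# Feeding an unsorted left input silently drops matches (key 1 is missed here).
unsorted_result = merge_join([(2, "b"), (1, "a")], [(1, "x"), (2, "y")])
```

In this sketch an unsorted input does not raise an error; it just returns fewer rows, which is why the sort order of all inputs has to agree.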

Re: sorted input to Join

Posted: Wed Feb 08, 2006 2:58 pm
by djoni
felixyong wrote:
djoni wrote:Must the Join stage have sorted inputs (all of them, or just any one)?
It is recommended to sort before the join so that the join will be more efficient. If you're sorting, then all sources must be sorted the same way before the join.

It is even better to sort in the RDBMS if the join key is already indexed there, so that you save processing time and resources in DataStage Server.
Recommended or Mandatory?

Posted: Wed Feb 08, 2006 10:46 pm
by ray.wurlod
Mandatory, if the manual is to be believed. I believe it, so I have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?

Posted: Thu Feb 09, 2006 10:29 am
by djoni
ray.wurlod wrote:Mandatory, if the manual is to be believed. I believe it, so I have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?
It runs fine on two unsorted sequential files, with Auto partitioning and no sort.

So, is something wrong with the manual and .... EE Essential course?

Posted: Thu Feb 09, 2006 4:02 pm
by Gaurav.Dave
Well, with sequential files it behaves differently...

But when you use Data Sets, the data is partition based, so you need to key partition and sort it before you feed it to your Join stage.

Gaurav Dave

Posted: Thu Feb 09, 2006 9:42 pm
by dsusr
djoni wrote:
ray.wurlod wrote:Mandatory, if the manual is to be believed. I believe it, so I have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?
It runs fine on two unsorted sequential files, with Auto partitioning and no sort.

So, is something wrong with the manual and .... EE Essential course?

The problem occurs when you run the job on multiple nodes with a large amount of data. If you haven't hash partitioned on the key at that point, two records with the same key value can go to different partitions, and the join will not take place for them. We have faced this type of issue in one of our projects.

As far as the sort is concerned, it is basically there to improve performance. So partitioning is mandatory, but sorting is preferable.
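
The partitioning point can be sketched in a few lines of Python. This is only a toy model of a two-node run, not DataStage itself: `round_robin`, `hash_by_key`, and `join_per_partition` are hypothetical helpers, and CRC32 stands in for whatever hash function the engine really uses.

```python
import zlib


def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for idx, row in enumerate(rows):
        parts[idx % n].append(row)
    return parts


def hash_by_key(rows, n):
    """Send each row to a partition chosen from its key only."""
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = zlib.crc32(str(row[0]).encode())
        parts[digest % n].append(row)
    return parts


def join_per_partition(left_parts, right_parts):
    """Inner-join partition i of the left input with partition i of the right.

    Each 'node' sees only its own pair of partitions, as in a parallel job.
    """
    out = []
    for lpart, rpart in zip(left_parts, right_parts):
        lookup = {}
        for key, value in rpart:
            lookup.setdefault(key, []).append(value)
        for key, value in lpart:
            for rvalue in lookup.get(key, []):
                out.append((key, value, rvalue))
    return sorted(out)


left_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right_rows = [(2, "x"), (1, "w"), (4, "z"), (3, "y")]
nodes = 2

# Key-blind partitioning: matching keys end up on different nodes.
rr_result = join_per_partition(round_robin(left_rows, nodes),
                               round_robin(right_rows, nodes))

# Hash partitioning on the key: equal keys always share a partition.
hash_result = join_per_partition(hash_by_key(left_rows, nodes),
                                 hash_by_key(right_rows, nodes))

print(len(rr_result), len(hash_result))
```

With this sample data the key-blind run finds no matches at all, while the hash-partitioned run finds all four, even though both runs finish without any error.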

Regards
dsusr