sorted input to Join

djoni · Post by **djoni** » Wed Feb 08, 2006 9:55 am

Must the Join stage have sorted inputs (all of them, or just any one)?

roy · Post by **roy** » Wed Feb 08, 2006 10:01 am

Hi,
To quote the help:

The data sets input to the Join stage must be key partitioned and sorted.

felixyong · Post by **felixyong** » Wed Feb 08, 2006 10:01 am

djoni wrote: Must Join stage have sorted inputs (all or any one)

It is recommended to sort before the join so that the join will be more efficient. If you're sorting, then all sources must be sorted the same way before the join.

It is even better to sort in the RDBMS if the join key is already indexed there, so that you save processing time and resources on the DataStage server.
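To illustrate why all inputs need to be sorted the same way, here is a minimal Python sketch of a generic sort-merge join. It is not DataStage code and makes no claim about how the Join stage is implemented internally; `merge_join` and the sample rows are hypothetical names made up for the example. The point is only that a single-pass merge assumes both inputs arrive in key order, and silently drops matches when one of them does not.

```python
def merge_join(left, right):
    """Inner join of two lists of (key, value) pairs.

    Assumes BOTH inputs are sorted ascending by key; that assumption
    is what allows a single forward pass over each input.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left-hand row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


# Hypothetical sample data: the left input is deliberately out of order.
left = [(3, "c"), (1, "a"), (2, "b")]
right = [(1, "w"), (2, "x"), (3, "y")]

unsorted_result = merge_join(left, right)        # left input not sorted
sorted_result = merge_join(sorted(left), right)  # both inputs sorted

print(len(unsorted_result), len(sorted_result))
```

With the unsorted left input the merge finds only one of the three matching keys; once both sides are sorted on the key it finds all three.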

djoni · Post by **djoni** » Wed Feb 08, 2006 2:58 pm

felixyong wrote:
djoni wrote: Must Join stage have sorted inputs (all or any one)
It is recommended to sort before the join so that the join will be more efficient. If you're sorting, then all sources must be sorted the same way before the join.

It is even better to sort in the RDBMS if the join key is already indexed there, so that you save processing time and resources on the DataStage server.

Recommended or Mandatory?

ray.wurlod · Post by **ray.wurlod** » Wed Feb 08, 2006 10:46 pm

Mandatory, if the manual is to be believed. I believe it, so I have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?

djoni · Post by **djoni** » Thu Feb 09, 2006 10:29 am

ray.wurlod wrote: Mandatory if the manual is to be believed. I believe it so have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?

It runs fine on two unsorted sequential files, with Auto partitioning and no sort.

So, is something wrong with the manual and ... the EE Essential course?

Gaurav.Dave · Post by **Gaurav.Dave** » Thu Feb 09, 2006 4:02 pm

Well, with sequential files it behaves differently...

But when you use Data Sets, which are partition based, you need to key partition and sort them before they go into your Join stage.

Gaurav Dave

dsusr · Post by **dsusr** » Thu Feb 09, 2006 9:42 pm

djoni wrote:
ray.wurlod wrote: Mandatory if the manual is to be believed. I believe it so have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?
Runs well on two un-sorted sequential files, auto partitioned no sort.

So, is something wrong with the manual and .... EE Essential course?

The problem shows up when you run the job on multiple nodes with a large amount of data. If you have not hash partitioned on the key, two records with the same key value can land in different partitions, and the join between them will not take place. We have faced this issue in one of our projects.

As far as the sort is concerned, it is basically there to improve performance. So partitioning is mandatory, while sorting is preferable.
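A small Python sketch of the partitioning problem, for illustration only. It is not DataStage code: `round_robin`, `hash_partition`, `join_per_partition` and the sample rows are hypothetical, and the per-partition join uses a simple lookup rather than whatever the Join stage does internally. It assumes each node joins only the rows in its own partition.

```python
from collections import defaultdict


def round_robin(rows, n):
    """Deal rows out to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n):
    """Send each row to the partition chosen by hashing its key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def join_per_partition(left_parts, right_parts):
    """Inner join where each node only sees its own partition pair."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        lookup = defaultdict(list)
        for key, value in rp:
            lookup[key].append(value)
        for key, value in lp:
            for right_value in lookup.get(key, []):
                out.append((key, value, right_value))
    return sorted(out)


# Hypothetical sample data; every key exists on both sides.
left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right = [(2, "x"), (1, "w"), (4, "z"), (3, "y")]
nodes = 2

rr_result = join_per_partition(round_robin(left, nodes), round_robin(right, nodes))
hash_result = join_per_partition(hash_partition(left, nodes), hash_partition(right, nodes))

print(len(rr_result), len(hash_result))
```

In this run the key-agnostic split puts every matching pair on different nodes, so the join returns nothing, while hashing on the key keeps equal keys together and all four matches are found. On a single node the two approaches give the same result, which is one reason a small test can hide the problem.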

Regards
dsusr
