Page 1 of 1

sort before lookup/cdc

Posted: Mon Jun 01, 2009 5:25 pm
by tostay2003
Hi,

I don't have parallel extender installed, so can't try this.

I came to hear that it is compulsory to sort data before using lookup/merge/join/cdc stages.

Is it true? If so why? I can't think of a reason. Is it for performance?

Thanks

Posted: Mon Jun 01, 2009 5:59 pm
by nagarjuna
Even though if you dont sort explicitly datastage will insert tsort operators and sort the data before passing it to join stage..you can crosscheck the same in dump score

Posted: Mon Jun 01, 2009 6:00 pm
by tostay2003
But why is it important to sort before these stages?

Posted: Mon Jun 01, 2009 7:06 pm
by ray.wurlod
Did you Search? This has been explained in the past. In short, though, it is to minimize demand on memory. The Lookup stage does not require, and can not benefit from, sorted inputs. The Join stage requires sorted input. The Merge stage requires sorted inputs but can tolerate unsorted master input. The Remove Duplicates stage requires sorted input. The Aggregator stage requires sorted input if Sort mode is specified. The Transformer stage may require sorted input if you are using stage variables to identify changed values in a field.

Required in some cases

Posted: Tue Jun 02, 2009 5:58 am
by sureshreddy2009
:oops: Hi
Sorting data before sending to lookup/merge/join/cdc stages is necessary
why means for lookup no need to sort the data
for join not mandatory but improves performance
for merge its compulsory to sort before sending to it,it itself has a default sort link,merge sends reference tables data to reject links and the default sort option in the stage perform sort at sequential level not parallel
and for CDC also its very important
for performance wise

Re: Required in some cases

Posted: Tue Jun 02, 2009 6:06 am
by miwinter
sureshreddy2009 wrote::oops:
for join not mandatory but improves performance
In fact, it's quite the opposite, it is mandatory for a join

Re: Required in some cases

Posted: Tue Jun 02, 2009 6:13 am
by nagarjuna
Its not mandatory for join ..

miwinter wrote:
sureshreddy2009 wrote::oops:
for join not mandatory but improves performance
In fact, it's quite the opposite, it is mandatory for a join

Posted: Tue Jun 02, 2009 7:16 am
by miwinter
'Mandatory' for effective use, it is. If you mean in order to join the required records accurately, then no, depends on your interpretation.