sort before lookup/cdc

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

sort before lookup/cdc

Post by tostay2003 »

Hi,

I don't have Parallel Extender installed, so I can't try this myself.

I have heard that it is compulsory to sort data before using the lookup/merge/join/cdc stages.

Is that true? If so, why? I can't think of a reason. Is it for performance?

Thanks
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Even if you don't sort explicitly, DataStage will insert tsort operators and sort the data before passing it to the Join stage. You can cross-check this in the dump score.
Nag
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

But why is it important to sort before these stages?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did you search? This has been explained in the past. In short, though, it is to minimize demand on memory.

The Lookup stage does not require, and cannot benefit from, sorted inputs. The Join stage requires sorted input. The Merge stage requires sorted inputs but can tolerate unsorted master input. The Remove Duplicates stage requires sorted input. The Aggregator stage requires sorted input if Sort mode is specified. The Transformer stage may require sorted input if you are using stage variables to identify changed values in a field.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
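
To illustrate the memory argument above, here is a minimal Python sketch. It is not DataStage code and does not show how the Join or Lookup stages are implemented; `merge_join` and `hash_lookup` are hypothetical names used only for this illustration. The sketch assumes records are tuples keyed on their first field. A merge-style join over inputs already sorted on the key only has to hold the current key group in memory, while a lookup-style join holds the whole reference input in memory and so gains nothing from the order of the stream.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0)):
    """Inner-join two inputs that are ALREADY sorted on `key`.

    Only the current key group of the right input is buffered,
    so memory use does not grow with the total input size.
    """
    left_groups = groupby(left, key)
    right_groups = groupby(right, key)
    done = (None, None)  # sentinel returned when an input is exhausted
    lk, lrows = next(left_groups, done)
    rk, rrows = next(right_groups, done)
    while lrows is not None and rrows is not None:
        if lk < rk:
            lk, lrows = next(left_groups, done)    # left key has no partner
        elif lk > rk:
            rk, rrows = next(right_groups, done)   # right key has no partner
        else:
            right_buf = list(rrows)                # one key group only
            for l in lrows:
                for r in right_buf:
                    yield l, r
            lk, lrows = next(left_groups, done)
            rk, rrows = next(right_groups, done)


def hash_lookup(stream, reference, key=itemgetter(0)):
    """Lookup-style join: the whole reference input is held in memory,
    so the stream can arrive in any order."""
    table = {}
    for r in reference:
        table.setdefault(key(r), []).append(r)
    for s in stream:
        for r in table.get(key(s), []):
            yield s, r


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]   # sorted on key
    customers = [(1, "X"), (2, "Y"), (3, "Z")]          # sorted on key
    print(list(merge_join(orders, customers)))
    print(list(hash_lookup(reversed(orders), customers)))
```

Note that `merge_join` silently misses matches if either input is not sorted on the key, which is one way to read "requires sorted input" in this simplified model.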
sureshreddy2009
Participant
Posts: 62
Joined: Sat Mar 07, 2009 4:59 am
Location: Chicago

Required in some cases

Post by sureshreddy2009 »

Hi,
Sorting data before sending it to the lookup/merge/join/cdc stages is necessary in some cases:
For Lookup, there is no need to sort the data.
For Join, it is not mandatory, but it improves performance.
For Merge, it is compulsory to sort before sending data to it. The stage has a default sort on its link, but that default sort runs sequentially, not in parallel. Merge sends reference table data to the reject links.
For CDC, sorting is also very important, for performance reasons.
Suresh Reddy
ETL Developer
Research Operations

"its important to know in which direction we are moving rather than where we are"
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Re: Required in some cases

Post by miwinter »

sureshreddy2009 wrote:
for join not mandatory but improves performance
In fact, it's quite the opposite, it is mandatory for a join
Mark Winter
Nothing appeases a troubled mind more than good music
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Re: Required in some cases

Post by nagarjuna »

It's not mandatory for join.

miwinter wrote:
sureshreddy2009 wrote:
for join not mandatory but improves performance
In fact, it's quite the opposite, it is mandatory for a join
Nag
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

It is 'mandatory' for effective use. If you mean mandatory in order to join the required records accurately, then no. It depends on your interpretation.
Mark Winter
Nothing appeases a troubled mind more than good music