Hi,
I don't have Parallel Extender installed, so I can't try this myself.
I have heard that it is compulsory to sort data before using the Lookup/Merge/Join/CDC stages.
Is that true? If so, why? I can't think of a reason. Is it for performance?
Thanks
sort before lookup/cdc
-
Did you search? This has been explained in the past. In short, though, it is to minimize demand on memory: when the input is sorted on the key, a stage only needs to hold the rows for the current key value rather than the whole data set.
The Lookup stage does not require, and cannot benefit from, sorted inputs. The Join stage requires sorted inputs. The Merge stage requires sorted inputs but can tolerate unsorted master input. The Remove Duplicates stage requires sorted input. The Aggregator stage requires sorted input if Sort mode is specified. The Transformer stage may require sorted input if you are using stage variables to identify changed values in a field.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
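To illustrate the memory point above, here is a minimal sketch in plain Python (not DataStage code). It mimics what a Remove Duplicates stage, or a Transformer using stage variables to spot a changed key, relies on. The row shape and the function names `dedupe_sorted` and `dedupe_unsorted` are made up for this example.

```python
from typing import Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[Hashable, str]  # (key, payload): a hypothetical row shape


def dedupe_sorted(rows: Iterable[Row]) -> Iterator[Row]:
    """Keep the first row of each key group.

    Assumes rows arrive sorted by key, so only the previous key
    has to be remembered (constant memory).
    """
    previous_key = object()  # sentinel that equals no real key
    for key, payload in rows:
        if key != previous_key:
            yield key, payload
        previous_key = key


def dedupe_unsorted(rows: Iterable[Row]) -> Iterator[Row]:
    """Same result without sorting, but every distinct key is held in memory."""
    seen = set()
    for key, payload in rows:
        if key not in seen:
            seen.add(key)
            yield key, payload


rows: List[Row] = [(2, "b1"), (1, "a1"), (2, "b2"), (1, "a2")]
sorted_rows = sorted(rows, key=lambda r: r[0])  # stable sort on the key

result_sorted = list(dedupe_sorted(sorted_rows))  # one row per key
result_broken = list(dedupe_sorted(rows))         # unsorted input: duplicates slip through
result_set = list(dedupe_unsorted(rows))          # correct, but memory grows with the key count
```

With sorted input the key-change logic needs to remember one key only. Fed unsorted input, the same logic lets duplicates through, which is why such stages expect sorted data.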
-
Required in some cases
Hi
Sorting data before sending it to the Lookup/Merge/Join/CDC stages is necessary in some cases.
For Lookup, there is no need to sort the data.
For Join, it is not mandatory but improves performance.
For Merge, it is compulsory to sort the data before sending it to the stage. The stage itself has a default sort on its links, Merge sends reference table data to the reject links, and the default sort option in the stage performs the sort sequentially, not in parallel.
For CDC, sorting is also very important, for performance reasons.
Suresh Reddy
ETL Developer
Research Operations
"its important to know in which direction we are moving rather than where we are"
Re: Required in some cases
sureshreddy2009 wrote:
for join not mandatory but improves performance
In fact, it's quite the opposite: it is mandatory for a Join.
Mark Winter
Nothing appeases a troubled mind more than good music
Re: Required in some cases
miwinter wrote:
sureshreddy2009 wrote:
for join not mandatory but improves performance
In fact, it's quite the opposite, it is mandatory for a join
It's not mandatory for Join.
Nag
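On the disagreement above about whether Join needs sorted input, a small sketch may help show what is at stake. This is plain Python, not DataStage internals: `merge_join` is a hypothetical streaming inner join that, like any sort-merge join, assumes both inputs are sorted on the key (and, to keep it short, that keys on the right input are unique).

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (key, value): a hypothetical row shape


def merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[int, str, str]]:
    """Streaming inner join.

    Assumes both inputs are sorted ascending by key and that keys on
    the right are unique. Only one right-hand row is held at a time.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for key, left_value in left:
        # Skip right-hand rows whose key is behind the current left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is not None and current[0] == key:
            yield key, left_value, current[1]


right: List[Row] = [(1, "R1"), (3, "R3")]
left_sorted: List[Row] = [(1, "L1"), (2, "L2"), (3, "L3")]
left_unsorted: List[Row] = [(3, "L3"), (1, "L1"), (2, "L2")]

joined_ok = list(merge_join(left_sorted, right))     # finds keys 1 and 3
joined_bad = list(merge_join(left_unsorted, right))  # key 1 is silently missed
```

With unsorted input the join still runs, but it quietly drops matches. For a join of this kind, sorting is a correctness requirement and not just a performance aid, whoever ends up performing the sort.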