Number of reference links limit
Moderators: chulett, rschirm, roy
Number of reference links limit
Hi All,
Is there a limit on the number of reference links for a lookup stage?
I may have to reference around 70 lookup tables in one lookup stage, so was wondering if there is a limit and if there will be any performance issues.
Is there a limit on the number of reference links for a lookup stage?
I may have to reference around 70 lookup tables in one lookup stage, so was wondering if there is a limit and if there will be any performance issues.
-
- Participant
- Posts: 26
- Joined: Mon Aug 27, 2007 6:27 am
- Location: Des Moines
Re: Number of reference links limit
when there is data growth, adversely affect the execution speed. Lookup can have N+1 input links.
Regards,
Bharath Tipirisetty
Bharath Tipirisetty
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: Number of reference links limit
No you won't.brupun wrote:I may have to reference around 70 lookup tables in one lookup stage.
There's no reason not to use multiple Lookup stages. You can make the downstream lookups conditional on the success of the upstream ones if that is your wont.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: Number of reference links limit
Using one lookup stage did not give desired results, but then again to lookup around 60-70 codes, will I need to use that many lookup stages? How badly will it affect the job's performance?
There's no difference between doing X lookups in 1 stage versus doing 1 lookup in X stages, other than the 'clutter' on the canvas. If you didn't get the 'desired result' all clumped into one, I doubt anything will change if you separate them out.
If you are looking for help with getting your desired result, you have to first explain what that might be - what the result was that you got compared to what you wanted.
If you are looking for help with getting your desired result, you have to first explain what that might be - what the result was that you got compared to what you wanted.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.
As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).
Regards,
As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
What I'm trying to achieve is, validate around 60-70 code columns in the source dataset by looking up in reference code tables. Code columns in source are nullable. I need to capture records where there was no match for codes but at the same time continue passing those records forward to the next stage.
Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column. Doing 1 lookup in X stages helps my requirement. I was worried about the effect on performance in using multiple lookup stages on 1 canvas.
Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column. Doing 1 lookup in X stages helps my requirement. I was worried about the effect on performance in using multiple lookup stages on 1 canvas.
You can easily do so: have each code lookup simply add an indicator column (tinyint) with a value of 1 (hardcoded in the reference table) when it matches, null or equal to 0 when there's not a match. Examine the indicator columns in a transformer after the lookups are complete and handle the match/nomatch results accordingly.Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
Fewer lookups per stage are definitely easier for me to manage too. In my case, these lookups are not dependent on each other, I'm going to create copy of source dataset and perform all lookups in parallel.jwiles wrote:We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.
As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).
Allow DataStage to handle the parallelism for you by running the job in parallel (multiple logical nodes in your configuration file), but I would recommend combining lookup stages at a reasonable level...6 or 7 lookups per stage at most is comfortable for me. If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).
Regards,
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
I tried the indicator column method and it worked after I changed the lookup failure constraint to 'Continue' which was 'Reject' before. I am able to capture unmatched records afterwards using a transformer as you mentioned. Thanks for your help James and everyone else who took time to reply.jwiles wrote: If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).
In my project we are also using more output links for lkp stage and we are not facing any performance issues and important thing is our data volume is less .
Can some one help on this.
Ray & chulett :- I have a question on this , In some of the project i seen that they are using more than 20 stage in one job is any limit to use datastage stages.
I am waiting for your reply,This post is already resolved, I feel My question is relevant to this matter.
Regards,
Sree
Can some one help on this.
Ray & chulett :- I have a question on this , In some of the project i seen that they are using more than 20 stage in one job is any limit to use datastage stages.
I am waiting for your reply,This post is already resolved, I feel My question is relevant to this matter.
Regards,
Sree