Page 1 of 2

Number of reference links limit

Posted: Tue Jun 19, 2012 12:36 pm
by brupun
Hi All,

Is there a limit on the number of reference links for a lookup stage?

I may have to reference around 70 lookup tables in one lookup stage, so was wondering if there is a limit and if there will be any performance issues.

Re: Number of reference links limit

Posted: Tue Jun 19, 2012 12:50 pm
by tbharathkumar
when there is data growth, adversely affect the execution speed. Lookup can have N+1 input links.

Re: Number of reference links limit

Posted: Tue Jun 19, 2012 4:46 pm
by ray.wurlod
brupun wrote:I may have to reference around 70 lookup tables in one lookup stage.
No you won't.
There's no reason not to use multiple Lookup stages. You can make the downstream lookups conditional on the success of the upstream ones if that is your wont.

Re: Number of reference links limit

Posted: Fri Jun 22, 2012 8:20 am
by brupun
Using one lookup stage did not give desired results, but then again to lookup around 60-70 codes, will I need to use that many lookup stages? How badly will it affect the job's performance?

Posted: Fri Jun 22, 2012 9:22 am
by chulett
There's no difference between doing X lookups in 1 stage versus doing 1 lookup in X stages, other than the 'clutter' on the canvas. If you didn't get the 'desired result' all clumped into one, I doubt anything will change if you separate them out.

If you are looking for help with getting your desired result, you have to first explain what that might be - what the result was that you got compared to what you wanted.

Posted: Fri Jun 22, 2012 9:29 am
by jwiles
We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.

As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).

Regards,

Posted: Fri Jun 22, 2012 9:43 am
by brupun
What I'm trying to achieve is, validate around 60-70 code columns in the source dataset by looking up in reference code tables. Code columns in source are nullable. I need to capture records where there was no match for codes but at the same time continue passing those records forward to the next stage.

Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column. Doing 1 lookup in X stages helps my requirement. I was worried about the effect on performance in using multiple lookup stages on 1 canvas.

Posted: Fri Jun 22, 2012 10:06 am
by jwiles
Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column.
You can easily do so: have each code lookup simply add an indicator column (tinyint) with a value of 1 (hardcoded in the reference table) when it matches, null or equal to 0 when there's not a match. Examine the indicator columns in a transformer after the lookups are complete and handle the match/nomatch results accordingly.

Regards,

Posted: Fri Jun 22, 2012 10:21 am
by brupun
jwiles wrote:We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.

As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).
Fewer lookups per stage are definitely easier for me to manage too. In my case, these lookups are not dependent on each other, I'm going to create copy of source dataset and perform all lookups in parallel.

Posted: Fri Jun 22, 2012 11:00 am
by jwiles
Allow DataStage to handle the parallelism for you by running the job in parallel (multiple logical nodes in your configuration file), but I would recommend combining lookup stages at a reasonable level...6 or 7 lookups per stage at most is comfortable for me. If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).

Regards,

Posted: Fri Jun 22, 2012 11:40 am
by brupun
jwiles wrote: If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).
I tried the indicator column method and it worked after I changed the lookup failure constraint to 'Continue' which was 'Reject' before. I am able to capture unmatched records afterwards using a transformer as you mentioned. Thanks for your help James and everyone else who took time to reply.

Posted: Tue Jun 26, 2012 6:13 pm
by sreewin7
In my project we are also using more output links for lkp stage and we are not facing any performance issues and important thing is our data volume is less .

Can some one help on this.
Ray & chulett :- I have a question on this , In some of the project i seen that they are using more than 20 stage in one job is any limit to use datastage stages.
I am waiting for your reply,This post is already resolved, I feel My question is relevant to this matter.

Regards,
Sree

Posted: Tue Jun 26, 2012 6:48 pm
by Kryt0n
I'm no Ray or Craig but it would only be as limited as the resources available. If the server handles the job, where's the problem?

20 stages does not necessarily mean a more complex or resource intensive job

Posted: Tue Jun 26, 2012 8:23 pm
by chulett
As noted, there is no DataStage limit to speak of... but you always have to consider the underlying resource availability.

Posted: Wed Jun 27, 2012 10:20 am
by sreewin7
Thanks for your reply, my job is working fine, my question is is any stages limit in ds jobs.