Number of reference links limit

brupun · Post by **brupun** » Tue Jun 19, 2012 12:36 pm

Hi All,

Is there a limit on the number of reference links for a lookup stage?

I may have to reference around 70 lookup tables in one lookup stage, so was wondering if there is a limit and if there will be any performance issues.

tbharathkumar · Post by **tbharathkumar** » Tue Jun 19, 2012 12:50 pm

when there is data growth, adversely affect the execution speed. Lookup can have N+1 input links.

ray.wurlod · Post by **ray.wurlod** » Tue Jun 19, 2012 4:46 pm

brupun wrote:I may have to reference around 70 lookup tables in one lookup stage.

No you won't.
There's no reason not to use multiple Lookup stages. You can make the downstream lookups conditional on the success of the upstream ones if that is your wont.

brupun · Post by **brupun** » Fri Jun 22, 2012 8:20 am

Using one lookup stage did not give desired results, but then again to lookup around 60-70 codes, will I need to use that many lookup stages? How badly will it affect the job's performance?

chulett · Post by **chulett** » Fri Jun 22, 2012 9:22 am

There's no difference between doing X lookups in 1 stage versus doing 1 lookup in X stages, other than the 'clutter' on the canvas. If you didn't get the 'desired result' all clumped into one, I doubt anything will change if you separate them out.

If you are looking for help with getting your desired result, you have to first explain what that might be - what the result was that you got compared to what you wanted.

jwiles · Post by **jwiles** » Fri Jun 22, 2012 9:29 am

We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.

As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).

Regards,

brupun · Post by **brupun** » Fri Jun 22, 2012 9:43 am

What I'm trying to achieve is, validate around 60-70 code columns in the source dataset by looking up in reference code tables. Code columns in source are nullable. I need to capture records where there was no match for codes but at the same time continue passing those records forward to the next stage.

Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column. Doing 1 lookup in X stages helps my requirement. I was worried about the effect on performance in using multiple lookup stages on 1 canvas.

jwiles · Post by **jwiles** » Fri Jun 22, 2012 10:06 am

Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column.

You can easily do so: have each code lookup simply add an indicator column (tinyint) with a value of 1 (hardcoded in the reference table) when it matches, null or equal to 0 when there's not a match. Examine the indicator columns in a transformer after the lookups are complete and handle the match/nomatch results accordingly.

Regards,

brupun · Post by **brupun** » Fri Jun 22, 2012 10:21 am

jwiles wrote:We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.

As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).

Fewer lookups per stage are definitely easier for me to manage too. In my case, these lookups are not dependent on each other, I'm going to create copy of source dataset and perform all lookups in parallel.

jwiles · Post by **jwiles** » Fri Jun 22, 2012 11:00 am

Allow DataStage to handle the parallelism for you by running the job in parallel (multiple logical nodes in your configuration file), but I would recommend combining lookup stages at a reasonable level...6 or 7 lookups per stage at most is comfortable for me. If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).

Regards,

brupun · Post by **brupun** » Fri Jun 22, 2012 11:40 am

jwiles wrote: If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).

I tried the indicator column method and it worked after I changed the lookup failure constraint to 'Continue' which was 'Reject' before. I am able to capture unmatched records afterwards using a transformer as you mentioned. Thanks for your help James and everyone else who took time to reply.

sreewin7 · Post by **sreewin7** » Tue Jun 26, 2012 6:13 pm

In my project we are also using more output links for lkp stage and we are not facing any performance issues and important thing is our data volume is less .

Can some one help on this.
Ray & chulett :- I have a question on this , In some of the project i seen that they are using more than 20 stage in one job is any limit to use datastage stages.
I am waiting for your reply,This post is already resolved, I feel My question is relevant to this matter.

Regards,
Sree

Kryt0n · Post by **Kryt0n** » Tue Jun 26, 2012 6:48 pm

I'm no Ray or Craig but it would only be as limited as the resources available. If the server handles the job, where's the problem?

20 stages does not necessarily mean a more complex or resource intensive job

chulett · Post by **chulett** » Tue Jun 26, 2012 8:23 pm

As noted, there is no DataStage limit to speak of... but you always have to consider the underlying resource availability.

sreewin7 · Post by **sreewin7** » Wed Jun 27, 2012 10:20 am

Thanks for your reply, my job is working fine, my question is is any stages limit in ds jobs.

DSXchange

Number of reference links limit

Number of reference links limit

Re: Number of reference links limit

Re: Number of reference links limit

Re: Number of reference links limit