Number of reference links limit

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

brupun
Premium Member
Premium Member
Posts: 16
Joined: Thu Jun 14, 2012 10:10 am

Number of reference links limit

Post by brupun »

Hi All,

Is there a limit on the number of reference links for a lookup stage?

I may have to reference around 70 lookup tables in one lookup stage, so was wondering if there is a limit and if there will be any performance issues.
tbharathkumar
Participant
Posts: 26
Joined: Mon Aug 27, 2007 6:27 am
Location: Des Moines

Re: Number of reference links limit

Post by tbharathkumar »

when there is data growth, adversely affect the execution speed. Lookup can have N+1 input links.
Regards,
Bharath Tipirisetty
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Number of reference links limit

Post by ray.wurlod »

brupun wrote:I may have to reference around 70 lookup tables in one lookup stage.
No you won't.
There's no reason not to use multiple Lookup stages. You can make the downstream lookups conditional on the success of the upstream ones if that is your wont.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
brupun
Premium Member
Premium Member
Posts: 16
Joined: Thu Jun 14, 2012 10:10 am

Re: Number of reference links limit

Post by brupun »

Using one lookup stage did not give desired results, but then again to lookup around 60-70 codes, will I need to use that many lookup stages? How badly will it affect the job's performance?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no difference between doing X lookups in 1 stage versus doing 1 lookup in X stages, other than the 'clutter' on the canvas. If you didn't get the 'desired result' all clumped into one, I doubt anything will change if you separate them out.

If you are looking for help with getting your desired result, you have to first explain what that might be - what the result was that you got compared to what you wanted.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.

As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
brupun
Premium Member
Premium Member
Posts: 16
Joined: Thu Jun 14, 2012 10:10 am

Post by brupun »

What I'm trying to achieve is, validate around 60-70 code columns in the source dataset by looking up in reference code tables. Code columns in source are nullable. I need to capture records where there was no match for codes but at the same time continue passing those records forward to the next stage.

Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column. Doing 1 lookup in X stages helps my requirement. I was worried about the effect on performance in using multiple lookup stages on 1 canvas.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Now if I do X lookups in 1 stage, I'm not able to capture unmatched recs for each code column.
You can easily do so: have each code lookup simply add an indicator column (tinyint) with a value of 1 (hardcoded in the reference table) when it matches, null or equal to 0 when there's not a match. Examine the indicator columns in a transformer after the lookups are complete and handle the match/nomatch results accordingly.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
brupun
Premium Member
Premium Member
Posts: 16
Joined: Thu Jun 14, 2012 10:10 am

Post by brupun »

jwiles wrote:We can't predict the performance effect for you. However, what you may gain through using multiple lookup stages is easier maintenance of the lookups (fewer lookups per stage is easier for me to manage, your experience may be different) as well as the potential for better pipeline parallelism as one stage passes data to the next and then processes another piece of data.

As mentioned, look at the relationship between lookups...are some dependent on the results of others? If so, ensure that they are executed in the correct order. Perhaps some of the reference tables can be combined into a single lookup (same keys for source and reference).
Fewer lookups per stage are definitely easier for me to manage too. In my case, these lookups are not dependent on each other, I'm going to create copy of source dataset and perform all lookups in parallel.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Allow DataStage to handle the parallelism for you by running the job in parallel (multiple logical nodes in your configuration file), but I would recommend combining lookup stages at a reasonable level...6 or 7 lookups per stage at most is comfortable for me. If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
brupun
Premium Member
Premium Member
Posts: 16
Joined: Thu Jun 14, 2012 10:10 am

Post by brupun »

jwiles wrote: If you use the indicator columns method I mentioned previously, then the lookups will be easy to setup and transformer logic is pretty straightforward (depending on what you need to do for non-matching codes).
I tried the indicator column method and it worked after I changed the lookup failure constraint to 'Continue' which was 'Reject' before. I am able to capture unmatched records afterwards using a transformer as you mentioned. Thanks for your help James and everyone else who took time to reply.
sreewin7
Participant
Posts: 41
Joined: Tue Sep 14, 2010 8:48 pm

Post by sreewin7 »

In my project we are also using more output links for lkp stage and we are not facing any performance issues and important thing is our data volume is less .

Can some one help on this.
Ray & chulett :- I have a question on this , In some of the project i seen that they are using more than 20 stage in one job is any limit to use datastage stages.
I am waiting for your reply,This post is already resolved, I feel My question is relevant to this matter.

Regards,
Sree
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

I'm no Ray or Craig but it would only be as limited as the resources available. If the server handles the job, where's the problem?

20 stages does not necessarily mean a more complex or resource intensive job
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, there is no DataStage limit to speak of... but you always have to consider the underlying resource availability.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sreewin7
Participant
Posts: 41
Joined: Tue Sep 14, 2010 8:48 pm

Post by sreewin7 »

Thanks for your reply, my job is working fine, my question is is any stages limit in ds jobs.
Post Reply