Page 1 of 2

single lookup or multiple lookup

Posted: Mon Sep 19, 2011 1:14 am
by dsscholar
Hi all,

i have a job in which i use six lookups with each oracle enterprise stage to do the join. But the record count is very less in each stage. So one lookup stage itself will do fine. but any performance problem if i use 6 lookups where 1 is sufficient. I know for big reference data in all 6 tables we have to go for separate lookups but here i used 6 separate ones. All auto partitioned. Any performance difference will be there? Should i change to one lookup in case of less data in all the six tables or nothing wrong if i proceed with 6 lookups.

Thanks in advance.

Re: single lookup or multiple lookup

Posted: Mon Sep 19, 2011 4:10 am
by dsscholar
Hi ray,chulett,jwiles

Please help regarding this query.

Thanks in advance

Posted: Mon Sep 19, 2011 7:21 am
by chulett
Just for the record, I have a pet peeve - when someone calls out specific people when they post. There are upwards of 30,000 people registered here and any one of them can help.

Also, have some goram patience. You posted at 1AM my time and then at 4AM bumped it again. Sheesh.

Posted: Mon Sep 19, 2011 1:05 pm
by dsscholar
I accept your point. But if you had answered the query along with this means, it would have been better.


Thanks in advance

Posted: Mon Sep 19, 2011 1:28 pm
by qt_ky
Try it both ways.

Posted: Mon Sep 19, 2011 3:38 pm
by ray.wurlod
5:14pm and 8:10pm in my time zone.
Do you require that we be on line 24x7? How much would you be prepared to pay for that?!!

Posted: Mon Sep 19, 2011 4:42 pm
by jwiles
It's very difficult to answer performance questions such as this. There are many factors that can influence job performance, both within and outside of DataStage and Information Server. We don't know your system and we don't know your data and we don't know your process. How can we give you a 100% correct answer for your situation?

Which design will work better for you? I don't know. You probably won't see much difference for the scenario you described. What is important is that either design is equally valid and will work. Eric gave the best suggestion: try it both ways That way you will know for certain in your environment. Then, if there are noticeable differences, explore further...where's the difference: startup? processing? Next thing you know, you're doing performance tuning...

I have to agree with Craig's comment: Don't call us out by name to solicit answers...give everyone here a chance. If it's "urgent-gotta-have-an-answer-within-3-hours-at-4am", call your official support organization.

Regards,

Posted: Mon Sep 19, 2011 8:37 pm
by dsscholar
I understand your point. Apologises for that. I dint say its urgent. I just asked help me in this query. I wont use names anymore. But sometimes questions goes unnoticed as so many posts are there in the queue. So i just send a reminder in common.

Regarding the query

I will check it out. Please tell the following understanding is correct.

Adding five more lookup stages will need extra resources or not? Certainly there will be right? But considering the pipelining concept if we have separate lookups it will process the records faster. I run in 4 nodes. For all the six reference tables,there are totally 50 records with 7 columns (integer), which in future may reach max of 100 records. My question is, addition of lookups result in what, regarding resources and players and memory. Please explain this, then i will conclude according to my scenerio.

Posted: Mon Sep 19, 2011 9:02 pm
by SURA
If some of the tables are not so big and have relations (PK-FK) , then use join in query level and the rest using Datastage.

DS User

Posted: Mon Sep 19, 2011 9:30 pm
by dsscholar
source & references are different dbms.

Posted: Mon Sep 19, 2011 9:33 pm
by SURA
Better do it is DS level and you wont find much impact if those tables are indexed.

Do lookup and write it in a Dataset file. In the next job load it will give better result.

DS User

Posted: Mon Sep 19, 2011 10:00 pm
by jwiles
Sometimes questions will go unanswered...that's the nature of this type of forum. We're all volunteers and answer if we have time and answers that might be helpful.

Yes, adding additional stages requires more resources, as does a single stage with multiple reference tables (and hence lookups). For multiple lookup stages, the engine may combine them if they're next to each other and operator combination is not disabled, at which point it's effectively the same as a single stage. Each running process (player) requires system resources (memory/communication links/buffers/etc), hence the operator combination options. Can't be effectively taught in a single thread. If you haven't, I suggest you read the IBM Redbooks on DataStage Performance Tuning and Dataflow Design. There are several links to them in the forum.

You'll have an amount of memory used for each table, however in your scenario the total number of reference records is too small for this to be a problem. If you're using individual lookup stages, and they're not combined, you may see some benefit from pipeline parallelism. Again, test to see how your system responds with your overall job design. You may find that a downstream process in your job negates any efficiency upstream (that is the job will run no faster than it's slowest part).

Regards,

Posted: Mon Sep 19, 2011 10:04 pm
by chulett
jwiles wrote:Sometimes questions will go unanswered...that's the nature of this type of forum. We're all volunteers and answer if we have time and answers that might be helpful.
Exactly. Unanswered, perhaps. Unnoticed? Not a chance.

Posted: Mon Sep 19, 2011 11:09 pm
by ray.wurlod
Sometimes we're even expected to do the work we're paid to do!
:roll:

Posted: Tue Sep 20, 2011 12:04 am
by dsscholar
Thanks jwiles.

Hi all,

Where can i enable or disable operator combination option?