single lookup or multiple lookup
Moderators: chulett, rschirm, roy
single lookup or multiple lookup
Hi all,
i have a job in which i use six lookups with each oracle enterprise stage to do the join. But the record count is very less in each stage. So one lookup stage itself will do fine. but any performance problem if i use 6 lookups where 1 is sufficient. I know for big reference data in all 6 tables we have to go for separate lookups but here i used 6 separate ones. All auto partitioned. Any performance difference will be there? Should i change to one lookup in case of less data in all the six tables or nothing wrong if i proceed with 6 lookups.
Thanks in advance.
i have a job in which i use six lookups with each oracle enterprise stage to do the join. But the record count is very less in each stage. So one lookup stage itself will do fine. but any performance problem if i use 6 lookups where 1 is sufficient. I know for big reference data in all 6 tables we have to go for separate lookups but here i used 6 separate ones. All auto partitioned. Any performance difference will be there? Should i change to one lookup in case of less data in all the six tables or nothing wrong if i proceed with 6 lookups.
Thanks in advance.
Re: single lookup or multiple lookup
Hi ray,chulett,jwiles
Please help regarding this query.
Thanks in advance
Please help regarding this query.
Thanks in advance
Just for the record, I have a pet peeve - when someone calls out specific people when they post. There are upwards of 30,000 people registered here and any one of them can help.
Also, have some goram patience. You posted at 1AM my time and then at 4AM bumped it again. Sheesh.
Also, have some goram patience. You posted at 1AM my time and then at 4AM bumped it again. Sheesh.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It's very difficult to answer performance questions such as this. There are many factors that can influence job performance, both within and outside of DataStage and Information Server. We don't know your system and we don't know your data and we don't know your process. How can we give you a 100% correct answer for your situation?
Which design will work better for you? I don't know. You probably won't see much difference for the scenario you described. What is important is that either design is equally valid and will work. Eric gave the best suggestion: try it both ways That way you will know for certain in your environment. Then, if there are noticeable differences, explore further...where's the difference: startup? processing? Next thing you know, you're doing performance tuning...
I have to agree with Craig's comment: Don't call us out by name to solicit answers...give everyone here a chance. If it's "urgent-gotta-have-an-answer-within-3-hours-at-4am", call your official support organization.
Regards,
Which design will work better for you? I don't know. You probably won't see much difference for the scenario you described. What is important is that either design is equally valid and will work. Eric gave the best suggestion: try it both ways That way you will know for certain in your environment. Then, if there are noticeable differences, explore further...where's the difference: startup? processing? Next thing you know, you're doing performance tuning...
I have to agree with Craig's comment: Don't call us out by name to solicit answers...give everyone here a chance. If it's "urgent-gotta-have-an-answer-within-3-hours-at-4am", call your official support organization.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
I understand your point. Apologises for that. I dint say its urgent. I just asked help me in this query. I wont use names anymore. But sometimes questions goes unnoticed as so many posts are there in the queue. So i just send a reminder in common.
Regarding the query
I will check it out. Please tell the following understanding is correct.
Adding five more lookup stages will need extra resources or not? Certainly there will be right? But considering the pipelining concept if we have separate lookups it will process the records faster. I run in 4 nodes. For all the six reference tables,there are totally 50 records with 7 columns (integer), which in future may reach max of 100 records. My question is, addition of lookups result in what, regarding resources and players and memory. Please explain this, then i will conclude according to my scenerio.
Regarding the query
I will check it out. Please tell the following understanding is correct.
Adding five more lookup stages will need extra resources or not? Certainly there will be right? But considering the pipelining concept if we have separate lookups it will process the records faster. I run in 4 nodes. For all the six reference tables,there are totally 50 records with 7 columns (integer), which in future may reach max of 100 records. My question is, addition of lookups result in what, regarding resources and players and memory. Please explain this, then i will conclude according to my scenerio.
Sometimes questions will go unanswered...that's the nature of this type of forum. We're all volunteers and answer if we have time and answers that might be helpful.
Yes, adding additional stages requires more resources, as does a single stage with multiple reference tables (and hence lookups). For multiple lookup stages, the engine may combine them if they're next to each other and operator combination is not disabled, at which point it's effectively the same as a single stage. Each running process (player) requires system resources (memory/communication links/buffers/etc), hence the operator combination options. Can't be effectively taught in a single thread. If you haven't, I suggest you read the IBM Redbooks on DataStage Performance Tuning and Dataflow Design. There are several links to them in the forum.
You'll have an amount of memory used for each table, however in your scenario the total number of reference records is too small for this to be a problem. If you're using individual lookup stages, and they're not combined, you may see some benefit from pipeline parallelism. Again, test to see how your system responds with your overall job design. You may find that a downstream process in your job negates any efficiency upstream (that is the job will run no faster than it's slowest part).
Regards,
Yes, adding additional stages requires more resources, as does a single stage with multiple reference tables (and hence lookups). For multiple lookup stages, the engine may combine them if they're next to each other and operator combination is not disabled, at which point it's effectively the same as a single stage. Each running process (player) requires system resources (memory/communication links/buffers/etc), hence the operator combination options. Can't be effectively taught in a single thread. If you haven't, I suggest you read the IBM Redbooks on DataStage Performance Tuning and Dataflow Design. There are several links to them in the forum.
You'll have an amount of memory used for each table, however in your scenario the total number of reference records is too small for this to be a problem. If you're using individual lookup stages, and they're not combined, you may see some benefit from pipeline parallelism. Again, test to see how your system responds with your overall job design. You may find that a downstream process in your job negates any efficiency upstream (that is the job will run no faster than it's slowest part).
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: