
Need opinion about this scenario...

Posted: Fri Jul 13, 2012 3:11 pm
by kaps
I am designing a job where I need to get all my date keys from the date dimension table. The input data is fairly small, so I am using a Lookup stage rather than a Join stage, with a DB2 Connector stage reading the table and passing it through a Copy stage that fans out to multiple links (say 14) to feed all the lookups.

1. My understanding is that the data from the table is copied to all links (a normal lookup) rather than done as a sparse lookup, because we are using a Copy stage. Correct? When there is a Copy stage between the DB2 stage and the Lookup stage, the Lookup Type option goes away, hence the question.
2. Is this a good approach, or is there a better one?
3. I heard from someone that IBM's best practice is not to use more than 8 active stages in a job, but does that matter when the daily data volume is less than 5K rows?

Thanks

Posted: Fri Jul 13, 2012 4:30 pm
by ray.wurlod
1. That depends on the partitioning algorithm on the reference input. Auto resolves to Entire, which means all rows go to all partitions.

2. You can hash or modulus partition the reference input using identical criteria to those used on the stream input. In an SMP environment this doesn't achieve much, because Entire uses only one copy of the reference data, in shared memory.

3. Get "someone" to provide proof. I have designed successful jobs with substantially more active stages than 8. Please post that alleged proof.
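
The partitioning distinction above can be sketched outside DataStage. This is a toy Python illustration (not DataStage code; the function names and data are invented for the example) of why Entire lets every partition satisfy any lookup locally, while hash partitioning requires the stream and reference inputs to be partitioned on identical key criteria:

```python
# Toy sketch of two reference-link partitioning strategies for a parallel
# lookup. Plain Python; names (entire, hash_partition) are illustrative only.

def entire(rows, num_partitions):
    """Entire partitioning: every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(num_partitions)]

def hash_partition(rows, key, num_partitions):
    """Hash partitioning: each row lands on exactly one partition, by key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts

ref = [{"date_key": k} for k in range(6)]

# With Entire (what Auto resolves to on a lookup reference link),
# any partition can resolve any key on its own:
for part in entire(ref, 3):
    assert len(part) == len(ref)

# With hash, each partition holds only a subset of the keys, so the
# stream rows must be hashed on the same key to land on the right partition:
parts = hash_partition(ref, "date_key", 3)
assert sum(len(p) for p in parts) == len(ref)
```

On an SMP system the Entire copy is shared in memory rather than physically duplicated per partition, which is why Ray notes that hash partitioning the reference input buys little there.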

Posted: Wed Jul 18, 2012 8:20 am
by kaps
Ray
Thanks for the reply. "Someone" is a consultant who is no longer here, and I am going to ignore that advice.

Posted: Wed Jul 18, 2012 6:11 pm
by ray.wurlod
Ah. Consultant, derived from "con" and "insult".
:wink: