Why does the lookup reference link require the Entire partitioning method?

Posted: Tue Apr 08, 2008 12:09 am
by ramesh_inform
Why does the lookup reference link require the Entire partitioning method, and the master link the Hash partitioning method?

Posted: Tue Apr 08, 2008 12:52 am
by bkumar103
Parallel jobs achieve parallelism through pipelining and partitioning. When a link is partitioned, the reference records may be distributed across the nodes. If an input record is then looked up on a node that does not hold its matching reference record, the output can be wrong. Entire partitioning copies the complete set of reference records to every node, which ensures a correct lookup (it does not matter which node an input record is on) and correct output.

Hope this makes sense.
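
To illustrate the point above, here is a minimal Python sketch. It is a simulation, not DataStage code: the node count, the sample rows, and the helper names (round_robin, entire, lookup) are all made up for the example. It compares a reference link that is split across nodes without regard to the key against one that is copied in full to every node.

```python
from collections import defaultdict

NODES = 3  # hypothetical number of processing nodes


def round_robin(rows, nodes=NODES):
    """Deal rows out to nodes in turn, ignoring the key."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def entire(rows, nodes=NODES):
    """Give every node a full copy of the rows."""
    return {n: list(rows) for n in range(nodes)}


def lookup(stream_parts, ref_parts):
    """Each node looks up its stream rows only in its own reference partition."""
    matched = []
    for node, rows in stream_parts.items():
        local_ref = dict(ref_parts.get(node, []))
        for key, value in rows:
            if key in local_ref:
                matched.append((key, value, local_ref[key]))
    return sorted(matched)


# Made-up sample data: six stream rows and six reference rows with the same keys.
stream = [(k, f"order-{k}") for k in range(1, 7)]
reference = [(k, f"customer-{k}") for k in range(6, 0, -1)]

split_matches = lookup(round_robin(stream), round_robin(reference))
entire_matches = lookup(round_robin(stream), entire(reference))

print(len(split_matches))   # 2: most stream rows miss their reference row
print(len(entire_matches))  # 6: every stream row finds its reference row
```

With the reference rows split this way, only the rows that happen to share a node with their match are found. With a full copy on every node, all six lookups succeed regardless of where the stream rows land.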

Re: why lookup reference link requires entire partition meth

Posted: Tue Apr 08, 2008 1:45 am
by ray.wurlod
ramesh_inform wrote: Why does the lookup reference link require the Entire partitioning method, and the master link the Hash partitioning method?
They don't. These are merely the defaults, which are guaranteed to give correct results.

Other choices are possible. For example, if the lookup key is a single integer, the stream input could use the Modulus partitioning algorithm, if that gives a more even spread of rows over nodes.
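
As a rough illustration of that caveat, the Python sketch below assigns each row to the partition given by its integer key modulo the number of nodes, then counts rows per node. It is only a simulation; the function names and key ranges are invented for the example.

```python
from collections import Counter


def modulus_partition(key, nodes):
    """Pick a partition number from an integer key (key mod node count)."""
    return key % nodes


def spread(keys, nodes):
    """Return the number of rows that land on each node, in node order."""
    counts = Counter(modulus_partition(k, nodes) for k in keys)
    return [counts.get(n, 0) for n in range(nodes)]


# Made-up key sets for a four-node configuration.
sequential_keys = range(1000, 1012)   # 12 consecutive keys
multiples_of_four = range(0, 48, 4)   # 12 keys, all divisible by 4

print(spread(sequential_keys, 4))     # [3, 3, 3, 3]
print(spread(multiples_of_four, 4))   # [12, 0, 0, 0]
```

Consecutive keys spread evenly, while keys that share a factor with the node count pile up on one node, which is why the spread is worth checking before choosing this method.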

Similarly, it is legitimate (and even desirable in multiple machine configurations) to partition the reference input identically to the stream input, to avoid distributing all rows to all nodes.
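
A small Python simulation of that idea, with invented data and helper names (partition_by_key, lookup): both inputs are partitioned by the same function of the key, so matching rows always meet on the same node and each node holds only its share of the reference data.

```python
NODES = 3  # hypothetical number of processing nodes


def partition_by_key(rows, nodes=NODES):
    """Send each (key, value) row to the node chosen by key mod node count."""
    parts = {n: [] for n in range(nodes)}
    for key, value in rows:
        parts[key % nodes].append((key, value))
    return parts


def lookup(stream_parts, ref_parts):
    """Each node looks up its stream rows only in its own reference partition."""
    matched = []
    for node, rows in stream_parts.items():
        local_ref = dict(ref_parts[node])
        for key, value in rows:
            if key in local_ref:
                matched.append((key, value, local_ref[key]))
    return sorted(matched)


# Made-up sample data with the same six keys on both inputs.
stream = [(k, f"order-{k}") for k in range(1, 7)]
reference = [(k, f"customer-{k}") for k in range(6, 0, -1)]

ref_parts = partition_by_key(reference)
matches = lookup(partition_by_key(stream), ref_parts)

rows_held_identical = sum(len(p) for p in ref_parts.values())
rows_held_entire = len(reference) * NODES  # a full copy on every node

print(len(matches))          # 6: every lookup still succeeds
print(rows_held_identical)   # 6 reference rows held in total
print(rows_held_entire)      # 18 reference rows held in total
```

The lookup result is the same as with a full copy per node, but the total number of reference rows held across the nodes does not grow with the node count.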

Posted: Tue Apr 08, 2008 2:19 am
by ramesh_inform
thanks bkumar

Posted: Tue Apr 08, 2008 4:44 am
by sunayan_pal
bkumar103 wrote: Parallel jobs achieve parallelism through pipelining and partitioning. When a link is partitioned, the reference records may be distributed across the nodes. If an input record is then looked up on a node that does not hold its matching reference record, the output can be wrong. Entire partitioning copies the complete set of reference records to every node, which ensures a correct lookup (it does not matter which node an input record is on) and correct output.

Hope this makes sense.
Could you please confirm whether the lookup data gets copied to all the partitions, or whether the data from all the partitions is pulled into memory for the lookup?

Posted: Tue Apr 08, 2008 4:59 am
by ray.wurlod
In an SMP configuration one copy exists in shared memory.

In an MPP or grid configuration the rows are moved to every node.

Posted: Tue Apr 08, 2008 5:07 am
by sunayan_pal
bkumar103 wrote: Parallel jobs achieve parallelism through pipelining and partitioning. When a link is partitioned, the reference records may be distributed across the nodes. If an input record is then looked up on a node that does not hold its matching reference record, the output can be wrong. Entire partitioning copies the complete set of reference records to every node, which ensures a correct lookup (it does not matter which node an input record is on) and correct output.

Hope this makes sense.
Could you please confirm whether the lookup data gets copied to all the partitions, or whether the data from all the partitions is pulled into memory for the lookup?

Posted: Tue Apr 08, 2008 6:25 am
by ray.wurlod
I just did, eight minutes prior to you re-posting your question. Was there anything about my answer that was unclear?

Posted: Tue Apr 08, 2008 7:54 am
by sunayan_pal
ray.wurlod wrote:I just did, eight minutes prior to you re-posting your question. Was there anything about my answer that was unclear? ...
My apologies for posting it twice. I had two sessions open at the time and submitted the question in both, which is why it appears twice here.

Posted: Tue Apr 08, 2008 3:03 pm
by ray.wurlod
Attention to detail is one of the principal characteristics necessary in a DataStage developer.