Why does the lookup reference link require the Entire partition method?

ramesh_inform · Post by **ramesh_inform** » Tue Apr 08, 2008 12:09 am

Why does the lookup reference link require the Entire partitioning method, and the master link the Hash partitioning method?

bkumar103 · Post by **bkumar103** » Tue Apr 08, 2008 12:52 am

Parallel jobs achieve parallelism through pipelining and partitioning. When the reference data is partitioned, its records may be distributed across the nodes. If an input record is then looked up on a node that does not hold the matching reference record, the lookup fails and the output will be wrong. Entire partitioning copies the complete set of reference records to every node, which ensures a correct lookup (no matter which node an input record is on) and correct output.
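To illustrate the point, here is a small Python simulation. It is not DataStage code, only a sketch under stated assumptions: four nodes, made-up keys, and hypothetical helper names (`round_robin`, `entire`, `lookup`). Each node can only look up against the reference rows it holds locally.

```python
NODES = 4  # assumed number of processing nodes


def round_robin(rows, nodes=NODES):
    """Deal rows out to nodes in turn, ignoring the key."""
    parts = {n: [] for n in range(nodes)}
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def entire(rows, nodes=NODES):
    """Give every node a full copy of the rows."""
    return {n: list(rows) for n in range(nodes)}


def lookup(stream_parts, ref_parts):
    """Each node looks up its stream rows against its local reference rows only."""
    matched, missed = [], []
    for node, rows in stream_parts.items():
        local_ref = dict(ref_parts[node])
        for key, payload in rows:
            if key in local_ref:
                matched.append((key, payload, local_ref[key]))
            else:
                missed.append(key)
    return matched, missed


# Made-up sample data: (key, value) pairs.
reference = [(k, f"name_{k}") for k in range(8)]
stream = [(k, f"txn_{i}") for i, k in enumerate([3, 7, 1, 3, 5, 0, 6, 2])]

stream_parts = round_robin(stream)

# Reference scattered without regard to where the stream rows land.
matched_rr, missed_rr = lookup(stream_parts, round_robin(reference))

# Reference copied to every node.
matched_entire, missed_entire = lookup(stream_parts, entire(reference))

print(len(matched_rr), len(missed_rr))
print(len(matched_entire), len(missed_entire))
```

Every stream key exists in the reference data, yet the scattered case misses most of them because the matching reference row sits on a different node. With a full copy on every node, nothing is missed.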

Hope this makes sense.

ray.wurlod · Post by **ray.wurlod** » Tue Apr 08, 2008 1:45 am

ramesh_inform wrote: Why does the lookup reference link require the Entire partitioning method, and the master link the Hash partitioning method?

They don't. These are merely the defaults, which are guaranteed to give correct results.

Another possibility is that the lookup key is a single integer. In that case the stream input could use the Modulus partitioning algorithm, if that gives a more even spread of rows over nodes.
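As a rough sketch of the idea (plain Python, not DataStage syntax), modulus partitioning assigns a row to the partition given by its integer key modulo the number of partitions. The function name `modulus_partition` and the sample keys are assumptions made for illustration.

```python
from collections import Counter


def modulus_partition(key: int, partitions: int) -> int:
    """Partition number is the integer key modulo the partition count."""
    return key % partitions


# Made-up consecutive integer keys spread over four partitions.
keys = [101, 102, 103, 104, 105, 106, 107, 108]
spread = Counter(modulus_partition(k, 4) for k in keys)

for partition in sorted(spread):
    print(partition, spread[partition])
```

Consecutive keys like these spread evenly, but a skewed key distribution (for example, keys that are all multiples of the partition count) would not, which is why the choice depends on the data.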

Similarly, it is legitimate (and even desirable in multiple machine configurations) to partition the reference input identically to the stream input, to avoid distributing all rows to all nodes.
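A small Python simulation of this alternative, again only a sketch with assumed names (`partition_by_key`) and made-up data rather than DataStage code: both inputs go through the same key-based partitioner, so matching rows always land on the same node and each node holds only its share of the reference rows.

```python
NODES = 4  # assumed number of processing nodes


def partition_by_key(rows, nodes=NODES):
    """Send each (key, value) row to the node chosen by its key."""
    parts = {n: [] for n in range(nodes)}
    for key, value in rows:
        parts[key % nodes].append((key, value))
    return parts


# Made-up sample data: (key, value) pairs.
reference = [(k, f"name_{k}") for k in range(8)]
stream = [(k, f"txn_{i}") for i, k in enumerate([3, 7, 1, 3, 5, 0, 6, 2])]

ref_parts = partition_by_key(reference)
stream_parts = partition_by_key(stream)

matched, missed = [], []
for node in range(NODES):
    local_ref = dict(ref_parts[node])  # only this node's share of the reference
    for key, payload in stream_parts[node]:
        if key in local_ref:
            matched.append((key, payload, local_ref[key]))
        else:
            missed.append(key)

# Total reference rows held across all nodes, versus a full copy per node.
rows_held = sum(len(rows) for rows in ref_parts.values())
rows_held_entire = len(reference) * NODES

print(len(matched), len(missed), rows_held, rows_held_entire)
```

All stream rows find their match, while the nodes together hold 8 reference rows instead of the 32 that a full copy on each of four nodes would need.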

ramesh_inform · Post by **ramesh_inform** » Tue Apr 08, 2008 2:19 am

Thanks, bkumar.

sunayan_pal · Post by **sunayan_pal** » Tue Apr 08, 2008 4:44 am

bkumar103 wrote: Parallel jobs achieve parallelism through pipelining and partitioning. When the reference data is partitioned, its records may be distributed across the nodes. If an input record is then looked up on a node that does not hold the matching reference record, the lookup fails and the output will be wrong. Entire partitioning copies the complete set of reference records to every node, which ensures a correct lookup (no matter which node an input record is on) and correct output.

Hope this makes sense.

Could you please confirm whether the lookup data gets copied to all the partitions, or whether the data from all the partitions is pulled into memory for the lookup?

ray.wurlod · Post by **ray.wurlod** » Tue Apr 08, 2008 4:59 am

In an SMP configuration one copy exists in shared memory.

In an MPP or grid configuration the rows are moved to every node.

sunayan_pal · Post by **sunayan_pal** » Tue Apr 08, 2008 5:07 am

bkumar103 wrote: Parallel jobs achieve parallelism through pipelining and partitioning. When the reference data is partitioned, its records may be distributed across the nodes. If an input record is then looked up on a node that does not hold the matching reference record, the lookup fails and the output will be wrong. Entire partitioning copies the complete set of reference records to every node, which ensures a correct lookup (no matter which node an input record is on) and correct output.

Hope this makes sense.

Could you please confirm whether the lookup data gets copied to all the partitions, or whether the data from all the partitions is pulled into memory for the lookup?

ray.wurlod · Post by **ray.wurlod** » Tue Apr 08, 2008 6:25 am

I just did, eight minutes prior to you re-posting your question. Was there anything about my answer that was unclear?

sunayan_pal · Post by **sunayan_pal** » Tue Apr 08, 2008 7:54 am

ray.wurlod wrote:I just did, eight minutes prior to you re-posting your question. Was there anything about my answer that was unclear? ...

My apologies for posting it twice. Two sessions were open at the time and I submitted in both, which is why it appears twice here.

ray.wurlod · Post by **ray.wurlod** » Tue Apr 08, 2008 3:03 pm

Attention to detail is one of the principal characteristics necessary in a DataStage developer.

DSXchange
