Why does the lookup reference link require the Entire partitioning method?

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ramesh_inform
Participant
Posts: 57
Joined: Mon Dec 03, 2007 12:43 am
Location: hyderabad

Post by ramesh_inform »

Why does the lookup reference link require the Entire partitioning method, and the master link the Hash partitioning method?
ramesh.n.
bkumar103
Participant
Posts: 214
Joined: Wed Jul 25, 2007 2:29 am
Location: Chennai

Post by bkumar103 »

Parallel jobs achieve parallelism through pipelining and partitioning. When the reference data is partitioned, its records are distributed across the nodes. If an input record is looked up on a node that does not hold the matching reference record, the lookup finds no match even though one exists, and the output is wrong. Entire partitioning copies the complete set of reference records to every node, so each input record finds its match no matter which node it lands on, and the output is correct.

Hope this makes sense.
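To make the "wrong node" problem concrete, here is a minimal Python simulation under stated assumptions: a hypothetical 4-node configuration, 100 stream rows that all have a matching reference row, and a stream partitioned by key using `key mod node count` as a simple stand-in for Hash. The reference is split into contiguous blocks (i.e. without regard to the lookup key) to represent "wrong" partitioning. Each simulated node looks up its own stream rows against only the reference rows it holds. The helper names are illustrative, not DataStage APIs.

```python
NODES = 4  # hypothetical 4-node configuration

def key_partition(key: int, nodes: int = NODES) -> int:
    """Simple key-based partitioner for integer keys (stand-in for Hash)."""
    return key % nodes

def block_split(rows, nodes=NODES):
    """Reference rows split into contiguous blocks, ignoring the lookup key."""
    size = -(-len(rows) // nodes)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(nodes)]

def entire(rows, nodes=NODES):
    """Entire partitioning: every node receives the complete reference set."""
    return [list(rows) for _ in range(nodes)]

def parallel_lookup(stream_parts, ref_parts):
    """Each node searches only its own reference rows; returns (matched, missed)."""
    matched = missed = 0
    for stream_keys, ref_rows in zip(stream_parts, ref_parts):
        lookup = dict(ref_rows)
        for key in stream_keys:
            if key in lookup:
                matched += 1
            else:
                missed += 1
    return matched, missed

reference = [(k, f"desc{k}") for k in range(100)]  # (key, description) rows
stream_parts = [[] for _ in range(NODES)]
for key in range(100):  # every stream key has a matching reference row
    stream_parts[key_partition(key)].append(key)

print(parallel_lookup(stream_parts, block_split(reference)))  # (28, 72)
print(parallel_lookup(stream_parts, entire(reference)))       # (100, 0)
```

With the block split, 72 of the 100 rows miss because their match sits on another node; with Entire, every node can see every reference row, so all 100 match.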
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

ramesh_inform wrote: Why does the lookup reference link require the Entire partitioning method, and the master link the Hash partitioning method?
They don't. These are merely the defaults, which are guaranteed to give correct results.

Another possibility arises when the lookup key is a single integer: the stream input could then use the Modulus partitioning algorithm, if that gives a more even spread of rows over the nodes.
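Modulus partitioning assigns each row to partition number `key mod number of partitions`. The Python sketch below (hypothetical key sets, 4-node configuration) shows why the choice depends on the spread: sequential integer keys divide evenly, but keys that share a common factor with the node count all land on one node.

```python
from collections import Counter

def modulus_partition(key: int, nodes: int) -> int:
    """Modulus partitioning: partition number = integer key mod number of partitions."""
    return key % nodes

def spread(keys, nodes):
    """Rows per partition after Modulus partitioning."""
    counts = Counter(modulus_partition(k, nodes) for k in keys)
    return [counts[p] for p in range(nodes)]

# Hypothetical 4-node configuration
print(spread(range(1, 1001), 4))     # sequential IDs: [250, 250, 250, 250]
print(spread(range(0, 4000, 4), 4))  # keys all multiples of 4: [1000, 0, 0, 0]
```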

Similarly, it is legitimate (and even desirable in multiple-machine configurations) to partition the reference input identically to the stream input, on the same key with the same algorithm, so that matching rows land on the same node. This avoids distributing all rows to all nodes.
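A minimal Python sketch of identical key partitioning, assuming a hypothetical 4-node configuration, 1,000 reference rows and 5,000 stream rows. `hash_partition` uses crc32 purely as a stand-in; it is not DataStage's own hash function. Because both links use the same function on the same key, every stream row finds its match locally, while each reference row is held on only one node instead of on all four.

```python
import zlib

NODES = 4  # hypothetical 4-node configuration

def hash_partition(key, nodes=NODES):
    """Stand-in for Hash partitioning (crc32 of the key, not DataStage's own hash)."""
    return zlib.crc32(str(key).encode()) % nodes

def partition_by_key(rows, key_of, nodes=NODES):
    """Send each row to the node chosen by hashing its key."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[hash_partition(key_of(row), nodes)].append(row)
    return parts

reference = [(k, f"desc{k}") for k in range(1000)]  # (key, description) rows
stream = [k % 1000 for k in range(5000)]            # each key appears five times

# Same key, same algorithm on both links
ref_parts = partition_by_key(reference, key_of=lambda row: row[0])
stream_parts = partition_by_key(stream, key_of=lambda key: key)

misses = 0
for stream_keys, ref_rows in zip(stream_parts, ref_parts):
    lookup = dict(ref_rows)
    misses += sum(1 for key in stream_keys if key not in lookup)

held_keyed = sum(len(part) for part in ref_parts)  # each reference row on one node
held_entire = NODES * len(reference)               # Entire: every row on every node
print(misses, held_keyed, held_entire)             # 0 1000 4000
```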
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ramesh_inform

Post by ramesh_inform »

thanks bkumar
ramesh.n.
sunayan_pal
Participant
Posts: 49
Joined: Fri May 11, 2007 12:24 am
Location: kolkata

Post by sunayan_pal »

bkumar103 wrote: Parallel jobs achieve parallelism through pipelining and partitioning. When the reference data is partitioned, its records are distributed across the nodes. If an input record is looked up on a node that does not hold the matching reference record, the lookup finds no match even though one exists, and the output is wrong. Entire partitioning copies the complete set of reference records to every node, so each input record finds its match no matter which node it lands on, and the output is correct.

Hope this makes sense.
Could you please confirm whether the lookup data gets copied to every partition, or whether the data from all partitions is pooled into memory for the lookup?
regards
sunayan
ray.wurlod

Post by ray.wurlod »

In an SMP configuration, a single copy of the reference data exists in shared memory.

In an MPP or grid configuration, the reference rows are moved to every node.
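A back-of-the-envelope sketch of what this means for memory, in Python. The node names, host names and row count are hypothetical, and the function assumes that nodes on the same host share one copy while each separate host holds its own. That generalizes the two cases above to mixed layouts and is an assumption, not documented behavior.

```python
def entire_rows_in_memory(ref_rows: int, node_hosts: dict) -> int:
    """Reference rows held overall under Entire partitioning.

    Assumption: nodes on the same host share one copy (SMP shared memory);
    each separate host receives its own copy (MPP/grid).
    """
    return ref_rows * len(set(node_hosts.values()))

# Hypothetical node-to-host layouts, 1,000,000 reference rows
smp = {"node1": "etl01", "node2": "etl01", "node3": "etl01", "node4": "etl01"}
mpp = {"node1": "etl01", "node2": "etl02", "node3": "etl03", "node4": "etl04"}
print(entire_rows_in_memory(1_000_000, smp))  # 1000000
print(entire_rows_in_memory(1_000_000, mpp))  # 4000000
```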
sunayan_pal

Post by sunayan_pal »

bkumar103 wrote: Parallel jobs achieve parallelism through pipelining and partitioning. When the reference data is partitioned, its records are distributed across the nodes. If an input record is looked up on a node that does not hold the matching reference record, the lookup finds no match even though one exists, and the output is wrong. Entire partitioning copies the complete set of reference records to every node, so each input record finds its match no matter which node it lands on, and the output is correct.

Hope this makes sense.
Could you please confirm whether the lookup data gets copied to every partition, or whether the data from all partitions is pooled into memory for the lookup?
regards
sunayan
ray.wurlod

Post by ray.wurlod »

I just did, eight minutes prior to you re-posting your question. Was there anything about my answer that was unclear?
sunayan_pal

Post by sunayan_pal »

ray.wurlod wrote: I just did, eight minutes prior to you re-posting your question. Was there anything about my answer that was unclear? ...
My apologies for posting it twice. I had two sessions open at the time and submitted in both, which is why the question appears twice.
regards
sunayan
ray.wurlod

Post by ray.wurlod »

Attention to detail is one of the principal characteristics necessary in a DataStage developer.