Problem with entire partitioning in lookup



RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am


Post by RAJEEV KATTA »

When I select Entire partitioning in a Lookup and have 4 nodes with 4 matching rows, I get an output of 16 rows, because all the data is processed on each node. How can I avoid this and get only 4 records?
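To make the row count concrete, here is a minimal Python simulation (not DataStage code; the 4-node layout, the `id` key, the row values and the helper names `entire` and `lookup_per_node` are all hypothetical). The model assumes each node joins only its own stream partition against its own reference partition. With both links Entire partitioned, every node holds all 4 stream rows and all 4 reference rows, so each node emits 4 matches:

```python
# Hypothetical simulation of a 4-node parallel Lookup (not DataStage code).
NODES = 4


def entire(rows, nodes=NODES):
    """Entire partitioning: every node receives a full copy of the rows."""
    return [list(rows) for _ in range(nodes)]


def lookup_per_node(stream_parts, ref_parts, key):
    """Each node joins only its own stream partition against its own reference partition."""
    out = []
    for stream_rows, ref_rows in zip(stream_parts, ref_parts):
        index = {r[key]: r for r in ref_rows}
        for s in stream_rows:
            if s[key] in index:
                out.append({**s, **index[s[key]]})
    return out


stream = [{"id": i, "qty": i * 10} for i in range(1, 5)]
reference = [{"id": i, "name": f"item{i}"} for i in range(1, 5)]

# Both links Entire: every node sees all 4 stream rows and all 4 reference rows.
result = lookup_per_node(entire(stream), entire(reference), "id")
print(len(result))  # 16 (4 nodes x 4 matches)
```

Each key appears 4 times in the output, once per node, which is exactly the duplication described above.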
rjhcc
Premium Member
Posts: 34
Joined: Thu Jan 27, 2005 4:20 pm

Re: Problem with entire partitioning in lookup

Post by rjhcc »

RAJEEV KATTA wrote: When I select Entire partitioning in a Lookup and have 4 nodes with 4 matching rows, I get an output of 16 rows, because all the data is processed on each node. How can I avoid this and get only 4 records?
Select round robin.
rjhcc
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

But Entire is supposed to be the best partitioning for a lookup.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The best partitioning for lookup is identical partitioning to the stream input.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

If the lookup (reference) stream is Entire partitioned, set the main stream to Auto, not Entire. (I guess you have chosen Entire for both input streams, and that's why you are getting 16 rows.)
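A minimal sketch of this suggestion, again as a hypothetical Python simulation rather than DataStage code. It assumes the stream's Auto setting ends up dealing rows out round robin (an assumption; Auto may resolve differently in a given job) while the reference link stays Entire, so each stream row is looked up on exactly one node against a full copy of the reference data:

```python
# Hypothetical 4-node simulation (not DataStage code).
# Assumption: the stream's Auto partitioning behaves like round robin here.
NODES = 4


def round_robin(rows, nodes=NODES):
    """Deal rows to nodes in turn: row 0 -> node 0, row 1 -> node 1, and so on."""
    return [rows[i::nodes] for i in range(nodes)]


def entire(rows, nodes=NODES):
    """Entire partitioning: every node receives a full copy of the rows."""
    return [list(rows) for _ in range(nodes)]


def lookup_per_node(stream_parts, ref_parts, key):
    """Each node joins only its own stream partition against its own reference partition."""
    out = []
    for stream_rows, ref_rows in zip(stream_parts, ref_parts):
        index = {r[key]: r for r in ref_rows}
        for s in stream_rows:
            if s[key] in index:
                out.append({**s, **index[s[key]]})
    return out


stream = [{"id": i, "qty": i * 10} for i in range(1, 5)]
reference = [{"id": i, "name": f"item{i}"} for i in range(1, 5)]

# Stream split across nodes, reference copied to every node.
result = lookup_per_node(round_robin(stream), entire(reference), "id")
print(len(result))  # 4
```

The stream is split, so no row is processed twice; the reference copy on every node guarantees each stream row finds its match wherever it lands.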
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

ray.wurlod wrote:The best partitioning for lookup is identical partitioning to the stream input. ...
Even when the input partitioning is round robin? ;)


Partition both the input and the reference links using the Hash partitioning method on the key columns. This should fix your problem of getting duplicates in the output.
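A hypothetical Python sketch of Hash partitioning on both links (not DataStage code). `zlib.crc32` stands in for the engine's hash function, which is an assumption; DataStage's own hash function is not shown here. Because both links are partitioned by the same function on the same key, each stream row meets its reference row on the same node, and no reference row is copied:

```python
import zlib

# Hypothetical 4-node simulation (not DataStage code); crc32 stands in for the engine's hash.
NODES = 4


def hash_partition(rows, key, nodes=NODES):
    """Key-based partitioning: rows with equal key values always land on the same node."""
    parts = [[] for _ in range(nodes)]
    for r in rows:
        # crc32 is stable across runs (Python's built-in hash() of str is salted per process).
        parts[zlib.crc32(str(r[key]).encode()) % nodes].append(r)
    return parts


def lookup_per_node(stream_parts, ref_parts, key):
    """Each node joins only its own stream partition against its own reference partition."""
    out = []
    for stream_rows, ref_rows in zip(stream_parts, ref_parts):
        index = {r[key]: r for r in ref_rows}
        for s in stream_rows:
            if s[key] in index:
                out.append({**s, **index[s[key]]})
    return out


stream = [{"id": i, "qty": i * 10} for i in range(1, 5)]
reference = [{"id": i, "name": f"item{i}"} for i in range(1, 5)]

stream_parts = hash_partition(stream, "id")
ref_parts = hash_partition(reference, "id")
result = lookup_per_node(stream_parts, ref_parts, "id")
print(len(result))  # 4
print(sum(len(p) for p in ref_parts))  # 4: each reference row is stored on one node only
```

Compared with Entire on the reference link, this also avoids holding a full copy of the reference data on every node.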
Minhajuddin

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I should have said identical key partitioning. This implies either Hash or Modulus as the partitioning algorithm.
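A hypothetical Python sketch contrasting Modulus partitioning (key value modulo the node count, which assumes an integer key) with round robin on both links. It is a simulation, not DataStage code, and the row order is invented for illustration: the reference rows arrive in a different order from the stream rows, so round robin sends matching keys to different nodes and the lookup finds nothing, while Modulus keeps equal keys together:

```python
# Hypothetical 4-node simulation (not DataStage code).
NODES = 4


def modulus_partition(rows, key, nodes=NODES):
    """Modulus partitioning: node = integer key value mod node count."""
    parts = [[] for _ in range(nodes)]
    for r in rows:
        parts[r[key] % nodes].append(r)
    return parts


def round_robin(rows, nodes=NODES):
    """Deal rows to nodes in turn, ignoring the key entirely."""
    return [rows[i::nodes] for i in range(nodes)]


def lookup_per_node(stream_parts, ref_parts, key):
    """Each node joins only its own stream partition against its own reference partition."""
    out = []
    for stream_rows, ref_rows in zip(stream_parts, ref_parts):
        index = {r[key]: r for r in ref_rows}
        for s in stream_rows:
            if s[key] in index:
                out.append({**s, **index[s[key]]})
    return out


stream = [{"id": i, "qty": i * 10} for i in [1, 2, 3, 4]]
reference = [{"id": i, "name": f"item{i}"} for i in [4, 3, 2, 1]]  # different arrival order

keyed = lookup_per_node(modulus_partition(stream, "id"), modulus_partition(reference, "id"), "id")
unkeyed = lookup_per_node(round_robin(stream), round_robin(reference), "id")
print(len(keyed), len(unkeyed))  # 4 0
```

Round robin on both links only works by coincidence of row order, which is why "identical" has to mean identical key partitioning.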