Partition while using lookup

elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Partition while using lookup

Post by elavenil »

Hi,

We use PX in DS 6.0.1, with datasets as lookups in the Lookup stage. The Auto partitioning method is set on the Lookup stage and we seem to get the right data, but at PX training we were told that the 'Entire' partitioning method must be used in order to get lookup data from the lookup datasets.

Could anyone confirm whether any partitioning method can be used, or whether only the Entire method must be used?

Thanks in advance.

Regards
Saravanan

Note: 4 nodes from Single server are used.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Data will still be partitioned. However, you have no way of knowing in advance in which partition your particular key will occur, so you have to use the Entire partitioning method so that the lookup can "see" the entire data set. PX will automatically look after ensuring that the retrieved row ends up in the correct partition for processing.
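
To illustrate the point about not knowing which partition holds a key, here is a rough sketch in Python. It is only a toy model: plain lists stand in for the 4 partitions, and the names `round_robin`, `entire` and `lookup` are made up for this example, not PX operators. It compares a reference data set dealt out without regard to the key against one copied whole to every partition.

```python
# Toy model only: plain Python lists stand in for PX partitions.
NODES = 4  # matches the 4 nodes mentioned in the question

def round_robin(rows, nodes=NODES):
    """Deal rows out to partitions in turn, ignoring the key."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def entire(rows, nodes=NODES):
    """Give every partition a full copy of the rows."""
    return [list(rows) for _ in range(nodes)]

def lookup(stream_parts, ref_parts):
    """Each partition looks up its stream keys in its own reference rows only."""
    hits, misses = [], []
    for stream, ref in zip(stream_parts, ref_parts):
        table = dict(ref)
        for key in stream:
            (hits if key in table else misses).append(key)
    return sorted(hits), sorted(misses)

reference = [(k, f"value-{k}") for k in range(8)]  # (key, payload) pairs
stream_keys = [3, 0, 6, 5, 1, 2, 7, 4]

stream_parts = round_robin(stream_keys)
hits_rr, misses_rr = lookup(stream_parts, round_robin(reference))
hits_entire, misses_entire = lookup(stream_parts, entire(reference))

print("reference split without regard to key, misses:", misses_rr)
print("reference copied to every partition, misses:", misses_entire)
```

With the reference rows split without regard to the key, a partition can only match the keys that happen to land beside it. With a full copy on every partition, no key can be missed, at the cost of holding the whole reference data set on each node.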
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Partition while using lookup

Post by Teej »

elavenil wrote: "We use PX in DS 6.0.1. We are using datasets as look ups in the lookup stage. Auto Partition method used in the lookup stage and we seem to get the right data but when we attended the PX training, we were told that 'Entire' partition method must be used in order to get lookup data from the lookup datasets."

That is one option. Unfortunately, it does not invoke the parallel lookup method. That is fixed in 7.0 (or 7.0.1; my memory is hazy right now).

The recommendation is to use hash partitioning on both the input and reference links in order to take advantage of parallel lookups.

The fix defaults the stage to hash for the provided key fields when you select auto.
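
As a rough illustration of why hashing both links on the same key works, here is a small Python sketch. It is a toy model under stated assumptions: lists stand in for partitions, `hash_partition` is a made-up helper, and `key % nodes` is a stand-in for whatever hash function the engine really uses.

```python
# Toy model only: lists stand in for partitions, and "key % nodes" is an
# assumed stand-in for the engine's real hash function.
NODES = 4

def hash_partition(rows, key_of, nodes=NODES):
    """Send each row to the partition chosen by its key."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[key_of(row) % nodes].append(row)
    return parts

reference = [(k, f"value-{k}") for k in range(8)]  # (key, payload) pairs
stream_keys = [3, 0, 6, 5, 1, 2, 7, 4]

# Both links are partitioned on the same key with the same function.
ref_parts = hash_partition(reference, key_of=lambda row: row[0])
stream_parts = hash_partition(stream_keys, key_of=lambda key: key)

misses = []
for stream, ref in zip(stream_parts, ref_parts):
    table = dict(ref)  # each partition holds only its slice of the reference
    misses.extend(k for k in stream if k not in table)

ref_rows_per_partition = [len(p) for p in ref_parts]
print("misses:", misses)
print("reference rows per partition:", ref_rows_per_partition)
```

Because a given key always hashes to the same partition on both links, every stream row finds its reference row locally, and each partition only has to hold a fraction of the reference data instead of a full copy.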

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
praj
Participant
Posts: 14
Joined: Sat Dec 20, 2003 12:46 am

Post by praj »

As Teej said, it is recommended to use hash partitioning on the keys.
I also think it is better to have the lookup fields sorted (although DS does not require it). You can use the Perform sort checkbox on the partitioning sheet for this :)
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

It is not recommended to sort. There is a known bug with the Lookup stage for 6.x that would crash the job if you attempt to sort the data with several conditions.

Even apart from that bug, sorting is not recommended because it takes away the advantage of the Lookup stage: rapid lookup. If you sort, you might as well use the Join stage.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thanks for your suggestions and detailed explanations.

I will use hash partitioning for the input and reference links.

Regards
Saravanan