When should I use "Entire" partitioning in a lookup?

pattemk
Participant
Posts: 84
Joined: Wed May 16, 2007 4:04 pm

When should I use "Entire" partitioning in a lookup?

Post by pattemk »

Hi,

Is it mandatory to specify Entire as the partitioning method when using a normal lookup? Is there any chance of losing data if we just leave it as Auto partitioning?

Kindly advise.

Thanks

**Note: Subject made more descriptive - Content Editor **
betterthanever
Participant
Posts: 152
Joined: Tue Jan 13, 2009 8:59 am

Re: lookup

Post by betterthanever »

No, it is not mandatory.
pattemk
Participant
Posts: 84
Joined: Wed May 16, 2007 4:04 pm

Re: lookup

Post by pattemk »

[quote="betterthanever"]no it is not mandatory[/quote]

Thanks for your prompt reply. A quick question:

In which cases or scenarios must we specify the Entire partitioning method when doing a normal lookup?

Kindly advise.

Thanks
betterthanever
Participant
Posts: 152
Joined: Tue Jan 13, 2009 8:59 am

Re: lookup

Post by betterthanever »

I don't think there are any.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Specify entire partitioning whenever you are unable or unwilling to partition the reference input exactly the same as the stream input. One example of unable that I can think of: multiple reference links into a single lookup stage where the stream input can only be partitioned to match one of the reference inputs. An example of unwilling: a very small reference table where you don't want the overhead of repartitioning the stream input.
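
To make the "partitioned exactly the same" point concrete, here is a minimal Python sketch (not DataStage code) that simulates a lookup running independently on each node. The node count, the `id` key, and all function names are assumptions made up for this illustration. It compares three cases: both inputs hash-partitioned on the same key, the reference input sent with Entire, and the reference input partitioned differently from the stream input.

```python
NODES = 4  # hypothetical number of processing nodes

def hash_partition(rows, key, nodes=NODES):
    """Send each row to the node chosen by its key value."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[row[key] % nodes].append(row)
    return parts

def round_robin_partition(rows, nodes=NODES):
    """Deal rows out to nodes in turn, ignoring the key."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def entire_partition(rows, nodes=NODES):
    """Give every node a full copy of the rows."""
    return [list(rows) for _ in range(nodes)]

def lookup(stream_parts, ref_parts, key):
    """Each node looks up its stream rows only in its own reference rows."""
    matched = unmatched = 0
    for stream_part, ref_part in zip(stream_parts, ref_parts):
        ref_keys = {r[key] for r in ref_part}
        for row in stream_part:
            if row[key] in ref_keys:
                matched += 1
            else:
                unmatched += 1
    return matched, unmatched

stream = [{"id": i} for i in range(100)]
# Same keys as the stream, but arriving in a different order.
reference = [{"id": i, "name": f"n{i}"} for i in reversed(range(100))]

same_key = lookup(hash_partition(stream, "id"), hash_partition(reference, "id"), "id")
entire = lookup(hash_partition(stream, "id"), entire_partition(reference), "id")
mismatch = lookup(hash_partition(stream, "id"), round_robin_partition(reference), "id")

print("same key-based partitioning:", same_key)
print("entire on reference:        ", entire)
print("mismatched partitioning:    ", mismatch)
```

In this toy model the first two cases find every match, while the mismatched case misses rows even though every key exists in the reference data, simply because the matching reference row sits on a different node.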

Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

On a single machine ("SMP" environment) you may as well use Entire, because it comes at no cost, via shared memory.

In a multiple machine environment ("MPP", cluster, grid) there can be a substantial cost moving records to all nodes, so you tend to avoid Entire (other than for small Data Sets) and use the same key-based partitioning as is used for the stream input.
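
As a rough back-of-the-envelope illustration of that cost, the Python sketch below counts how many reference rows have to be delivered to nodes under each method. It is a deliberately simplistic model, assuming every delivered row is a real copy (as when nodes do not share memory); the function name and the figures are hypothetical.

```python
def rows_delivered(reference_rows: int, nodes: int, method: str) -> int:
    """Count reference rows delivered across all nodes (simplified model)."""
    if method == "entire":
        # Every node receives a full copy of the reference data.
        return reference_rows * nodes
    if method == "hash":
        # Each row goes to exactly one node, chosen by its key.
        return reference_rows
    raise ValueError(f"unknown method: {method}")

# Hypothetical figures: a small and a large reference set on 8 nodes.
for rows in (1_000, 50_000_000):
    print(rows, rows_delivered(rows, 8, "entire"), rows_delivered(rows, 8, "hash"))
```

For the small reference set the extra copies are negligible, which fits the "other than for small Data Sets" exception; for the large one the copies multiply with the node count.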
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pattemk
Participant
Posts: 84
Joined: Wed May 16, 2007 4:04 pm

Post by pattemk »

[quote="Mike"]Specify entire partitioning whenever you are unable or unwilling to partition the reference input exactly the same as the stream input. One example of unable that I can think of: multiple reference links into a single lookup stage where the stream input can only be partitioned to match one of the reference inputs. An example of unwilling: a very small reference table where you don't want the overhead of repartitioning the stream input.

Mike[/quote]

Thanks for your prompt reply.
My reference data is very small, so I believe specifying Entire will not result in any loss of data or performance. I am therefore assuming that specifying Entire is the better practice, and most likely mandatory, when doing a normal lookup with small reference data.