Partitioning in Lookup Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
patonp
Premium Member
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Partitioning in Lookup Stage

Post by patonp »

Despite reading through a large number of posts as well as the DS documentation, I'm still not entirely clear on some aspects of partitioning for lookup stages.

I'd like to perform a lookup against a text file and have chosen to use a lookup stage. By default, the "Auto" option is selected as the partition type for both the primary and reference streams. My questions are as follows:

1. With "Auto" selected for both the primary and reference streams, can I rely on DataStage to manage partitioning such that records will successfully be retrieved based on the Key Expression defined in the "main" lookup screen? (I want to make sure that DS will reference the correct partition for the lookup record.)

2. If the partition type is set to "Hash" in the primary stream, will setting the partition type and keys the same way in the reference stream also ensure that the lookup is successful? How is this better or worse than using the "Auto" setting?

3. If the partition type is set to "Entire" for the reference stream, does the partition type of the primary stream matter?

4. Apart from the Partition Key, there are two other types of keys that appear in the Lookup stage: the "Key" column defined in the metadata for a column and the Key Expression in the lookup. Do either of these have any impact on partitioning?

I realize that this is a long post, but I would appreciate any help you can provide.

Peter
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

1. Most probably yes (as per documentation), but If I were you, I wont take risk to leave Datastage to decide upon the partition. I ll do it manually.

2. Yes if both the streams are partitioned on HASH on the lookup key, and if you are sure that both will be partitioned exactly the same, it will work. There may be some cases where, Char type in one stream and Varchar type in other stream, which will lead to mismatch.

3. Thats why, "Entire" is preferred. It can match any partition type from Input stream. Be aware that, "Entire" in the input stream will create dulplicates.

4. Partitioning on the Lookup key will make the right choice.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For a Lookup stage, "Auto" selects:
  • Round Robin for the stream input link

    Entire for the reference input link(s)
The latter can have performance implications in a cluster/grid configuration, as all rows need to be moved to all nodes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
patonp
Premium Member
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

Thanks to both of you for your quick (and helpful) responses.

Ray - I do have one follow-up question...

Your response suggests that "Auto" can mean different things in different stages. This certainly makes sense considering the various functions performed. In which DataStage document can I find out the behaviour of the "Auto" setting for each stage type? (I was looking through the documentation earlier today, and don't remember seeing this behaviour explicitly described for the lookup stage, though I was probably looking in the wrong place.)
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

There is not document given for each stages, but you can find you self by producing the scores. Set APT_DUMP_SCORE in your job and look for the scores, it will show you the details.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's taught in the IBM Advanced DataStage class. Therefore it's documented in the training manual. I'm not aware that it's documented in any of the PDF manuals that ship with the product.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
patonp
Premium Member
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

Thanks again!
Post Reply