Preserve Partitioning option when using multiple stages

Posted: Wed May 06, 2009 6:21 pm
by shankar_ramanath
I have a Sequential File stage as input that is connected with a Transformer stage.

From the Transformer stage, I need to connect to three output links.

The first one connects to a Lookup stage, the second one connects to a Funnel stage and the third one connects to a Filter stage.

The Funnel stage and Filter stage need input data to be partitioned by the same set of keys, so I have partitioned the input data on the Transformer stage using these keys. This allows me to use "Same" partitioning for both the Funnel and Filter stages.

However, the Lookup stage needs the input data to be partitioned on a different set of keys. Ideally, I would set the "Preserve Partitioning" flag to "Clear" in the Transformer stage so that the Lookup stage does not receive partitioned data. But doing so also clears the partitioning for the Funnel and Filter stages. This is not what I want, since I could then no longer use "Same" partitioning for these two stages.

This is probably a naive question. Is there a way to selectively clear the partition for specific output links?

Thanks in advance!

Posted: Wed May 06, 2009 6:44 pm
by ray.wurlod
It's a good question, naive or not. The answer is no. Therefore you are going to wear the warning, and might choose to incorporate a message handler to demote it.

Posted: Thu May 07, 2009 7:08 am
by santhooosh.c
ray.wurlod wrote:It's a good question, naive or not. The answer is no. Therefore you are going to wear the warning, and might choose to incorporate a message handler to demote it. ...
You could use a Copy stage between the Transformer and the other stages.
In the Copy stage between the Transformer and the Lookup, change the partition type; in the other Copy stage, set the enforce option to false.

Posted: Thu May 07, 2009 11:37 am
by shankar_ramanath
Thanks Ray for your kind reply!

Hi Santosh,

This is what I ended up doing :)

I was wondering if there is an option to avoid the Copy stage. Looks like there is none.

Posted: Thu May 07, 2009 1:49 pm
by throbinson
This implies you are using a partitioning scheme other than Entire in the Look-up stage. Is this really required?

Posted: Thu May 07, 2009 9:01 pm
by shankar_ramanath
Thanks throbinson!

From the Transformer, two links go to the Lookup stage: one is the primary link and the other is the reference link. Since the input data to the Transformer is partitioned on a set of keys, I am using "Same" partitioning for both. Since you mentioned "Entire", I looked into the "Parallel Job Developer Guide" and found the following:

"You need to ensure that the data being looked up in the lookup table is in the same partition as the input data referencing it. One way of doing this is to partition the lookup tables using the Entire method. Another way is to partition it in the same way as the input data (although this implies sorting of the data)."

So I guess I am exempt from using "Entire" for the look-up stage :wink:
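
To illustrate the point in the quoted guide, here is a minimal sketch in plain Python. It is not DataStage code: the function names, the column names (order_id, cust_id, name), and the modulus-style partitioner are all assumptions made up for the illustration. It models a lookup that runs independently in each partition, and shows that every row finds its match when the reference data is either partitioned on the same key as the primary data or copied to every partition ("Entire"), while rows can miss their match when the two links are partitioned on different keys.

```python
# Toy model of a per-partition lookup. All names are hypothetical;
# this only mimics the idea of partitioned processing.

def modulus_partition(rows, key, n):
    """Place each row in partition (row[key] % n)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts

def entire_partition(rows, n):
    """Give every partition a full copy of the rows ("Entire")."""
    return [list(rows) for _ in range(n)]

def partitioned_lookup(primary_parts, reference_parts, key):
    """Look up each primary row only against the reference rows
    that sit in the same partition number."""
    matched, unmatched = [], []
    for p_rows, r_rows in zip(primary_parts, reference_parts):
        table = {r[key]: r for r in r_rows}
        for row in p_rows:
            ref = table.get(row[key])
            if ref is None:
                unmatched.append(row)
            else:
                matched.append({**row, **ref})
    return matched, unmatched

N = 2  # number of partitions (assumed)
primary = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 11},
    {"order_id": 3, "cust_id": 10},
    {"order_id": 4, "cust_id": 13},
]
reference = [
    {"cust_id": 10, "name": "A"},
    {"cust_id": 11, "name": "B"},
    {"cust_id": 13, "name": "C"},
]

# Both links partitioned on the lookup key: every row finds its match.
same_key = partitioned_lookup(
    modulus_partition(primary, "cust_id", N),
    modulus_partition(reference, "cust_id", N),
    "cust_id",
)

# Reference link partitioned with "Entire": primary partitioning no longer matters.
entire = partitioned_lookup(
    modulus_partition(primary, "order_id", N),
    entire_partition(reference, N),
    "cust_id",
)

# Links partitioned on different keys: matching rows land in different partitions.
mismatched = partitioned_lookup(
    modulus_partition(primary, "order_id", N),
    modulus_partition(reference, "cust_id", N),
    "cust_id",
)
```

In this toy data set, the third case is the one to avoid: the primary rows are spread by order_id while the reference rows are spread by cust_id, so a row and its reference record need not share a partition.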