
Partitioning based on parameter values

Posted: Tue Sep 25, 2012 2:59 am
by elavenil
Hi,

We are using a template job to read input files and create output datasets. Since a template job is used, we cannot define key-based partitioning: the key columns differ for each input file, and RCP is enabled in the job. The data then has to be repartitioned in the next job for the downstream process, which uses the dataset created by the template job.

If hash partitioning could be driven by a parameter value, the column name(s) could be passed as a parameter, so that the same template job could create each dataset hash-partitioned on the appropriate key(s).

This would significantly reduce the development effort needed to handle such input files.

This idea is worth considering as a product enhancement.

Regards
Elavenil

Posted: Tue Sep 25, 2012 3:32 am
by ArndW
The basic functionality to do this is already present in the product via the Generic stage, where a parameter value may be used to specify the column on which to repartition.

Nonetheless it would be a lot easier if one could specify the column as a parameter value within the Designer GUI.

Posted: Tue Sep 25, 2012 8:06 am
by elavenil
Thanks for your response.

The input file is read using a schema file, and RCP is enabled because the same template job is used for all files. Hence the columns are not visible for selecting a key-based hash partition (in the Partitioning tab), and I do not see any placeholder where a parameter could be entered. Please help by providing more details on this.

Can you provide more details on how to do this with the Generic stage?

Regards
Elavenil

Posted: Tue Sep 25, 2012 8:16 am
by elavenil
Thanks for your response.

A template job is used to read the input sequential file, and the dataset is created in the same job. The job design is as below.

SeqFile --> Column Import --> Transformer --> Dataset. A schema file is used in the Column Import stage and RCP is enabled, so when the dataset is created, only 'Auto' partitioning can be used: the input columns are not visible in the Partitioning tab. My request is to create this dataset with hash partitioning on the key column. Since the key column is not visible, I would like to pass the key column name as a parameter at job execution. Please provide more details if this can be achieved without the Generic stage.
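
(For illustration, the schema file supplied to the Column Import stage is of this general shape; the column names and types below are placeholders, not our actual layout:)

    record
    (
      CUST_ID: int32;
      CUST_NAME: string[max=50];
      ORDER_AMT: decimal[10,2];
    )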

If the Generic stage needs to be used, could you provide a sample of the operator and options to specify in it?

Your response on this is greatly appreciated.

Regards
Elavenil

Posted: Tue Sep 25, 2012 8:36 am
by BI-RMA
The Generic stage can be configured with an operator and one or more options. The available operators and options are documented in the Parallel Job Advanced Developer's Guide.

Concerning the operator "hash":
Use it with the option "key #key_columns#" (do not include my quotation marks in the job design!).

The parameter #key_columns# may hold a list of more than one key column, in the form "column1 -key column2 -key column3 [...]". You have to concatenate the column names with -key delimiters to build your parameter. Note that there is no "-" at the beginning of the option string.

Posted: Tue Sep 25, 2012 9:20 pm
by elavenil
Thank you very much for your response.

Let me try this option and I will let you all know how it goes.

I am very sure this option will help us decrease our batch times by at least 2 to 3 hours.

Regards
Elavenil

Posted: Mon Oct 29, 2012 12:23 am
by elavenil
I am able to partition the data using the Generic stage. Both the partition type and the key columns can be passed as parameters.
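
In case it helps anyone else, the Generic stage configuration is roughly along these lines (the parameter names and example values are our own choices):

    Operator: #partition_type#     e.g. hash
    Option:   key #key_columns#    e.g. CUST_ID -key ORDER_DATE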

It worked fine, and this should reduce the batch time by about 2 hours.

Thanks everyone for the help.

Regards
Elavenil

Posted: Mon Oct 29, 2012 7:10 am
by chulett
FYI - Given the nature of the conversation in this thread, decided to move it from the Enhancement Request forum to here. Enjoy! :wink: