Hi,
We are using a template job to read an input file and create an output dataset. Since a template job is used, we cannot define partitioning based on a key: the key columns differ for each input file, and RCP is enabled in the job. The data therefore has to be repartitioned in the next job for the downstream process that uses the dataset created by the template job.
If hash partitioning could be driven by a parameter value, the key column name(s) could be passed as a parameter so that the same template job could create the dataset hash-partitioned on the appropriate key(s).
This would significantly reduce the development effort needed to handle such input files.
It is worth considering this idea as an enhancement to the product.
Regards
Elavenil
Parameter-value-based partitioning
The basic functionality to do this is already present in the product via the Generic stage, where a parameter value can be used to specify the column on which to re-partition.
Nonetheless it would be a lot easier if one could specify the column as a parameter value within the Designer GUI.
Thanks for your response.
The input file is read based on a schema file, and RCP is enabled because the same template job is used. Hence the columns are not visible for selecting a key-based hash partition (in the Partitioning tab), and I do not see any placeholder where a parameter could be entered. Please provide more details on this.
Can you provide more details on how to use this option with the Generic stage?
Regards
Elavenil
Thanks for your response.
A template job is used to read the input sequential file, and the dataset is created in the same job. The job design is as below.
SeqFile --> Column Import --> Transformer --> Dataset. A schema file is used in the Column Import stage and RCP is enabled, so when creating the dataset only 'Auto' partitioning can be used, as the input columns are not shown in the Partitioning tab. My request is to create this dataset with hash partitioning on the key column. Since the key column is not visible, I would like to use a parameter to pass the key column name at job execution time. Please provide more details on whether this can be achieved without the Generic stage.
If the Generic stage needs to be used, could you provide a sample script to call from the Generic stage?
Your response on this is greatly appreciated.
Regards
Elavenil
The Generic stage can be configured with an operator and one or more options. The available operators and options are documented in the Parallel Job Advanced Developer's Guide.
Concerning operator "hash":
Use it with the option "key #key_columns#" (do not use my quotation marks in the job design!).
The parameter #key_columns# may consist of a list of more than one key column, in the form
"column1 -key column2 -key column3 [...]". You have to concatenate the column names with " -key " delimiters to build your parameter. Note that there is no "-" at the beginning of the option string.
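To illustrate how the parameter value is assembled, here is a minimal sketch (plain Python, run outside DataStage, e.g. in a wrapper script that invokes the job) that joins a list of key columns into the string you would pass as #key_columns#. The column names are hypothetical examples, not from your job:

```python
def build_key_columns(columns):
    """Join column names with ' -key ' so that the Generic stage option
    'key #key_columns#' expands to e.g. 'key column1 -key column2'.
    Note there is no leading '-' in the resulting string."""
    return " -key ".join(columns)

# Hypothetical key columns for one particular input file:
print(build_key_columns(["CUST_ID", "ACCT_NO", "TXN_DATE"]))
# prints: CUST_ID -key ACCT_NO -key TXN_DATE
```

Whatever produces the value (shell script, sequence job, parameter set), the point is the same: the first column name appears bare and each subsequent one is prefixed by " -key ".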
"It is not the happy who are grateful.
It is the grateful who are happy." Francis Bacon