Parameter values based Partition

elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Hi,

We use a template job to read an input file and create an output dataset. Because it is a template job with RCP enabled, we cannot define key-based partitioning in it: the key columns are different for each input file. As a result, the data has to be repartitioned in the next job, according to the needs of the downstream process that uses the dataset created by the template job.

If the hash partitioning key could be taken from a parameter value, the column name could be passed in as a parameter, and the same template job could create the dataset already hash-partitioned on the appropriate key(s).
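To illustrate the idea outside DataStage, here is a minimal Python sketch of hash partitioning where the key column names arrive at run time rather than being fixed in the design. The function name `hash_partition`, the sample rows, and the use of CRC32 are assumptions made for illustration only; the parallel engine uses its own hash function.

```python
import zlib


def hash_partition(rows, key_columns, num_partitions):
    """Distribute rows (dicts) across partitions by hashing the key column values.

    key_columns is a list of column names supplied at run time, which is the
    role a job parameter would play in the template job.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Build a stable byte string from the key values. CRC32 is only a
        # stand-in here, not the hash the DataStage engine actually uses.
        key_bytes = "\x1f".join(str(row[col]) for col in key_columns).encode("utf-8")
        partitions[zlib.crc32(key_bytes) % num_partitions].append(row)
    return partitions


# Hypothetical sample data; the column names are not from the original thread.
rows = [
    {"cust_id": 1, "region": "SG", "amount": 10},
    {"cust_id": 2, "region": "DE", "amount": 25},
    {"cust_id": 1, "region": "SG", "amount": 40},
    {"cust_id": 3, "region": "US", "amount": 5},
]

key_columns = ["cust_id"]  # would come from a job parameter
parts = hash_partition(rows, key_columns, num_partitions=4)

for index, part in enumerate(parts):
    print(index, part)
```

Rows with the same key value always land in the same partition, which is what lets the downstream job skip its own repartitioning step.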

This would significantly reduce the development effort needed to handle such input files.

The idea is worth considering as an enhancement to the product.

Regards
Elavenil
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The basic functionality for this is already present in the product via the Generic stage: there, a parameter value may be used to specify the column on which to repartition.

Nonetheless it would be a lot easier if one could specify the column as a parameter value within the Designer GUI.
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thanks for your response.

The input file is read using a schema file, and RCP is enabled because the same template job is used for every file. As a result, the columns are not visible when selecting key-based hash partitioning (in the Partitioning tab), and I do not see any placeholder for choosing a parameter.

Could you provide more details on how to do this with the Generic stage?

Regards
Elavenil
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thanks for your response.

The template job reads the input sequential file and creates the dataset in the same job. The job design is as follows:

SeqFile --> Column Import --> Transformer --> Dataset

A schema file is used in the Column Import stage and RCP is enabled. When creating the dataset, only 'Auto' partitioning can be used, because the input columns are not shown in the Partitioning tab. My request is therefore to create this dataset with hash partitioning on the key column. Since the key column is not visible, I would like to use a parameter to pass the key column name at run time. Please provide more details if this can be achieved without the Generic stage.

If the Generic stage has to be used, could you provide a sample script to be called from it?

Your response on this is greatly appreciated.

Regards
Elavenil
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

The Generic stage can be configured with an operator and one or more options. The available operators and options are documented in the Parallel Job Advanced Developer's Guide.

Concerning operator "hash":
Use it with the option "key #key_columns#" (do not use my quotation marks in the job design!).

The parameter #key_columns# may hold a list of several key columns in the form
"column1 -key column2 -key column3 [...]". You have to concatenate the column names with the -key delimiter to build your parameter value. Note that there is no "-" at the beginning of the option string.
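As a minimal sketch of the concatenation described above, the following Python helper builds the parameter value from a list of column names. The function name `build_key_columns` is hypothetical; only the "-key" delimiter and the leading "key" without a dash come from the post.

```python
def build_key_columns(columns):
    """Join column names with ' -key ' to form the #key_columns# parameter value."""
    names = [name.strip() for name in columns if name.strip()]
    if not names:
        raise ValueError("at least one key column is required")
    return " -key ".join(names)


key_columns = build_key_columns(["column1", "column2", "column3"])
print(key_columns)  # column1 -key column2 -key column3

# The Generic stage option then reads "key #key_columns#", which expands to:
print("key " + key_columns)  # key column1 -key column2 -key column3
```

A single column needs no delimiter at all, so `build_key_columns(["column1"])` simply returns `column1`.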
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thank you very much for your response.

Let me try this option and I will let you all know how it goes.

I am very sure this option will help us reduce our batch run time by at least 2 to 3 hours.

Regards
Elavenil
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

I was able to partition the data using the Generic stage. Both the partition type and the key columns can be parameters.

It worked fine, and it will help cut about 2 hours from the batch time.
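For anyone wiring this into a batch scheduler, here is a rough Python sketch that assembles a `dsjob -run` command line passing both values as job parameters. The project name, job name, and the parameter names `partition_operator` and `key_columns` are assumptions for illustration; substitute whatever names your template job actually defines.

```python
import shlex


def dsjob_run_command(project, job, params):
    """Build the argument list for 'dsjob -run' with one -param per job parameter."""
    cmd = ["dsjob", "-run"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd


# Hypothetical project, job, and parameter names.
cmd = dsjob_run_command(
    "MyProject",
    "TemplateJob",
    {"partition_operator": "hash", "key_columns": "column1 -key column2"},
)

# shlex.join quotes the value containing spaces so it stays a single argument.
print(shlex.join(cmd))
```

The sketch only builds and prints the command; running it is left to the scheduler or to `subprocess.run(cmd)` on a machine where `dsjob` is installed.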

Thanks everyone for the help.

Regards
Elavenil
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

FYI - Given the nature of the conversation in this thread, decided to move it from the Enhancement Request forum to here. Enjoy! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers