dynamic configuration settings

kavuri
Premium Member
Posts: 161
Joined: Mon Apr 16, 2007 2:56 pm

dynamic configuration settings

Post by kavuri »

Hi,
I would like to prepare multiple configuration files. This I can do. But my question is how to set the configuration dynamically, i.e. while running a job.

Thanks
Kavuri
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: dynamic configuration settings

Post by kris »

I think you are asking if you can come up with a new configuration file at run time.

No. You have to have your configuration file in place before you would use for any job.

You can generate one in Manager using the GUI, or else write one yourself and copy it into place.
~Kris
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

In a GRID environment, your configuration file is generated dynamically during run time.
kavuri
Premium Member
Posts: 161
Joined: Mon Apr 16, 2007 2:56 pm

Post by kavuri »

Hi,
Kris, suppose I create a configuration file, say "config1.apt", in Manager. How can I pass it as a job parameter?

lstsaur, can you tell me in some more detail what a grid environment is? Is it readily available in DataStage, or do we need to program it separately? If you can provide some links on this, it would be much appreciated.

Thanks
Kavuri
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You can have multiple configuration files. Set one particular config file as the default to be used by your jobs by setting the environment variable "APT_CONFIG_FILE" in the Administrator. You can override the default value by explicitly defining this environment variable as a job parameter and providing a value at run time.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A parallel job usually has one configuration file. You parameterize this by including $APT_CONFIG_FILE as a job parameter. However, there is no easy mechanism for changing configuration files once the job is running.
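A minimal sketch of selecting the configuration file per run from the command line, assuming the job already has $APT_CONFIG_FILE added as a job parameter and that jobs are started with `dsjob`. The helper only builds the argument list (it does not run anything); the project name, job name and file path are hypothetical.

```python
import shlex


def build_dsjob_command(project, job, config_file, extra_params=None):
    """Build (but do not run) a dsjob command line that overrides the
    $APT_CONFIG_FILE job parameter for a single run.

    Assumes the job already exposes $APT_CONFIG_FILE as a job parameter.
    """
    argv = ["dsjob", "-run", "-jobstatus"]  # -jobstatus: wait and report status
    # Environment-variable job parameters keep their leading "$".
    argv += ["-param", f"$APT_CONFIG_FILE={config_file}"]
    for name, value in (extra_params or {}).items():
        argv += ["-param", f"{name}={value}"]
    argv += [project, job]  # project and job name come last
    return argv


if __name__ == "__main__":
    # Hypothetical project, job and configuration-file path.
    cmd = build_dsjob_command("MyProject", "LoadCustomers", "/opt/configs/config1.apt")
    # shlex.join quotes the "$" argument so a shell will not expand it.
    print(shlex.join(cmd))
```

Run as a script, this prints `dsjob -run -jobstatus -param '$APT_CONFIG_FILE=/opt/configs/config1.apt' MyProject LoadCustomers`. Because the file is chosen before the job starts, this fits the point above that the configuration cannot easily be changed once the job is running.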

IBM reserve unto themselves the ability to install dynamic configuration files such as those needed to run parallel jobs in a grid environment.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Re: dynamic configuration settings

Post by JoshGeorge »

You "can come up with a new configuration file at run time". A shell script which creates configuration file can be called from your main sequence and this can be passed to all the 'Parallel jobs' being called in that sequence. This way you will have a 'configuration file in place before you would use for any job'.
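A minimal sketch of the kind of generator script described here, written in Python rather than shell and assuming a single SMP host. It emits the usual node / fastname / pools / resource layout of a parallel configuration file; the host name, directories and output path are hypothetical, and a real file would normally give each node its own disk and scratch areas.

```python
from pathlib import Path


def generate_apt_config(fastname, num_nodes, dataset_dir, scratch_dir):
    """Return the text of a parallel configuration file with `num_nodes`
    logical nodes, all on the host `fastname` (an SMP layout)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    parts = ["{\n"]
    for i in range(1, num_nodes + 1):
        parts.append(
            f'\tnode "node{i}"\n'
            "\t{\n"
            f'\t\tfastname "{fastname}"\n'
            '\t\tpools ""\n'
            # Doubled braces produce literal { } inside the f-string.
            f'\t\tresource disk "{dataset_dir}" {{pools ""}}\n'
            f'\t\tresource scratchdisk "{scratch_dir}" {{pools ""}}\n'
            "\t}\n"
        )
    parts.append("}\n")
    return "".join(parts)


def write_apt_config(path, fastname, num_nodes, dataset_dir, scratch_dir):
    """Write the generated file and return its path, ready to be passed to
    each parallel job in the sequence as $APT_CONFIG_FILE."""
    Path(path).write_text(generate_apt_config(fastname, num_nodes, dataset_dir, scratch_dir))
    return str(path)


if __name__ == "__main__":
    print(generate_apt_config("etlhost", 2, "/data/datasets", "/data/scratch"))
```

The sequence would call this once at the start and hand the returned path to every parallel job it runs, so the file is in place before any job uses it.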

kris wrote:I think you are asking if you can come up with a new configuration file at run time.

No. You have to have your configuration file in place before you would use for any job.
Joshy George
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Kavuri,
As I said in my earlier note, in a grid-enabled environment the configuration file is generated dynamically, and you have only one default.apt (2 nodes) configuration file in the whole environment. When the job finishes, the dynamically generated configuration file is gone too.

For example, I have 70 compute nodes in my grid environment, but my job asks for 8 nodes. All I have to do is populate the APT_GRID_COMPUTENODE parameter with a value of 8. My grid_enabled job talks to the Resource Manager (I am using PBSPro), which finds the available resources and generates a configuration file with 8 compute nodes "dynamically" for me. When the job finishes, the configuration file is gone. The beauty of grid computing is that you don't know which 8 nodes the job ran on, but the nodes are always delivered. There is no need to manually prepare multiple configuration files.

To me, that's "dynamically" generating a configuration file. Yes, Ascential Grid Computing is available.
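To illustrate the lifecycle described here (ask for N nodes, get a configuration file built from whichever hosts are free, file removed when the job ends), here is a toy simulation in Python. It is not how PBSPro or IBM's grid toolkit actually allocates nodes: the free-host list, the random selection and the directory names are all assumptions made for the sketch.

```python
import os
import random
import tempfile
from contextlib import contextmanager


@contextmanager
def dynamic_config(free_hosts, requested,
                   dataset_dir="/grid/datasets", scratch_dir="/grid/scratch"):
    """Toy stand-in for a grid resource manager: pick `requested` hosts from
    the free pool, write a temporary configuration file naming them, and
    delete the file when the job (the with-block) ends."""
    if requested < 1:
        raise ValueError("must request at least one node")
    if requested > len(free_hosts):
        raise RuntimeError(f"asked for {requested} nodes, only {len(free_hosts)} free")
    # The caller never chooses which hosts it gets.
    chosen = random.sample(free_hosts, requested)
    lines = ["{"]
    for i, host in enumerate(chosen, start=1):
        lines += [
            f'\tnode "node{i}"',
            "\t{",
            f'\t\tfastname "{host}"',
            '\t\tpools ""',
            f'\t\tresource disk "{dataset_dir}" {{pools ""}}',
            f'\t\tresource scratchdisk "{scratch_dir}" {{pools ""}}',
            "\t}",
        ]
    lines.append("}")
    fd, path = tempfile.mkstemp(suffix=".apt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(lines) + "\n")
        yield path, chosen
    finally:
        os.remove(path)  # the configuration file is gone after the run


if __name__ == "__main__":
    pool = [f"compute{n:02d}" for n in range(1, 71)]  # 70 hypothetical hosts
    with dynamic_config(pool, 8) as (cfg, hosts):
        # A real job would be started here with APT_CONFIG_FILE=cfg.
        print(cfg, sorted(hosts))
```

In this toy model, asking for 16 nodes instead of 8 is just `dynamic_config(pool, 16)`; nothing else has to be prepared in advance.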
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: dynamic configuration settings

Post by kris »

JoshGeorge wrote:You "can come up with a new configuration file at run time". A shell script which creates configuration file can be called from your main sequence and this can be passed to all the 'Parallel jobs' being called in that sequence. This way you will have a 'configuration file in place before you would use for any job'.

I thought we were not talking about building one at run time, which would have no bearing on where and how you are using it.

Joshy,

What intelligence would (or possibly could) this script have to build the configuration file according to the kind of processing one is attempting?

If you are calling this script to generate one file and then using it for all parallel jobs, what difference does it make?

If there is no way of building one based on the task and its resource needs, then what is the difference between keeping your configuration files in place and writing them dynamically at run time, when the generated files don't vary with your needs?

Unlike on a grid, if there is a way, it has to be intelligent enough to assess your processing needs wherever it is used and then come up with an appropriate configuration file.

I don't think there is an easy way of achieving this.
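To make the point concrete: the "intelligence" would have to be some sizing rule written by someone who knows the workload. The sketch below is purely hypothetical. It assumes a set of pre-built configuration files keyed by node count and an arbitrary rule of one node per 5,000,000 input rows, then picks the smallest file with enough nodes.

```python
def pick_config(available, estimated_rows, rows_per_node=5_000_000):
    """Return the path of the smallest configuration file that has at least
    the estimated number of nodes, or the largest one if none is big enough.

    available: dict mapping node count -> configuration file path.
    rows_per_node: arbitrary, hypothetical sizing constant.
    """
    if not available:
        raise ValueError("no configuration files to choose from")
    needed = max(1, -(-estimated_rows // rows_per_node))  # ceiling division
    big_enough = [n for n in available if n >= needed]
    chosen = min(big_enough) if big_enough else max(available)
    return available[chosen]


if __name__ == "__main__":
    # Hypothetical pre-built files, e.g. created with the Manager GUI.
    configs = {1: "/opt/configs/1node.apt", 2: "/opt/configs/2node.apt",
               4: "/opt/configs/4node.apt", 8: "/opt/configs/8node.apt"}
    print(pick_config(configs, 12_000_000))  # needs 3 nodes, so the 4-node file
```

Even this rule looks only at row counts; it knows nothing about the stages in the job or what else is running on the machine, which is why a general-purpose version is hard to write.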
~Kris
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: dynamic configuration settings

Post by kris »

No offence Joshy, I am here to learn.

Until lstsaur brought the grid topic into the discussion, I was under the impression that the discussion was only about normal SMP or MPP environments.

I understand a bit of what "lstsaur" was saying about the GRID environment now.

lstsaur,

Could you please give some more details on
For example, I have 70 compute nodes in my grid environment, but my job is asking for 8 nodes. All I have to do is to populate the APT_GRID_COMPUTENODE parameter with a value of 8
How can one determine that a job needs 8 nodes, rather than fewer or more? Is there a criterion for choosing the number of nodes?

Appreciate your time.
~Kris
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Kris,
In a grid environment, the value (e.g. 8) in your grid_enabled job's APT_GRID_COMPUTENODE parameter tells the Resource Manager (I am using PBSPro) that this job needs 8 compute nodes. The Resource Manager then finds the available resources and dynamically generates a configuration file for the job from among those 70 compute nodes. When the job is finished, the configuration file is gone.

So, later if you want to run the same job with 16 nodes, all you have to do is change the value of APT_GRID_COMPUTENODE to 16.

Hope this clarifies things for you.
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Post by kris »

Thanks a lot for clarifying, lstsaur.

I have heard about grid but never worked on it. It seems this functionality makes the whole configuration business very easy and manageable on a grid.

I am not sure whether similar functionality is achievable on SMP or MPP systems.
~Kris
adesanyaa
Participant
Posts: 4
Joined: Fri Aug 31, 2007 10:43 am

Post by adesanyaa »

lstsaur wrote: Kavuri,
As I said in earlier note, in a grid-enabled environment, the configurations file is generated dynamically ...

Do you have to install any special software to support DataStage in a grid environment?

Thanks
Tola
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Can anyone please tell me how to set up a grid environment in DataStage?

Please help me here.
RD
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You spend money with IBM Services. They will implement the dynamic configuration file and other pieces that you need for grid execution.

Thus far they have not released that particular toolkit to contractors nor, as far as I am aware, to resellers.