DataStage GRID

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

santoshkumar
Charter Member
Posts: 35
Joined: Sun Jan 16, 2005 8:39 am
Location: US

DataStage GRID

Post by santoshkumar »

Hi,

I am working on DataStage Enterprise Edition 7.5.1A on a grid, and I have a design issue with respect to the grid.

I have two jobs. The first job extracts 2 million records and loads them into a dataset.

The second job reads from the dataset, performs change capture, and loads to an Oracle stage.

I have a sequence for these two jobs, with Sequencer.sh executing at the start of the sequence and passing the configuration file name to Job1 and Job2. As part of my design I have a Nested Condition stage that decides, based on an input parameter, whether to run Job1 or Job2.

My question is: the first time I ran the sequence, it completed successfully. But when I tried to rerun it starting from Job2, the job aborted, because the dataset had been created by Job1 using a different configuration file.

How can I overcome this? My objective is to be able to rerun just the second job when the first job completed successfully but the second job aborted.
Santosh
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

Why don't you use the same configuration file that created the dataset for the second job?

The issue might not be exactly because you are using a different config file. It might be because, for the second job, you are trying to use a config file that does not include the nodes that were used to create the dataset. In that case the dataset cannot be read, since some of its data files were created on nodes that are not included in the second config file.
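To illustrate: a parallel dataset's descriptor file records data files on each node of the configuration it was written with. A minimal sketch of a two-node configuration (the node names and paths here are made up):

```
{
  node "grid01" {
    fastname "grid01"
    pools ""
    resource disk "/data/grid01/ds" { pools "" }
    resource scratchdisk "/scratch/grid01" { pools "" }
  }
  node "grid02" {
    fastname "grid02"
    pools ""
    resource disk "/data/grid02/ds" { pools "" }
    resource scratchdisk "/scratch/grid02" { pools "" }
  }
}
```

A dataset written under this configuration leaves segment files in the resource disk on both grid01 and grid02. A later run whose config file omits grid02 has no way to reach those segments, which is consistent with the abort you are seeing.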

Aneesh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In addition, recommended best practice is to use a dynamic configuration file with grid implementations, so that the grid management software can adapt if you lose a machine or two. There is, or is to be, an IBM white paper or RedBook on the subject. Ask your support vendor to inquire.
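As a rough sketch of what "dynamic" means here: a wrapper script can write the APT configuration file at run time from whatever nodes the grid scheduler has actually granted, instead of hard-coding them. All names and paths below are assumptions; in a real grid the node list would come from the resource manager, not a literal variable.

```shell
#!/bin/sh
# Hypothetical sketch: generate an APT configuration file at run time
# from the list of nodes currently available to this run.
NODES="grid01 grid02"              # assumption: supplied by the grid scheduler
CONFIG=/tmp/dynamic_config.apt     # assumption: path passed to the job as APT_CONFIG_FILE

{
  echo '{'
  for NODE in $NODES; do
    echo "  node \"$NODE\" {"
    echo "    fastname \"$NODE\""
    echo "    pools \"\""
    echo "    resource disk \"/data/$NODE/ds\" { pools \"\" }"
    echo "    resource scratchdisk \"/scratch/$NODE\" { pools \"\" }"
    echo "  }"
  done
  echo '}'
} > "$CONFIG"

echo "Generated $CONFIG"
```

The generated file is then passed to the job as the APT_CONFIG_FILE parameter, so each run adapts to the machines it was actually given.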
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There are also some recent posts on developerWorks regarding dynamic configurations in a GRID environment, from what I recall.

Just checked and they are here with a nice explanation from Patrick (Danny?) Owen at IBM, along with a mention of "the internal software we have that is only available through a grid engagement with IBM Services."
-craig

"You can never have too many knives" -- Logan Nine Fingers
santoshkumar
Charter Member
Posts: 35
Joined: Sun Jan 16, 2005 8:39 am
Location: US

Post by santoshkumar »

thebird wrote: Why don't you use the same configuration file that created the dataset for the second job?

The issue might not be exactly because you are using a different config file. It might be because, for the second job, you are trying to use a config file that does not include the nodes that were used to create the dataset. In that case the dataset cannot be read, since some of its data files were created on nodes that are not included in the second config file.



Aneesh
Yeah, even when I run the second job on its own, my Sequencer.sh script executes first and outputs a config file, which is then picked up by Job2. But since Job1 ran on different nodes, Job2 cannot find the dataset.

How can I overcome that?
Santosh
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

santoshkumar wrote: How can I overcome that?
By passing the same configuration file to the parameter in the second job.

Since the dataset was created with a different configuration, to read it you need a config file that includes the nodes that were used to create it.
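If you are not sure which nodes a dataset was written with, the orchadmin utility can report that from the descriptor file. The path below is made up, and the exact options vary by version, so check the utility's help in your installation; orchadmin itself also needs a valid APT_CONFIG_FILE set when it runs:

```
# Inspect the dataset descriptor to see which nodes hold its data segments
orchadmin describe /data/work/changes.ds
```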

Aneesh
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

thebird wrote: By passing the same configuration file to the parameter in the second job.

If you don't want to run the second job with the same config file, then use another configuration, but one which includes all the nodes that were used in the creation of the dataset.
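Either way, the configuration file is just a job parameter, so it can be supplied explicitly when the job is started. A hypothetical invocation from the command line (the project name, job name, and config path are assumptions):

```
# Config must contain every node Job1 wrote to (a superset is fine)
CONFIG=/opt/configs/superset_4node.apt

dsjob -run -jobstatus -param APT_CONFIG_FILE=$CONFIG MyProject Job2
```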

Aneesh