DataStage GRID

Posted: Sun Dec 03, 2006 9:42 pm
by santoshkumar
Hi,

I am working on DataStage Enterprise Edition 7.5.1A on a grid, and I have a design issue with respect to the grid.

I have 2 jobs: the first job extracts 2 million records and loads them into a dataset.

The second job reads from the dataset and loads into an Oracle stage after doing change capture.

I have a sequence for these 2 jobs, with Sequencer.sh executing at the start of the sequence and passing the configuration file name on to job1 and job2. As part of my design I have a nested condition which, based on an input parameter, decides whether to run job1 or job2.

My question: the first time I ran the sequence it completed successfully, but when I tried to run it the next time starting from job2, the job aborted, because the dataset from the first job had been created using a different configuration file.

How can I overcome this? My objective is to be able to run the second job on its own if my first job completed successfully but my second job aborted...

Posted: Sun Dec 03, 2006 10:07 pm
by thebird
Why don't you use the same configuration file that created the dataset for the second job?

The issue might not exactly be because you are using a different config file. It might be because, for the second job, you are trying to use a config file that does not include the nodes that were used to create the dataset. In that case I would believe that the dataset could not be read, as some of its data files would have been created on nodes that are not included in the second config file.

Aneesh
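
To make that point concrete, here is a minimal sketch of the situation (the node names, hosts, and paths are made up, not taken from the thread). Suppose job1 wrote the dataset under a two-node configuration file like this:

```
{
  node "node1" {
    fastname "grid-host-1"
    pools ""
    resource disk "/data/ds" {pools ""}
    resource scratchdisk "/scratch/ds" {pools ""}
  }
  node "node2" {
    fastname "grid-host-2"
    pools ""
    resource disk "/data/ds" {pools ""}
    resource scratchdisk "/scratch/ds" {pools ""}
  }
}
```

If job2 then runs with a configuration that declares only node1, the data files job1 wrote on node2's disk are unreachable, so the read aborts. Any configuration used to read the dataset must include both node1 and node2.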

Posted: Sun Dec 03, 2006 11:09 pm
by ray.wurlod
In addition, recommended best practice is to use a dynamic configuration file with grid implementations, so that the grid management software can adapt if you lose a machine or two. There is, or is to be, an IBM white paper or Redbook on the subject. Ask your support vendor to inquire.

Posted: Sun Dec 03, 2006 11:35 pm
by chulett
There are also some recent posts on developerWorks regarding dynamic configurations in a GRID environment, from what I recall.

Just checked and they are here with a nice explanation from Patrick (Danny?) Owen at IBM, along with a mention of "the internal software we have that is only available through a grid engagement with IBM Services."

Posted: Mon Dec 04, 2006 7:17 am
by santoshkumar
thebird wrote: Why don't you use the same configuration file that created the dataset for the second job?

The issue might not exactly be because you are using a different config file. It might be because, for the second job, you are trying to use a config file that does not include the nodes that were used to create the dataset. In that case I would believe that the dataset could not be read, as some of its data files would have been created on nodes that are not included in the second config file.

Aneesh
Yeah, when I run the second job alone, my Sequencer.sh script still executes first and outputs a config file which gets picked up by job2. But since job1 was run on a different node, job2 won't be able to find the dataset.

How can I overcome that?

Posted: Mon Dec 04, 2006 8:58 am
by thebird
santoshkumar wrote: How can I overcome that?
By passing the same configuration file to the parameter in the second job.

Since the dataset was created with a different configuration, if you have to read it, you need a config file that includes the nodes which were used to create the dataset.

Aneesh
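
One way to wire that up is a small sketch like the following (the state-file location and the function names are my own illustration, not part of Sequencer.sh): have the script record the config file name when job1 runs, and read it back instead of generating a new one when restarting at job2.

```shell
# Sketch only: persist the config file that wrote the dataset so a rerun
# starting at job2 reuses it. STATE_FILE path is an assumption; adjust it.
STATE_FILE="${STATE_FILE:-/tmp/last_apt_config}"

record_config() {
    # Call when job1 runs: remember which config file created the dataset.
    echo "$1" > "$STATE_FILE"
}

lookup_config() {
    # Call when restarting at job2: reuse that same config file.
    cat "$STATE_FILE"
}
```

Sequencer.sh would call `record_config` after choosing a configuration for job1, and the job2 branch of the nested condition would pass `$(lookup_config)` as the config-file parameter instead of a freshly generated one.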

Posted: Mon Dec 04, 2006 9:01 am
by thebird
thebird wrote: By passing the same configuration file to the parameter in the second job.

If you don't want to run the second job with the same config file, then use another configuration - but one which includes all the nodes that were used in the creation of the dataset.

Aneesh
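
If you go the different-but-compatible-config route, a shell check along these lines (a sketch: it assumes the standard `node "name" { ... }` declaration syntax of APT configuration files, and the paths are hypothetical) can verify before the run that the reading config covers every node of the creating config:

```shell
# Sketch: check that every node declared in the config file that wrote a
# dataset is also declared in the config file you want to read it with.

nodes_in() {
    # Print the quoted node names declared in an APT configuration file.
    sed -n 's/.*node[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

covers() {
    # covers READ_CFG CREATE_CFG -> succeeds only if READ_CFG declares
    # every node that CREATE_CFG declares.
    for n in $(nodes_in "$2"); do
        nodes_in "$1" | grep -qx "$n" || return 1
    done
    return 0
}
```

Sequencer.sh could call `covers "$NEW_CONFIG" "$CREATE_CONFIG" || exit 1` before launching job2, so a mismatched configuration fails fast with a clear message rather than aborting mid-read.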