DataStage GRID

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

santoshkumar
Charter Member
Posts: 35
Joined: Sun Jan 16, 2005 8:39 am
Location: US

DataStage GRID

Post by santoshkumar »

Hi,

I am working on DataStage Enterprise Edition 7.5.1A on a grid, and I have a design issue with respect to the grid.

I have two jobs. The first job extracts 2 million records and loads them into a dataset.

The second job reads from the dataset, performs change capture, and loads to an Oracle stage.

I have a sequence for these two jobs, with Sequencer.sh executing at the start of the sequence and passing the configuration file name to Job1 and Job2. As part of my design I have a Nested Condition stage that decides, based on an input parameter, whether to run Job1 or Job2.

My question is: the first time I ran the sequence, it completed successfully. But when I tried to rerun it starting from Job2, the job aborted, because the dataset had been created by Job1 using a different configuration file.

How can I overcome this? My objective is to be able to rerun just the second job when the first job completed successfully but the second job aborted.
Santosh
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

Why don't you use the same configuration file that created the dataset for the second job?

The issue might not be exactly because you are using a different config file. It might be because, for the second job, you are trying to use a config file that does not include the nodes that were used to create the dataset. In that case the dataset cannot be read, since some of its data files were created on nodes that are not included in the second config file.
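To illustrate: a parallel dataset's descriptor file records data files on each node of the configuration it was written with. A minimal sketch of a two-node configuration (the node names and paths here are made up):

```
{
  node "grid01" {
    fastname "grid01"
    pools ""
    resource disk "/data/grid01/ds" { pools "" }
    resource scratchdisk "/scratch/grid01" { pools "" }
  }
  node "grid02" {
    fastname "grid02"
    pools ""
    resource disk "/data/grid02/ds" { pools "" }
    resource scratchdisk "/scratch/grid02" { pools "" }
  }
}
```

A dataset written under this configuration leaves segment files in the resource disk on both grid01 and grid02. A later run whose config file omits grid02 has no way to reach those segments, which is consistent with the abort you are seeing.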

Aneesh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In addition, recommended best practice is to use a dynamic configuration file with grid implementations, so that the grid management software can adapt if you lose a machine or two. There is, or is to be, an IBM white paper or RedBook on the subject. Ask your support vendor to inquire.
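As a rough sketch of what "dynamic" means here: a wrapper script can write the APT configuration file at run time from whatever nodes the grid scheduler has actually granted, instead of hard-coding them. All names and paths below are assumptions; in a real grid the node list would come from the resource manager, not a literal variable.

```shell
#!/bin/sh
# Hypothetical sketch: generate an APT configuration file at run time
# from the list of nodes currently available to this run.
NODES="grid01 grid02"              # assumption: supplied by the grid scheduler
CONFIG=/tmp/dynamic_config.apt     # assumption: path passed to the job as APT_CONFIG_FILE

{
  echo '{'
  for NODE in $NODES; do
    echo "  node \"$NODE\" {"
    echo "    fastname \"$NODE\""
    echo "    pools \"\""
    echo "    resource disk \"/data/$NODE/ds\" { pools \"\" }"
    echo "    resource scratchdisk \"/scratch/$NODE\" { pools \"\" }"
    echo "  }"
  done
  echo '}'
} > "$CONFIG"

echo "Generated $CONFIG"
```

The generated file is then passed to the job as the APT_CONFIG_FILE parameter, so each run adapts to the machines it was actually given.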
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There are also some recent posts on developerWorks regarding dynamic configurations in a GRID environment, from what I recall.

Just checked and they are here with a nice explanation from Patrick (Danny?) Owen at IBM, along with a mention of "the internal software we have that is only available through a grid engagement with IBM Services."
-craig

"You can never have too many knives" -- Logan Nine Fingers
santoshkumar
Charter Member
Posts: 35
Joined: Sun Jan 16, 2005 8:39 am
Location: US

Post by santoshkumar »

thebird wrote: Why don't you use the same configuration file that created the dataset for the second job?

The issue might not be exactly because you are using a different config file. It might be because, for the second job, you are trying to use a config file that does not include the nodes that were used to create the dataset. In that case the dataset cannot be read, since some of its data files were created on nodes that are not included in the second config file.



Aneesh
Yeah, even when I run the second job on its own, my Sequencer.sh script executes first and outputs a config file, which is then picked up by Job2. But since Job1 ran on different nodes, Job2 cannot find the dataset.

How can I overcome that?
Santosh
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

santoshkumar wrote: How can I overcome that?
By passing the same configuration file to the parameter in the second job.

Since the dataset was created with a different configuration, to read it you need a config file that includes the nodes that were used to create it.
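If you are not sure which nodes a dataset was written with, the orchadmin utility can report that from the descriptor file. The path below is made up, and the exact options vary by version, so check the utility's help in your installation; orchadmin itself also needs a valid APT_CONFIG_FILE set when it runs:

```
# Inspect the dataset descriptor to see which nodes hold its data segments
orchadmin describe /data/work/changes.ds
```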

Aneesh
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

thebird wrote: By passing the same configuration file to the parameter in the second job.

If you don't want to run the second job with the same config file, then use another configuration, but one which includes all the nodes that were used in the creation of the dataset.
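Either way, the configuration file is just a job parameter, so it can be supplied explicitly when the job is started. A hypothetical invocation from the command line (the project name, job name, and config path are assumptions):

```
# Config must contain every node Job1 wrote to (a superset is fine)
CONFIG=/opt/configs/superset_4node.apt

dsjob -run -jobstatus -param APT_CONFIG_FILE=$CONFIG MyProject Job2
```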

Aneesh