Writing configuration file

deepak.hsbc
Participant
Posts: 39
Joined: Sun Apr 15, 2007 11:30 pm

Post by deepak.hsbc »

Hello,
I have a DataStage job that takes 3 hours to process 30 million records. It uses a 4-node config file, which is a generic config file shared by all the jobs.
I want to create a new config file specifically for this job to optimise its processing.

What information is required to write an efficient config file?
This is the list I have made so far, as per my understanding:

1. Find the number of CPUs (physical and logical).
2. Find the amount of memory installed on the server.
3. If the CPUs are Core Duo (dual-core), define the nodes in the config file accordingly.
....
...
....
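
The first two items in the list can be checked with a short script run on the DataStage server. This is only a rough sketch: it assumes Python is available on the server, `os.cpu_count()` reports logical CPUs only (the standard library has no call for physical cores), and the `os.sysconf` names used for memory are not available on every platform, so the function returns `None` when they are missing.

```python
import os


def logical_cpu_count():
    """Number of logical CPUs reported by the operating system (at least 1)."""
    return os.cpu_count() or 1


def physical_memory_bytes():
    """Installed physical memory in bytes, or None if the platform does not expose it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are missing on some platforms
        return None


cpus = logical_cpu_count()
mem = physical_memory_bytes()
print(f"logical CPUs: {cpus}")
print("memory: unknown" if mem is None else f"memory: {mem / 2**30:.1f} GiB")
```

The infrastructure admin remains the better source for the physical CPU count and for how much of the machine other jobs already use.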

Could someone please help me write a good config file for my job?
"Books are as useful to a stupid person as a mirror is useful to a Blind person."
John Smith
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

A different config file is only really useful if your jobs have been designed properly to make use of parallel processing. It is best to start by looking at the job itself and seeing what you can do to improve it. After that, test with different config files: start with, say, a 2-node config file, increase the node count step by step, and see whether performance improves at each step.
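
To compare test runs fairly, it can help to turn each run into a throughput figure. The sketch below uses only the numbers from the original post (30 million records in 3 hours on the 4-node file) as the baseline; the function name is made up for illustration, and timings for other node counts have to come from your own runs.

```python
def rows_per_second(rows, elapsed_seconds):
    """Average throughput of one job run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return rows / elapsed_seconds


# Baseline from the original post: 30 million records in 3 hours on 4 nodes.
baseline = rows_per_second(30_000_000, 3 * 60 * 60)
print(f"4-node baseline: {baseline:.0f} rows/s")  # about 2778 rows/s
```

Run the same calculation for each config file you try; if throughput stops rising as nodes are added, the bottleneck is probably somewhere other than the node count.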
deepak.hsbc
Participant
Posts: 39
Joined: Sun Apr 15, 2007 11:30 pm

Post by deepak.hsbc »

Thanks John
And yes, the job is already designed for best performance. The only thing left that I want to test is an optimised config file, and I need some help writing one that makes the best use of the available hardware.
"Books are as useful to a stupid person as a mirror is useful to a Blind person."
kiran259
Participant
Posts: 48
Joined: Thu Aug 16, 2007 11:17 pm
Location: United States

Post by kiran259 »

Your infrastructure admin can tell you the number of CPUs available. As a general rule, the number of nodes is roughly equal to the number of CPUs. To improve performance, and depending on the resources available, you can increase the number of nodes in the config file used at the job level. Before doing this, check the job's dump score for any unnecessary sorts, and see whether the upstream or downstream side is the slow one before changing the settings.
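
As a rough illustration of what a job-specific config file with more nodes looks like, here is a small Python sketch that generates one. The host name and directory paths are placeholders, not values from this thread, and the layout assumes a single SMP server where every logical node shares the same fastname, disk and scratch disk; adjust all of these to your environment.

```python
def build_config(node_count, fastname, disk_dir, scratch_dir):
    """Return the text of a parallel config file with node_count logical nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'    node "node{i}"',
            "    {",
            f'        fastname "{fastname}"',
            '        pools ""',
            f'        resource disk "{disk_dir}" {{pools ""}}',
            f'        resource scratchdisk "{scratch_dir}" {{pools ""}}',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Placeholder host and paths, for illustration only.
print(build_config(8, "etl_server", "/data/datasets", "/data/scratch"))
```

Save the output to a file and point the job at it through the APT_CONFIG_FILE environment variable, added as a job parameter, so the shared 4-node file stays untouched for the other jobs.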
Kiran Vaduguri

As soon as the fear approaches near, attack and destroy it.