Config File creation

Post by **deepak.hsbc** » Tue Nov 18, 2008 1:49 pm

Hello all,
I have to write a configuration file for a job that processes a large amount of data. The job contains 3 Join stages and 6 Sort stages, which take most of the run time.
The input is a sequential file in which each row has 176 columns that together form a 1245-byte string. There are 40 million rows on average.

I pulled the following resource information using TOPAS:
Online Memory: 16384.0 MB
Online Logical CPUs: 8
Online Virtual CPUs: 4

Any suggestions on how I should start writing the configuration file using the above information?
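To get a feel for the volume involved, the row count and record length above can be multiplied out. This is a minimal Python sketch; it assumes every row is exactly 1245 bytes and that data is spread evenly across partitions, which real partitioning (and per-record overhead) will not match exactly. The node counts in the loop are illustrations only, chosen to match the logical and virtual CPU figures above, not a recommendation.

```python
# Figures quoted in the question above.
ROWS = 40_000_000   # average number of rows
ROW_BYTES = 1_245   # length of one 176-column record, in bytes


def data_volume_bytes(rows: int, row_bytes: int) -> int:
    """Raw input size, ignoring any per-record or per-block overhead."""
    return rows * row_bytes


def per_node_bytes(total: int, nodes: int) -> int:
    """Bytes each node handles if the data is spread evenly (rounded up)."""
    return -(-total // nodes)


if __name__ == "__main__":
    total = data_volume_bytes(ROWS, ROW_BYTES)
    print(f"total: {total:,} bytes (~{total / 1024**3:.1f} GiB)")
    # Illustrative node counts only (4 and 8 match the CPU figures above).
    for n in (1, 4, 8):
        print(f"{n} node(s): ~{per_node_bytes(total, n) / 1024**3:.1f} GiB each")
```

That works out to about 49.8 GB (roughly 46 GiB) of raw input, which gives a rough idea of how much data the Sort and Join stages may have to spill to the scratch disks.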

Post by **srimitta** » Tue Nov 18, 2008 2:00 pm

/Ascential/DataStage/Configurations/default.apt is the location of the default configuration file.

To add more power to data processing, you need to define a number of nodes, each with the following information:

fastname <Host Name>
resource disk <Datasets path>
resource scratchdisk <Scratch file path>
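As a rough illustration of how those three entries fit together, here is a minimal Python sketch that writes a configuration file for several nodes on one host. It follows the node layout used in the file posted further down this thread (node name, `fastname`, `pools`, one `resource disk`, one `resource scratchdisk`); the function names, the host name `etlhost` and the base directory `/data/apt` are hypothetical placeholders. Real files often add more entries, such as the separate "buffer" scratchdisk pool in that posted file.

```python
def node_stanza(name: str, fastname: str, base: str, pool: str = "") -> str:
    """Render one node block: host, node pool, one data disk, one scratch disk."""
    return (
        f'node "{name}"\n'
        "{\n"
        f'  fastname "{fastname}"\n'
        f'  pools "{pool}"\n'
        f'  resource disk "{base}/{name}/resource" {{pools ""}}\n'
        f'  resource scratchdisk "{base}/{name}/scratch" {{pools ""}}\n'
        "}\n"
    )


def make_apt_config(num_nodes: int, fastname: str, base: str) -> str:
    """Wrap num_nodes node blocks, all on one host, in the outer braces of the file."""
    if num_nodes < 1:
        raise ValueError("a configuration needs at least one node")
    stanzas = "".join(
        node_stanza(f"node{i:02d}", fastname, base)
        for i in range(1, num_nodes + 1)
    )
    return "{\n" + stanzas + "}\n"


if __name__ == "__main__":
    # Hypothetical host name and base directory; replace with your own values.
    print(make_apt_config(2, "etlhost", "/data/apt"))
```

Raising `num_nodes` is how more partitions are added on the same host; the generated text still has to be saved where DataStage will read it, for example as the default.apt file mentioned above.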

Post by **deepak.hsbc** » Tue Nov 18, 2008 2:14 pm

Thanks for the reply, but how many nodes should I use?

Post by **srimitta** » Tue Nov 18, 2008 2:33 pm

Ask your DataStage administrator; he or she should be able to provide that information.

Post by **deepak.hsbc** » Tue Nov 18, 2008 2:44 pm

I am the new admin, and I am the one who will decide these things.

That's what I want to learn: how to start doing this!

Post by **ray.wurlod** » Tue Nov 18, 2008 3:18 pm

Have you taken the IBM class DX437 (Administering DataStage) or some equivalent? This provides you with the skills you need.

Post by **deepak.hsbc** » Tue Nov 18, 2008 3:45 pm

Yeah, I have taken the DX437 class, but I am still not fully confident doing this. Making a normal config file seems easy, and I have done that, but the job in question definitely needs a highly tuned config file.

Below is the config file I made and am using, but with it the job runs for 3 hours. Do you think this is normal?
==============================================
{
  node "node01"
  {
    fastname "EtlServer01"
    pools ""
    resource disk "/IBM/MyProject/node01/resource" {pools ""}
    resource scratchdisk "/IBM/MyProject/node01/scratch" {pools ""}
    resource scratchdisk "/IBM/MyProject/node01/buffer" {pools "buffer"}
  }
  node "node02"
  {
    fastname "EtlServer01"
    pools ""
    resource disk "/IBM/MyProject/node02/resource" {pools ""}
    resource scratchdisk "/IBM/MyProject/node02/scratch" {pools ""}
    resource scratchdisk "/IBM/MyProject/node02/buffer" {pools "buffer"}
  }
  node "node03"
  {
    fastname "EtlServer01"
    pools ""
    resource disk "/IBM/MyProject/node03/resource" {pools ""}
    resource scratchdisk "/IBM/MyProject/node03/scratch" {pools ""}
    resource scratchdisk "/IBM/MyProject/node03/buffer" {pools "buffer"}
  }
  node "node04"
  {
    fastname "EtlServer01"
    pools ""
    resource disk "/IBM/MyProject/node04/resource" {pools ""}
    resource scratchdisk "/IBM/MyProject/node04/scratch" {pools ""}
    resource scratchdisk "/IBM/MyProject/node04/buffer" {pools "buffer"}
  }
  node "node05"
  {
    fastname "EtlServer01"
    pools ""
    resource disk "/IBM/MyProject/node05/resource" {pools ""}
    resource scratchdisk "/IBM/MyProject/node05/scratch" {pools ""}
    resource scratchdisk "/IBM/MyProject/node05/buffer" {pools "buffer"}
  }
  node "DB2"
  {
    fastname "Db2Server01"
    pools "db2"
    resource disk "/IBM/MyProject/node01/scratch" {pools ""}
    resource scratchdisk "/IBM/MyProject/node01/buffer" {pools ""}
  }
}

===============================================

Post by **ray.wurlod** » Tue Nov 18, 2008 8:39 pm

Why only one node in the DB2 node pool? Surely this will be a bottleneck. Create extra nodes in this node pool on the Db2Server01 machine. Unless you do this, all DB2 operations will be sequential.
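Following that suggestion, the extra DB2 pool nodes would be additional node blocks with `pools "db2"` and `fastname "Db2Server01"`, placed inside the outer braces of the file alongside the existing "DB2" node. Below is a minimal Python sketch that generates such blocks. The node names `DB2_1`, `DB2_2` and so on, the base path `/IBM/MyProject/db2`, and the count of four are hypothetical choices for illustration, not values given in this thread.

```python
def db2_pool_nodes(count: int, fastname: str, base: str) -> str:
    """Render `count` node blocks on one host, all in the "db2" node pool."""
    blocks = []
    for i in range(1, count + 1):
        name = f"DB2_{i}"  # hypothetical naming scheme
        blocks.append(
            f'node "{name}"\n'
            "{\n"
            f'  fastname "{fastname}"\n'
            '  pools "db2"\n'
            f'  resource disk "{base}/{name}/resource" {{pools ""}}\n'
            f'  resource scratchdisk "{base}/{name}/scratch" {{pools ""}}\n'
            "}\n"
        )
    return "".join(blocks)


if __name__ == "__main__":
    # Db2Server01 comes from the posted file; the base path and count are placeholders.
    print(db2_pool_nodes(4, "Db2Server01", "/IBM/MyProject/db2"))
```

Each generated block is one more partition that DB2 operations constrained to the "db2" pool can use, so they no longer run on a single node.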

Post by **deepak.hsbc** » Tue Nov 18, 2008 9:56 pm

Thanks! This is what I was looking for; now I can start working from this point.