Hi Everyone,
In my current project we are moving 20 GB of data from the source to the target. While processing this data we sometimes expand records to form new records, so the data moving through the system can grow to 60 GB. Because of business requirements, most of our processing is done sequentially. Our current hardware configuration is as follows:
Standalone server without clustering
8 GB RAM
4 CPUs (multi-threading disabled)
4 hard drives (144 GB)
The scratch space and dataset space given in the configuration file total only 20 GB. We frequently get SIGKILL, SIGSEGV and SIGPIPE errors, and we understand that there is a resource crunch.
It would be helpful if somebody could suggest a hardware configuration (for DataStage PX), as well as the scratch and dataset space, that would be ideal for our requirement.
Thanks in advance,
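As a starting point for sizing, a rough back-of-envelope calculation can be made from the figures in the post. The sketch below is my own illustration, not vendor guidance: the concurrency and headroom multipliers are assumptions, based on the common rule of thumb that sort- and join-heavy parallel jobs can spill roughly the full in-flight data volume to scratch.

```python
# Back-of-envelope scratch/dataset sizing sketch.
# The 20 GB / 60 GB figures come from the post; the multipliers
# below are assumptions for illustration, not vendor guidance.

source_gb = 20        # data read from the source
inflight_gb = 60      # peak volume after records are expanded

# Assumption: sort/join stages may spill the full in-flight volume
# to scratch, and concurrently running jobs multiply that demand.
concurrent_jobs = 2   # hypothetical degree of job concurrency
headroom = 1.5        # safety margin for temp files and re-runs

scratch_gb = inflight_gb * concurrent_jobs * headroom
dataset_gb = inflight_gb * headroom  # persistent datasets between jobs

print(f"Suggested scratch space: {scratch_gb:.0f} GB")   # 180 GB
print(f"Suggested dataset space: {dataset_gb:.0f} GB")   # 90 GB
```

Even with these assumed multipliers, the configured 20 GB is an order of magnitude short of the 60 GB peak volume described above, which is consistent with the errors reported.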
DataStage Machine Hardware Configuration Query
Hi Ray,
Thanks a lot for the quick reply. In a few jobs we process the data sequentially, but in most of our jobs we use parallelism. The problem is that when we run a number of jobs in parallel (sometimes even a single job) within a sequence, we get a number of hardware-related issues such as 'Scratch Space Full', 'Heap Allocation Failed', SIGKILL, SIGSEGV and SIGINT. So we are considering upgrading the hardware resources, and for that we need to arrive at a configuration with which we will not get these kinds of errors.
That's why we need your valuable input on what the best hardware configuration would be for the kind of data processing we are doing.
Thanks in advance,
The hardware you have will support multi-instance server jobs with less demand for resources than parallel jobs. There's no way I can suggest figures without monitoring and measuring what's happening on your system. Some things are obvious: scratch space full means that you need to configure more scratch space, but how much more must depend on exactly what is demanding scratch space. But 20GB does not seem like much. Try using an approach that "gives all partitions all the disk". For example, in a two node configuration:
Code:
{
node "firstnode"
{
fastname "myserver"
pools "" "firstnodepool" "import"
resource disk "C:\Data\DataSets"
resource disk "D:\Data\DataSets"
resource scratchdisk "E:\Work\Scratch"
resource scratchdisk "F:\Work\Scratch"
}
node "secondnode"
{
fastname "myserver"
pools "" "secondnodepool" "export"
resource disk "D:\Data\DataSets"
resource disk "C:\Data\DataSets"
resource scratchdisk "F:\Work\Scratch"
resource scratchdisk "E:\Work\Scratch"
}
}
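As noted above, sensible figures require monitoring and measuring what is actually happening on the system. One generic way to do that is to sample the scratchdisk directories during a job run and record the peak usage. This is my own sketch, not a DataStage tool; the directory path is a placeholder for whatever your configuration file lists under resource scratchdisk.

```shell
#!/bin/sh
# Sample scratch-disk usage at intervals during a job run, report the peak.
# SCRATCH_DIRS is a placeholder -- substitute the "resource scratchdisk"
# entries from your own APT configuration file.

SCRATCH_DIRS="/tmp"   # e.g. "/Work/Scratch1 /Work/Scratch2"
INTERVAL=1            # seconds between samples
SAMPLES=3             # how many samples to take

peak_kb=0
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    for d in $SCRATCH_DIRS; do
        used_kb=$(du -sk "$d" 2>/dev/null | awk '{print $1}')
        [ "${used_kb:-0}" -gt "$peak_kb" ] && peak_kb=$used_kb
    done
    i=$((i + 1))
    sleep "$INTERVAL"
done
echo "Peak scratch usage: ${peak_kb} KB"
```

Run this alongside the problem jobs; the peak figure (plus headroom) gives a measured basis for how much scratch space to configure, rather than a guess.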
Last edited by ray.wurlod on Fri Jan 12, 2007 3:45 pm, edited 2 times in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.