Hi Everyone,
In my current project we are moving 20 GB of data from the source to the target. While processing this data we sometimes expand records to form new records, so the data moving through the system can grow to 60 GB. Because of business requirements, most of our processing is done sequentially. Our current hardware configuration is as follows:
Standalone server without clustering
8 GB RAM
4 CPUs (multi-threading disabled)
4 hard drives (144 GB)
The scratch space and dataset space given in the configuration file total only 20 GB. We frequently get SIGKILL, SIGSEGV and SIGPIPE errors, and we understand that there is a resource crunch.
It would be helpful if somebody could suggest a hardware configuration (for DataStage PX), as well as the scratch and dataset space, that would be ideal for our requirement.
Thanks in advance,
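As a starting point for sizing, a rough back-of-envelope calculation can be made from the figures in the post. The sketch below is my own illustration, not vendor guidance: the concurrency and headroom multipliers are assumptions, based on the common rule of thumb that sort- and join-heavy parallel jobs can spill roughly the full in-flight data volume to scratch.

```python
# Back-of-envelope scratch/dataset sizing sketch.
# The 20 GB / 60 GB figures come from the post; the multipliers
# below are assumptions for illustration, not vendor guidance.

source_gb = 20        # data read from the source
inflight_gb = 60      # peak volume after records are expanded

# Assumption: sort/join stages may spill the full in-flight volume
# to scratch, and concurrently running jobs multiply that demand.
concurrent_jobs = 2   # hypothetical degree of job concurrency
headroom = 1.5        # safety margin for temp files and re-runs

scratch_gb = inflight_gb * concurrent_jobs * headroom
dataset_gb = inflight_gb * headroom  # persistent datasets between jobs

print(f"Suggested scratch space: {scratch_gb:.0f} GB")   # 180 GB
print(f"Suggested dataset space: {dataset_gb:.0f} GB")   # 90 GB
```

Even with these assumed multipliers, the configured 20 GB is an order of magnitude short of the 60 GB peak volume described above, which is consistent with the errors reported.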
DataStage Machine Hardware Configuration Query
Hi Ray,
Thanks a lot for the quick reply. In a few jobs we process the data sequentially, but in most of our jobs we use parallelism. The problem is that when we run a number of jobs in parallel (sometimes even a single job) within a sequence, we get a number of hardware-related issues such as 'Scratch Space Full', 'Heap Allocation Failed', SIGKILL, SIGSEGV and SIGINT. So we are considering upgrading the hardware resources, and for that we need to arrive at a configuration with which we will not get these kinds of errors.
That's why we need your valuable input on what the best hardware configuration would be for the kind of data processing we are doing.
Thanks in advance,
The hardware you have will support multi-instance server jobs with less demand for resources than parallel jobs. There's no way I can suggest figures without monitoring and measuring what's happening on your system. Some things are obvious: scratch space full means that you need to configure more scratch space, but how much more must depend on exactly what is demanding scratch space. But 20GB does not seem like much. Try using an approach that "gives all partitions all the disk". For example, in a two node configuration:
Code:
{
node "firstnode"
{
fastname "myserver"
pools "" "firstnodepool" "import"
resource disk "C:\Data\DataSets"
resource disk "D:\Data\DataSets"
resource scratchdisk "E:\Work\Scratch"
resource scratchdisk "F:\Work\Scratch"
}
node "secondnode"
{
fastname "myserver"
pools "" "secondnodepool" "export"
resource disk "D:\Data\DataSets"
resource disk "C:\Data\DataSets"
resource scratchdisk "F:\Work\Scratch"
resource scratchdisk "E:\Work\Scratch"
}
}
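As noted above, sensible figures require monitoring and measuring what is actually happening on the system. One generic way to do that is to sample the scratchdisk directories during a job run and record the peak usage. This is my own sketch, not a DataStage tool; the directory path is a placeholder for whatever your configuration file lists under resource scratchdisk.

```shell
#!/bin/sh
# Sample scratch-disk usage at intervals during a job run, report the peak.
# SCRATCH_DIRS is a placeholder -- substitute the "resource scratchdisk"
# entries from your own APT configuration file.

SCRATCH_DIRS="/tmp"   # e.g. "/Work/Scratch1 /Work/Scratch2"
INTERVAL=1            # seconds between samples
SAMPLES=3             # how many samples to take

peak_kb=0
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    for d in $SCRATCH_DIRS; do
        used_kb=$(du -sk "$d" 2>/dev/null | awk '{print $1}')
        [ "${used_kb:-0}" -gt "$peak_kb" ] && peak_kb=$used_kb
    done
    i=$((i + 1))
    sleep "$INTERVAL"
done
echo "Peak scratch usage: ${peak_kb} KB"
```

Run this alongside the problem jobs; the peak figure (plus headroom) gives a measured basis for how much scratch space to configure, rather than a guess.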
Last edited by ray.wurlod on Fri Jan 12, 2007 3:45 pm, edited 2 times in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.