I have a DataStage job writing a fileset as output. The job runs on a 2x1 configuration file. Whenever the job runs, it fills the dstage0 directory to 100% first, and then the job aborts. I verified there is no hard coding in the job to write into dstage0 first.
Can someone please let me know why the job fills the dstage0 directory even though there is space available in the dstage1 directory?
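As a first diagnostic step, it may help to compare how much space each resource disk actually holds; a minimal sketch, assuming the two paths from the configuration file posted below (adjust them to your install):

```shell
#!/bin/sh
# Compare free space and usage on both resource disk paths.
# The paths are taken from the posted config file; they may differ on your host.
for d in /opt/IBM/IIS/dstage0 /opt/IBM/IIS/dstage1; do
  if [ -d "$d" ]; then
    df -h "$d"     # filesystem free space
    du -sh "$d"    # space consumed under this resource disk
  else
    echo "$d: not present on this host"
  fi
done
```

If one directory holds nearly all the dataset's data files while the other is almost empty, that points to partition skew rather than a job-level setting.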
Thanks
Pavan
Fileset
fileset
Here is the configuration file:
Thanks
Pavan
Code:
{
  node "node1"
  {
    fastname "server1"
    pools ""
    resource disk "/opt/IBM/IIS/dstage0" {pools ""}
    resource scratchdisk "/opt/IBM/IIS/scratch0" {pools ""}
  }
  node "node2"
  {
    fastname "server1"
    pools ""
    resource disk "/opt/IBM/IIS/dstage1" {pools ""}
    resource scratchdisk "/opt/IBM/IIS/scratch1" {pools ""}
  }
}
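For reference, one common way to let DataStage spread a dataset's data files across both filesystems is to list both paths as resource disks in every node. This is only a sketch based on the config above (the path ordering, reversed per node so each node prefers a different disk first, is a convention, not a requirement; verify against your own environment):

```
{
  node "node1"
  {
    fastname "server1"
    pools ""
    resource disk "/opt/IBM/IIS/dstage0" {pools ""}
    resource disk "/opt/IBM/IIS/dstage1" {pools ""}
    resource scratchdisk "/opt/IBM/IIS/scratch0" {pools ""}
  }
  node "node2"
  {
    fastname "server1"
    pools ""
    resource disk "/opt/IBM/IIS/dstage1" {pools ""}
    resource disk "/opt/IBM/IIS/dstage0" {pools ""}
    resource scratchdisk "/opt/IBM/IIS/scratch1" {pools ""}
  }
}
```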
Pavan
And now the next question or two:
What is your data source and how well is the data partitioned across your two nodes?
I'm going to hazard a guess that your data is mostly in one of the two partitions instead of being evenly balanced across both. Because you specify only one path for the disk resource in each node, data in a given partition will be written only to that node's path when writing to a parallel dataset.
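To illustrate why a skewed partitioning key fills one resource disk, here is a small, self-contained Python sketch (the key name, values, and skew are invented for the example; DataStage's actual hash function differs, but the effect is the same):

```python
import zlib

# Hypothetical input: 100 rows whose partitioning key is heavily
# skewed toward one value.
rows = [{"region": "EAST"}] * 90 + [{"region": "WEST"}] * 10

def key_partition(rows, key, nodes):
    """Key-based (hash) partitioning: all rows sharing a key value land
    in the same partition, so a skewed key skews the partitions."""
    parts = [[] for _ in range(nodes)]
    for r in rows:
        parts[zlib.crc32(r[key].encode()) % nodes].append(r)
    return parts

def round_robin(rows, nodes):
    """Round-robin partitioning: rows are dealt out evenly, key ignored."""
    parts = [[] for _ in range(nodes)]
    for i, r in enumerate(rows):
        parts[i % nodes].append(r)
    return parts

# One partition receives at least the 90 "EAST" rows under hash partitioning.
print([len(p) for p in key_partition(rows, "region", 2)])
print([len(p) for p in round_robin(rows, 2)])  # [50, 50]
```

With key-based partitioning, one of the two partitions necessarily holds all 90 skewed rows, which on a 2-node config means one node's disk fills first; round-robin balances the row counts exactly.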
A few questions/suggestions:
1) Will your data FIT across both dstage0 and dstage1 and, if so, how much space will be left?
2) I would recommend you increase the size of both dstage0 and dstage1. It sounds like you are likely running with very little overhead for your storage. Will you be able to handle data volume growth for the next six months?
3) Specify dstage0 and dstage1 in ALL nodes in your config file. This will allow DataStage to distribute the data across the storage more evenly.
4) Validate that you are partitioning your data correctly for your business rules (job design), both within this job and downstream jobs or targets. If feasible, you could RoundRobin partition the data before the output dataset if that will not affect downstream jobs/targets.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.