Fileset

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

Fileset

Post by pavan_test »

I have a DataStage job that writes a fileset as output. The job runs with a 2x1 configuration file. Whenever the job runs, it fills the dstage0 directory to 100% first and then aborts. I verified that there is no hard-coding in the job to write to dstage0 first.

Can someone please let me know why the job fills the dstage0 directory even though there is space available in the dstage1 directory?

Thanks
Pavan
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

What storage does your configuration file define as disk resources and for which nodes?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

fileset

Post by pavan_test »

Here is the configuration file:

Code:

{
        node "node1"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage0" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch0" {pools ""}
        }
        node "node2"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage1" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch1" {pools ""}
        }
}
Thanks
Pavan
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

And now the next question or two:

What is your data source and how well is the data partitioned across your two nodes?

I'll hazard a guess: it sounds like your data is mostly landing in one of the two partitions instead of being evenly balanced across both. Because each node specifies only one disk resource path, data in a given partition will be written to only that node's path when you write to a parallel dataset or fileset.

A few questions/suggestions:

1) Will your data FIT across both dstage0 and dstage1 and, if so, how much space will be left?
2) I would recommend increasing the size of both dstage0 and dstage1; it sounds like you are running with very little storage headroom. Will you be able to handle data volume growth over the next six months?
3) Specify dstage0 and dstage1 in ALL nodes in your config file. This will allow DataStage to distribute the data across the storage more evenly (see the sketch after this list).
4) Validate that you are partitioning your data correctly for your business rules (job design), both within this job and in downstream jobs or targets. If feasible, you could RoundRobin partition the data before the output dataset, provided that will not affect downstream jobs/targets.
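For example, item 3 might look like the sketch below: a revised 2-node configuration file that reuses the fastname and mount points from your posted config (treat it as an illustration to adapt, not a drop-in replacement), with each node listing both filesystems as disk resources. The engine can then spread the data files of a dataset or fileset for each partition across both paths:

Code:

{
        node "node1"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage0" {pools ""}
                resource disk "/opt/IBM/IIS/dstage1" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch0" {pools ""}
        }
        node "node2"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage0" {pools ""}
                resource disk "/opt/IBM/IIS/dstage1" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch1" {pools ""}
        }
}

With both paths available to both nodes, a heavily skewed partition should no longer be forced onto dstage0 alone, although fixing the partitioning itself (item 4) is still the better long-term answer.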

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.