Fileset

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

Fileset

Post by pavan_test »

I have a DataStage job that writes a fileset as output. The job runs with a 2x1 configuration file. Whenever the job runs, it fills the dstage0 directory to 100% first and then aborts. I verified that there is no hard-coding in the job to write to dstage0 first.

Can someone please let me know why the job fills the dstage0 directory even though there is space available in the dstage1 directory?

Thanks
Pavan
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

What storage does your configuration file define as disk resources and for which nodes?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

fileset

Post by pavan_test »

Here is the configuration file:

Code:

{
        node "node1"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage0" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch0" {pools ""}
        }
        node "node2"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage1" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch1" {pools ""}
        }
}
Thanks
Pavan
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

And now the next question or two:

What is your data source and how well is the data partitioned across your two nodes?

I'll hazard a guess: it sounds like your data is mostly landing in one of the two partitions instead of being evenly balanced across both. Because each node specifies only one disk resource path, data in a given partition will be written to only that node's path when you write to a parallel dataset or fileset.

A few questions/suggestions:

1) Will your data FIT across both dstage0 and dstage1 and, if so, how much space will be left?
2) I would recommend increasing the size of both dstage0 and dstage1; it sounds like you are running with very little storage headroom. Will you be able to handle data volume growth over the next six months?
3) Specify dstage0 and dstage1 in ALL nodes in your config file. This will allow DataStage to distribute the data across the storage more evenly (see the sketch after this list).
4) Validate that you are partitioning your data correctly for your business rules (job design), both within this job and in downstream jobs or targets. If feasible, you could RoundRobin partition the data before the output dataset, provided that will not affect downstream jobs/targets.
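For example, item 3 might look like the sketch below: a revised 2-node configuration file that reuses the fastname and mount points from your posted config (treat it as an illustration to adapt, not a drop-in replacement), with each node listing both filesystems as disk resources. The engine can then spread the data files of a dataset or fileset for each partition across both paths:

Code:

{
        node "node1"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage0" {pools ""}
                resource disk "/opt/IBM/IIS/dstage1" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch0" {pools ""}
        }
        node "node2"
        {
                fastname "server1"
                pools ""
                resource disk "/opt/IBM/IIS/dstage0" {pools ""}
                resource disk "/opt/IBM/IIS/dstage1" {pools ""}
                resource scratchdisk "/opt/IBM/IIS/scratch1" {pools ""}
        }
}

With both paths available to both nodes, a heavily skewed partition should no longer be forced onto dstage0 alone, although fixing the partitioning itself (item 4) is still the better long-term answer.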

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.