Q. apt file Node setting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

wuruima
Participant
Posts: 65
Joined: Mon Nov 04, 2013 10:15 pm


Post by wuruima »

Dear all,
May I ask a quick question?
If I have 4 file systems (each one 10 GB) and I set the APT configuration file like this:

Code:

{
        node "node1"
        {
                fastname "xxx"
                pools ""
                resource disk "/Node1/DataSets81" {pools ""}
                resource scratchdisk "/Node1/Scratch81" {pools ""}
        }
        node "node2"
        {
                fastname "xxx"
                pools ""
                resource disk "/Node2/DataSets81" {pools ""}
                resource scratchdisk "/Node2/Scratch81" {pools ""}
        }
        node "node3"
        {
                fastname "xxx"
                pools ""
                resource disk "/Node3/DataSets81" {pools ""}
                resource scratchdisk "/Node3/Scratch81" {pools ""}
        }
        node "node4"
        {
                fastname "xxx"
                pools ""
                resource disk "/Node4/DataSets81" {pools ""}
                resource scratchdisk "/Node4/Scratch81" {pools ""}
        }
}
I have a parallel job with an input sequential file (50 GB). Will the job automatically split the 50 GB file across the 4 nodes for processing, or will it fail with a scratch-space-full error? I don't have enough space on my server to test this myself. Would you please kindly tell me what would happen? Thanks.
wuruimao
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The job will automatically split the data over the four nodes for processing. If you use Round Robin partitioning, which is the default in most cases, the rows will be distributed evenly.

Whether or not your scratch space is consumed depends on whether your job performs operations that require scratch space, such as sorting, lookups, etc. That is not something we can easily determine from here, but you can, by using the Resource Estimation tool in DataStage Designer.
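To illustrate the first point, here is a minimal sketch (in Python, not DataStage) of how Round Robin partitioning spreads rows evenly across nodes; the function name and node labels are illustrative only, not part of any DataStage API:

```python
# Sketch of round robin partitioning: row i goes to node i % n_nodes,
# so rows end up spread as evenly as possible across the nodes.
def round_robin_partition(rows, n_nodes):
    partitions = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        partitions[i % n_nodes].append(row)
    return partitions

# Stand-in for records read from the 50 GB sequential file, over 4 nodes.
parts = round_robin_partition(list(range(12)), 4)
for node, part in enumerate(parts, start=1):
    print(f"node{node}: {len(part)} rows")  # each node gets 3 of 12 rows
```

With 12 rows and 4 nodes, every node receives exactly 3 rows; with real data the counts differ by at most one row per node.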
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

But you are still 10 GB short :roll:
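The arithmetic behind that remark, using the sizes given in the question (a 50 GB input against four 10 GB scratch file systems):

```python
# Scratch shortfall if the whole input had to spill to scratch at once.
input_gb = 50           # size of the sequential file
scratch_gb = 4 * 10     # four file systems of 10 GB each
shortfall = input_gb - scratch_gb
print(shortfall)        # 10
```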
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Well, if you are creative in your job design, you could read the data in, fork off the column you need to sort into its own data stream, and then after the sort join the data back onto the rows.

That's the only way to avoid dropping 50 GB of data into a scratch area that is only 40 GB total. BTW... that also limits what you can do with other stages that use the scratch resource disk.