Dataset Read is slow

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

What OS are you using?
Try creating a new subdirectory under your regular resource disk path, define it in your APT configuration file as the resource disk location, and then copy the dataset into that path with the orchadmin cp command, roughly along the lines of the sketch below.
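A minimal sketch of what that might look like, assuming a single-node configuration; the node name, fastname, and paths below are placeholders, so substitute the ones from your own environment:

    {
        node "node1"
        {
            fastname "your_host"
            pools ""
            resource disk "/data/resource/ds_test" {pools ""}
            resource scratchdisk "/data/scratch" {pools ""}
        }
    }

    # Point APT_CONFIG_FILE at the file above, then copy the dataset into the new location:
    export APT_CONFIG_FILE=/path/to/new_config.apt
    orchadmin cp /old/path/mydataset.ds /data/resource/ds_test/mydataset.ds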
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

When you followed Ray's suggestion, how long did the job take to read the 1GB dataset?

A few things to consider in your investigation in no particular order:
Do any other jobs that read or write large datasets suffer from poor performance?
Is there any partitioning or sorting going on?
Are you able to monitor the box and then run this job (preferably with nothing else running at the same time)? Doing so should allow you or the system admin to determine whether the job is I/O bound; see the commands sketched at the end of this post.

1GB in around 8 minutes is very slow at about 2MB a second.
Can you exclude DataStage and test the performance of copying a large file around the system (again, see the sketch at the end of this post)?
Are all 4 nodes writing to the same disk?
Are these disk(s) local or is DataStage going across a network - is there a network problem (dropped packets for example - seen that before with a faulty switch)?
Are the disks and controller looking healthy?
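For the monitoring and the raw copy test, here is a rough sketch of the sort of OS-level checks meant above; the paths are placeholders and the exact command options vary between Linux and AIX/Solaris:

    # In a separate session while the job runs, watch disk utilisation:
    iostat -x 5
    vmstat 5

    # Rough throughput test outside DataStage: write and copy ~1GB on the resource disk
    time dd if=/dev/zero of=/data/resource/io_test.tmp bs=1M count=1024
    time cp /data/resource/io_test.tmp /data/resource/io_test2.tmp
    rm -f /data/resource/io_test*.tmp

If the dd or cp figures also come out at only a few MB a second, the problem sits in the storage or network layer rather than in the job design.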
rohitagarwal15
Participant
Posts: 102
Joined: Thu Sep 17, 2009 1:23 am

Post by rohitagarwal15 »

Is this problem specific to this particular dataset, or does it happen with all other datasets too?
If it happens for all datasets, then you can probably check with your storage or Unix team about the mount points on which the dataset descriptor files are being created. A quick way to see which filesystems are involved is sketched below.
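A rough sketch of how to check that; the dataset path below is a placeholder, and the df options differ slightly between platforms:

    # Which filesystem holds the descriptor file?
    df -h /path/to/mydataset.ds

    # Which filesystems hold the resource disk paths named in the APT config file?
    df -h /data/resource

    # Basic summary of the dataset itself:
    orchadmin describe /path/to/mydataset.ds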
Rohit