DataStage-Unix I/O transfer problem?

Post questions here relating to DataStage Server Edition, in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

suneyes
Participant
Posts: 82
Joined: Mon Jul 21, 2008 8:42 am

DataStage-Unix I/O transfer problem?

Post by suneyes »

Hi,
We have a DataStage project, running 24x7, whose performance has been steadily degrading. The mean time taken to process a record has been increasing over time. On analysis, we found that the actual running time of the job has not changed; rather, the time between the Unix script calling the job and the job actually starting to run has increased.

We have tried restarting the DS server and rebuilding the repository indices, but the performance drop is still present. Can anyone suggest how this issue can be tackled?

Thanks in advance
sun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My first guess is that the &PH& directory in the project is becoming more and more full. Clear it out aggressively - perhaps everything more than two days old.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That would be my first guess as well. There's one in each project directory and, since "&" is a shell metacharacter, you'll need to do something like this to get into it:

Code:

cd \&PH\&
It's pretty common to have a cron script that keeps this pruned, deleting anything there more than X days old on a daily basis.
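
For example, something along these lines run nightly from cron (a sketch only - the project path here is an example for a 7.x install, adjust it to yours):

Code:

#!/bin/sh
# prune_ph.sh - clear old files out of each project's &PH& directory
# /opt/Ascential/DataStage/Projects is an example path; adjust to your install
for proj in /opt/Ascential/DataStage/Projects/*
do
    if [ -d "$proj/&PH&" ]; then
        # remove anything more than two days old
        find "$proj/&PH&" -type f -mtime +2 -exec rm -f {} \;
    fi
done

Scheduled with a crontab entry such as 0 2 * * * /home/dsadm/prune_ph.sh.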
-craig

"You can never have too many knives" -- Logan Nine Fingers
suneyes
Participant
Posts: 82
Joined: Mon Jul 21, 2008 8:42 am

Post by suneyes »

Hi Ray and chulett,
Thank you for your help. I have tried deleting the older files in the &PH& folder, but I still can't see any improvement :(
sun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Has the total load on the system been increasing commensurately over time with the degradation you have been observing?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
suneyes
Participant
Posts: 82
Joined: Mon Jul 21, 2008 8:42 am

Post by suneyes »

No, the load is constant. It's just the processing time that has been increasing over time.
sun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hopefully you saw the words total load and recognize that they mean everything running there, not just DataStage, yes? And is this an issue with all jobs, or just one particular job? That's unclear from your post. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The poster has stated there's a lag between the script and the startup of the job, and then switches and states that processing time has increased. Are these two issues or one?

If there is a delay between the script executing the dsjob command and the job actually starting - this is probably the well-known issue of the &PH& directory in each project being full.

If the job has a slow startup time once running - check whether the job logs in all your jobs have grown large. Each job logging messages into a bloated log file can add significant overhead.
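
Logs can be cleared from the Director client (or auto-purged via the job log settings); at the engine level, a rough sketch, assuming you have looked the job number up in DS_JOBS first (MYPROJECT and the 42 here are made up):

Code:

# look up the number first, at the project prompt:
#   SELECT NAME, JOBNO FROM DS_JOBS WHERE NAME = 'MyJob';
cd /opt/Ascential/DataStage/DSEngine   # example DSHOME
. ./dsenv
bin/uvsh <<'EOF'
LOGTO MYPROJECT
CLEAR.FILE RT_LOG42
EOF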

If you're talking about rows/second degradation over time for all jobs - that's a totally different issue, and we get into all kinds of discussion about disk I/O, CPU utilization, monitoring with prstat, etc.
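
Something as simple as sampling the box while the jobs run gives you a starting point (prstat is Solaris - substitute your platform's equivalents):

Code:

# sample CPU, process and disk activity every 5 seconds for a minute
prstat -a 5 12 > prstat.out &    # per-process and per-user CPU (Solaris)
iostat -x 5 12 > iostat.out &    # per-device disk utilisation
vmstat 5 12 > vmstat.out &       # run queue, paging, overall CPU split
wait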
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm also wondering if this is a "before job" issue, something being run there that is taking "more and more time" to complete. This can appear to people as the job taking a long time to start if they're not reading the logs correctly or are unaware of its presence.

Of course, that's all predicated on the assumption this is one particular job that is exhibiting the problem. If it's all of them... and their &PH& directory is indeed clean... :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

And that each project has an &PH& in it....
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
devanars
Premium Member
Posts: 99
Joined: Thu Nov 30, 2006 6:25 pm

Post by devanars »

Which version of DataStage are you using?

Let me know if it is 8.1. Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hopefully it's not, seeing as how they marked their Release as 7x in the opening post. I know where you're going, though... :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
suneyes
Participant
Posts: 82
Joined: Mon Jul 21, 2008 8:42 am

Post by suneyes »

Hi,
It seems I have caused some confusion. Let me describe my problem in detail:
We have a DataStage project with about 22 jobs.
We have created multiple copies of this project, and incoming files (each containing a maximum of 5 records) are routed into one of these projects. These different instances of the base project run 24x7.
The issue we are facing is that, while the ideal turnaround time for an input file sent to any instance is about 9-10 minutes, some instances are taking much longer (as much as 45-50 minutes). On closer inspection we found that the actual processing time of a DataStage job hasn't changed (it is usually about 30-35 seconds per job); rather, the time between a job being called by the Unix script and the DS job actually being invoked has been increasing.

One other observation we have made is that this increase in turnaround time is only seen in those instances where the frequency of incoming files is high. For instances handling fewer files, the turnaround times are comparable to the ideal.
Also, we are using DS 7.5.

I have tried clearing out the &PH& folder, but to no avail.
Can anyone please advise on what might be causing this and how to optimize the system?
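
For reference, this is roughly how the gap can be measured - timestamp the dsjob call and compare it against the job's first log entry (the project and job names here are made up):

Code:

#!/bin/sh
# rough sketch: measure the script-to-job-start lag
PROJ=MYPROJ_01          # example instance project
JOB=ProcessFile         # example job name
echo "dsjob called at: `date '+%H:%M:%S'`"
$DSHOME/bin/dsjob -run -jobstatus $PROJ $JOB
# the first entries of the latest log show when the job actually started
$DSHOME/bin/dsjob -logsum -max 5 $PROJ $JOB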

Thanks in advance
sun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Still confused. Projects cannot "run 24x7", so what exactly does that mean? "RTI Enabled" jobs can be set to be "always on" and thus run constantly as a web service - is that what you are doing? It doesn't sound like it, since there seems to be a script involved.

What happens inside your script? When you call it, what all does it need to do before it actually issues the dsjob command? Are you 'preprocessing' the files somehow, something that could be affected by the volume of files? How exactly does a job process multiple files... a looping Sequence job, a cat before-job, something else?

You've narrowed down where and when the problem occurs, but you've given us no information specific to that time or process, so we're still guessing.
-craig

"You can never have too many knives" -- Logan Nine Fingers
suneyes
Participant
Posts: 82
Joined: Mon Jul 21, 2008 8:42 am

Post by suneyes »

Hi Chulett,
We have a script (kicked off every 5 minutes from cron) which monitors a set of ftp directories for files (each file containing one record). If files are found in any of the monitored directories, they are picked up and all the corresponding records are clubbed together to form a single new file. The DS project is then kicked off with this new file, which is processed by running through the 22 jobs in the instance. The number of records in this file does not usually exceed 5.
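
Roughly, the flow looks like this (a much-simplified sketch - the directory, project, and job names are made up):

Code:

#!/bin/sh
# simplified sketch of the watcher script (run every 5 minutes from cron)
LANDING=/ftp/incoming            # example: one of the monitored directories
WORK=/ftp/work
set -- $LANDING/*.dat
if [ -e "$1" ]; then
    # club all waiting one-record files into a single new input file
    BATCH=$WORK/batch_`date +%Y%m%d%H%M%S`.dat
    cat $LANDING/*.dat > $BATCH && rm -f $LANDING/*.dat
    # kick off the first of the 22 jobs in this instance with the new file
    $DSHOME/bin/dsjob -run -jobstatus -param InputFile=$BATCH MYPROJ_01 LoadFile
fi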

I guess my problem is clear by now...
sun