&PH& Directory

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

&PH& Directory

Post by pavan_test »

Hi all,

Can anyone suggest how I can find the number of processes a DataStage job is creating?
Can the number of processes in the &PH& directory slow down the performance of a DataStage job?

Thanks
Mark
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Files, not processes, are created in that directory.

What flavor of Unix are you using?

Yes, the more files present in that path, the slower your job will be. This is just like any other directory structure: more files means a longer wait to find a filename in the list.
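If you want to see how big that directory has actually grown, a quick check along these lines works (the project path is only an example, adjust it for your install, and note that the & characters need quoting in the shell):

cd /path/to/your/Project/'&PH&'   # example path; substitute your real project directory
ls | wc -l                        # how many files the engine has to scan
ls -lt | tail                     # the oldest entries, to see how far back they go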

You might also be suffering from a fragmented file structure if you've been deleting files left and right in there.

More processes means more job startup time but also "may" improve your overall job speed.
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

&PH& Directory

Post by pavan_test »

Thanks Paul. More files are being created in that directory. This started recently and I am trying to understand why it is happening.

The OS is AIX 5.3. The startup time for some jobs is horrible: 1 hour 32 minutes, when it used to be 1 or 2 seconds in the past.
Also, the run time for the jobs is now 7 hours, which used to be around 50 minutes.

Thanks
Mark
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

Re: &PH& Directory

Post by pavan_test »

Can you also please explain what you mean by a fragmented file structure?

How do I know if it is happening in my environment?

Thanks
Mark
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I think you have a different problem. What makes you think that &PH& is the source of your delay?

Are you using RTLogging=1, ORLogging=0?
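If you're not sure, you can check the project's DSParams file directly, something like this (the path is only an example, adjust it for your install):

egrep 'RTLogging|ORLogging' /path/to/your/Project/DSParams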
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Also tell us how many files are in your &PH& directory.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

&PH& directory

Post by pavan_test »

I find these in the DSParams file:

RTLogging=1
ORLogging=0

There are 65 files in the &PH& directory

Thanks
Mark
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK... 65 is nothing. &PH& is the phantom directory and 'phantom' means background process. Every job creates files there that it uses to communicate its status back to the engine, so having them there is perfectly normal. Now, if you had 65,000 files in there I'd be worried that writing to that directory may be impaired but that's clearly not the case.

RTLogging set to True means your logs are going to the 'repository', the legacy location, which should be fine. This is rather than ORLogging, which would mean the XMETA repository, which we've seen cause issues.

IMHO, you need to look elsewhere for your startup issues. Any chance the problematic jobs have a 'Before Job' task associated with them?
-craig

"You can never have too many knives" -- Logan Nine Fingers
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

&PH& directory

Post by pavan_test »

The jobs which used to run in 35-45 minutes are taking hours to complete, so I am trying to find where the bottleneck could be.

When I run ps -ef | grep osh | wc -l before the job starts, the count is around 300, and then it shoots all the way to 856 while the job is executing.

Can someone suggest where I can look for clues as to why the jobs are running slowly?
prakashdasika
Premium Member
Posts: 72
Joined: Mon Jul 06, 2009 9:34 pm
Location: Sydney

Post by prakashdasika »

You can use the performance analysis function in the job. It creates reports with memory and CPU utilization for all the stages involved in the job. You can also include the environment variables 'APT_PM_PLAYER_TIMING' and 'APT_PM_PLAYER_MEMORY' in your job and view the log to debug the operators/stages.
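As a rough sketch (the variables below are the standard PX reporting variables; the project/job names and dsjob call are only examples, adjust them for your environment):

# Add these as job parameters, or project-wide via the Administrator client's environment variable settings
APT_PM_PLAYER_TIMING=True
APT_PM_PLAYER_MEMORY=True

# After the run, pull a log summary and look for the per-operator CPU/memory lines
dsjob -logsum YourProject YourJob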
Prakash Dasika
ETL Consultant
Sydney
Australia
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Each job will generate N * (M + 1) + 1 processes, where N is the number of nodes and M is the number of operators (approximately the same as the number of stages).
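For example (illustrative numbers only), a job with 40 operators running on a four-node configuration would start roughly 4 * (40 + 1) + 1 = 165 processes.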

You say that when you start monitoring there are already 300 osh processes, and your job causes this to jump to 856. So clearly your job is creating substantial demand for resources, not least of which is starting up 556 processes!

Are all of the 300 osh processes genuinely active processes, or do you have defunct processes hanging around?
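A quick way to check (this assumes your ps output marks zombies as <defunct>, which is typical on AIX):

ps -ef | grep osh | grep -c defunct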
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

While the job is running, open Director and "monitor" the job. That will tell you which stages are currently processing data.

Did volume of data change?

I do not know why the volume of data would spawn more osh executables. If that is the case, I believe your job submission strategy is reading in some text file / DB extract and spawning a multi-instance job per criteria X.

Also, are you the only project executing on that DataStage server?

I would look at "ps -ef | grep DSD.RUN" to see how many sequencers and jobs are running on the server. Are they all yours?
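For example, something like this breaks that count down by user (column 1 of ps -ef is the owning user; the grep -v drops the grep itself from the count):

ps -ef | grep DSD.RUN | grep -v grep | awk '{print $1}' | sort | uniq -c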