DS Director and Designer are not responding

Posted: Thu Aug 22, 2013 5:50 pm
by suneelchallagali
We currently have an issue in our Test environment: when we try to run a parallel job from Designer or Director, both clients freeze (not responding) and we have to forcefully kill the session. Running a sequence job from Director or Designer does not cause the problem. Can you please help us out?

We have already opened a PMR with IBM. They gave us a few steps to follow, but we still have not been able to figure it out.

Additional information

1) We have the same issue in all the projects on that server

2) Running the parallel job on a single node did not help (Designer and Director hang for a parallel job or for a sequence job that calls a parallel job)

3) Creating a new project and running the job there did not help (Designer and Director hang for a parallel job or for a sequence job that calls a parallel job)

4) Everything is fine when running a sequence job only

Thank you
suneel

Posted: Thu Aug 22, 2013 5:52 pm
by ray.wurlod
Can you run jobs from the command line using dsjob command?
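
One way to try this without the GUI clients is to launch the job through dsjob under a timeout, so that a hang shows up as a timeout instead of a frozen session. The following is only a minimal Python sketch: the dsjob path, project name, and job name are placeholders, and it assumes that the -jobstatus option makes dsjob wait for the job to finish.

```python
import subprocess

def build_run_command(dsjob, project, job, domain=None, user=None,
                      password=None, server=None):
    """Build a 'dsjob -run' argument list (a list, so no shell quoting is needed)."""
    cmd = [dsjob]
    # Connection options are optional; add each one only if a value is given.
    for flag, value in (("-domain", domain), ("-user", user),
                        ("-password", password), ("-server", server)):
        if value is not None:
            cmd += [flag, value]
    # Assumption: -jobstatus makes dsjob wait and report the finishing status.
    cmd += ["-run", "-jobstatus", project, job]
    return cmd

def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (exit_code, output), or (None, partial_output) on timeout."""
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True,
                              timeout=timeout_s)
        return done.returncode, done.stdout
    except subprocess.TimeoutExpired as exc:
        out = exc.output or ""
        # The partial output may arrive as bytes even in text mode.
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return None, out
```

For example, run_with_timeout(build_run_command("/path/to/dsjob", "MyProject", "MyJob"), 300) returns None as the exit code if the job is still hanging after five minutes.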

Posted: Thu Aug 22, 2013 6:11 pm
by SURA
1. Is this the first time you are trying to run a job on this server?
2. Was the configuration done properly?
3. Create a sample job, say RowGen --> Peek, and test. This ensures we are not accessing any file system.
4. Are there any log entries after the job is terminated?
5. Check the ISALite report (I am not sure about it on UNIX)
6. Check the version.xml

Posted: Thu Aug 22, 2013 6:33 pm
by suneelchallagali
Thank you for the reply

@ Ray

I am trying to run the dsjob command; I will let you know how it goes

@ Sura

This is not the first time we are running jobs. Until yesterday everything was fine; we have had the issue since early this morning. Even a simple parallel job (Row Generator and Peek) cannot be run, as it freezes Designer or Director when we run it.

We have also generated the ISALite report and gave it to IBM. They said there are no specific error messages.

Version.xml is a valid one.

Posted: Thu Aug 22, 2013 6:51 pm
by SURA
In your job, please set the following:

DS_PXDEBUG=1 (you will need to create this env var)
APT_STARTUP_STATUS=true
APT_DISABLE_COMBINATION=true
CC_MSG_LEVEL=1

Then force recompile the job and run again.

After you have killed the job, look for any useful info in the job log. If you go to the project folder, you should see a 'debugging' folder.
Inside it will be a folder with the job name. See if that helps you.

Posted: Thu Aug 22, 2013 7:22 pm
by suneelchallagali
@ SURA

After setting the parameters you mentioned in the job, when I try to run the job Director still freezes (unresponsive), and I don't see a debugging directory being created.

@ Ray
When I ran the sequence job with the dsjob command, I got the following error message:
/info_server/IBM/IS85/Server/DSEngine/bin/dsjob -domain ARLSPMWST01:9081 -user <username> -password <Password> -server ARLSPMDST01.CORP.CAT.COM:31539 -run SB3 Test12121_Jobs
Reply=255
Output from command ====>
Error running job

Status code = 30107
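
When dsjob is called from a script, the numeric status can be pulled out of this output for logging or comparison between runs. A minimal Python sketch, assuming only that the output contains a line of the form "Status code = N" as shown above; the function name is made up for illustration.

```python
import re

# Matches the "Status code = N" line printed by dsjob in the output above.
STATUS_RE = re.compile(r"Status code\s*=\s*(-?\d+)")

def extract_status_code(output):
    """Return the integer after 'Status code =' in the output, or None if absent."""
    match = STATUS_RE.search(output)
    return int(match.group(1)) if match else None

sample = """Error running job

Status code = 30107
"""
print(extract_status_code(sample))  # prints 30107
```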

Posted: Thu Aug 22, 2013 7:38 pm
by suneelchallagali
Ray,

When I run the parallel job via the dsjob command from the server, it does not do anything, but for the sequence job it works fine.

Posted: Thu Aug 22, 2013 7:41 pm
by suneelchallagali
While the parallel job is running from the dsjob command, if I try to open the log for that job at the same time, Director or Designer freezes (unresponsive), so I am not able to view the log.
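
When the clients hang while opening the log, the log summary can in principle be read from the command line instead. A minimal Python sketch, assuming dsjob's -logsum option with -max to limit the number of entries; the dsjob path, project name, and job name are placeholders, and connection options such as -domain or -server would have to be added as needed.

```python
import subprocess

def build_logsum_command(dsjob, project, job, max_entries=20):
    """Build a 'dsjob -logsum' argument list limited to max_entries entries."""
    # Assumption: -logsum prints a summary of the job log, -max caps the count.
    return [dsjob, "-logsum", "-max", str(max_entries), project, job]

def fetch_log_summary(dsjob, project, job, max_entries=20, timeout_s=60):
    """Run dsjob -logsum and return its standard output as text."""
    cmd = build_logsum_command(dsjob, project, job, max_entries)
    # The timeout keeps the script from blocking if dsjob itself hangs.
    done = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout_s)
    return done.stdout
```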

Posted: Thu Aug 22, 2013 8:45 pm
by ray.wurlod
suneelchallagali wrote:...till yesterday every thing was fine...
What has changed?

Posted: Fri Aug 23, 2013 4:46 am
by priyadarshikunal
Also check whether any entry is created in SystemOut.log when you try to run the parallel job.

Posted: Fri Aug 23, 2013 8:07 am
by suneelchallagali
@ Ray

Not sure what has changed.

@ priyadarshikunal

Can you please tell me the location of the SystemOut.log file?

Posted: Fri Aug 23, 2013 8:41 am
by suneelchallagali
@ priyadarshikunal

I don't see much information in the SystemOut.log file, apart from login information.

Posted: Sun Aug 25, 2013 6:25 pm
by SURA
Hi suneelchallagali

If you haven't already, please bounce the server and try to run the job.

If you have already bounced the server, then ensure that all the necessary services are running, for xmeta too.

This is only my guess, but it won't harm to try!

Posted: Mon Aug 26, 2013 9:43 am
by PaulVL
Are you running on a cluster/grid?
Sequencers run on the head node and jobs run on the cluster, so the rsh/ssh might be the issue.


Look at the user id that is executing the job. There may be something in that login that might be prompting you for data.

What is the LAST log message in your job execution log when the job hangs?
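
A login that prompts for input can be checked by running ssh in batch mode, where prompts fail instead of waiting. A minimal Python sketch, assuming OpenSSH is used between the nodes; the host and user names are placeholders.

```python
import subprocess

def build_ssh_check(host, user=None, connect_timeout_s=5):
    """Build an ssh command that runs 'true' remotely without any prompts."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes disables password/passphrase prompts, so they fail fast.
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={connect_timeout_s}", target, "true"]

def login_is_non_interactive(host, user=None, timeout_s=15):
    """Return True if the remote command completes without prompting or hanging."""
    cmd = build_ssh_check(host, user)
    try:
        # stdin is closed so a login script waiting for input cannot block us.
        done = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                              capture_output=True, text=True,
                              timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return done.returncode == 0
```

A False result only says that the login did not complete cleanly; the ssh error output would still need to be read to see why.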

Posted: Wed Sep 04, 2013 5:18 pm
by suneelchallagali
Hi,

After running a trace on the issue with the help of IBM, we got a temporary solution. The issue was that the file ($DS.SIGNAL) under catdir (under the DSEngine directory) got locked because our DSEngine is mounted on NAS.

Here is the technote from IBM for more details:

http://www.ibm.com/support/docview.wss?uid=swg21594973

Thank you,
suneel
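
For anyone checking for the same situation, whether the DSEngine directory sits on a network mount can be read from the mount table. A minimal Python sketch, assuming a Linux-style /proc/mounts format (the operating system is not stated in this thread); the sample mount table and the list of network file system types are made up for illustration, and the DSEngine path is the one from the dsjob command earlier in the thread.

```python
import posixpath

# Assumption: file system types treated as network storage for this check.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}

def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries

def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_point, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        point = mount_point.rstrip("/") or "/"
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        # Longest match wins; on ties the later (overlaying) mount wins.
        if inside and len(point) >= len(best_point):
            best_point, best_type = point, fs_type
    return best_type

def is_network_mount(path, mounts_text):
    """Return True if path lives on one of the assumed network fs types."""
    return filesystem_type(path, mounts_text) in NETWORK_FS_TYPES

# Hypothetical mount table; on a real Linux host read /proc/mounts instead.
sample_mounts = """/dev/sda1 / ext4 rw 0 0
nas01:/export/is /info_server nfs rw,hard 0 0
"""
print(is_network_mount("/info_server/IBM/IS85/Server/DSEngine", sample_mounts))
```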