AIX Unable To Detect DataStage Process

Moderators: chulett, rschirm, roy

jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

As part of our control framework, we are trying to design a flow in which a job does not start until the previous job completes. The script we have written is:

Code:

while [ `ps -ef|grep "phantom DSD.RUN job_jobName"|grep -iv -e "grep" -e "SH -c"|wc -l` -ne 0  ]; do sleep 2; done
The loop, however, exits even while the process is still running. Why could this be happening?
Jerome
Data Integration Consultant at AWS

Life is really simple, but we insist on making it complicated.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Is your jobname stored in a variable, or is it hardcoded in the command (as in your example)?

Add -x to the shebang line at the top of your script:

Code:

#!/usr/bin/ksh -x
to enable the shell to trace the execution of your script.
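
If editing the shebang line is inconvenient, tracing can also be switched on and off inside the script with set -x and set +x, so that only the suspect pipeline is traced. The following is a minimal sketch, not the script from this thread; the sample line stands in for real ps -ef output and is made up for illustration.

```shell
# Trace only the pipeline under suspicion. The sample input stands in for
# real "ps -ef" output; its user, PID and time fields are made up.
sample='dsadm 111 1 0 10:00:00 - 0:00 phantom DSD.RUN job_jobName'

set -x    # start tracing: each command is echoed to stderr before it runs
count=$(printf '%s\n' "$sample" | grep "phantom DSD.RUN job_jobName" | wc -l)
set +x    # stop tracing

echo "matching lines: $count"
```

The trace goes to stderr, so it can be redirected to a file without disturbing the normal output of the script.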

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

Not much help. We could still see the job running after the while loop exited :(. Is DataStage refreshing the process, causing the loop to exit at that instant? What exactly is causing the shell to miss the presence of the process?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Why wasn't it (the -x, I assume) much help? -x should have shown you how the commands executed and what the results resolved to, similar to the following:

Code:

++ ps -ef
++ grep 'phantom DSD.RUN job_jobName'
++ grep -iv -e grep -e 'SH -c'
++ wc -l
+ '[' 0 -ne 0 ']'
You see each command that is executed. Do they match your expectations? Does the grep argument exactly match what you see if you do this manually?

Is your jobname stored in a variable, or is it hardcoded in the command (as in your example)?

Regards,
- james wiles


jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

The trace output is exactly what it should be. My concern, however, is that although the loop exits, the job in question is still running, in which case the loop should not have exited. The job name is currently hardcoded for debugging.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Did it go to sleep even a single time?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

Yes, it did, but a random number of times in each run.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Does the command line resolve to a 1 or 0 in the trace?

'[' 1 -ne 0 ']'

or

'[' 0 -ne 0 ']'

If it's resolving to 0, then the grep argument is probably not matching to what ps -ef is actually putting out and that's what you will need to concentrate on. The logic itself works (I can do the same using running processes and it will loop until I kill it), so what you're searching for is not quite right.
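
One way to check the pattern by hand is to save a listing to a file and count the matching lines in it. This is a minimal sketch: count_job_lines is a hypothetical helper name, and job_jobName is the placeholder job name used in this thread. Because grep reads a saved file here, the grep process itself cannot appear in the listing.

```shell
# count_job_lines is a hypothetical helper: it counts the lines in a saved
# process listing that contain the job's phantom process string.
count_job_lines() {
  listing=$1   # file holding saved "ps -ef" output
  job=$2       # job name, e.g. job_jobName
  # grep -c prints the match count; "|| true" hides its non-zero exit
  # status when the count is 0.
  grep -c "phantom DSD.RUN $job" "$listing" || true
}

# Usage (run by hand while the job is active):
#   ps -ef > ./psef.out
#   count_job_lines ./psef.out job_jobName
```

If the count is 0 while the job is visibly running, opening the saved file and comparing the job's line character by character against the pattern should show where the two differ.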

Regards,
- james wiles


jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

To answer your earlier question: No, the job would not be "refreshing". But that does bring up the characteristics of the job you're testing this with: How long does it run? How are you starting/restarting it when it ends?

Regards,
- james wiles


jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

jwiles wrote: Does the command line resolve to a 1 or 0 in the trace?
It resolves to 0 and consequently exits the loop. But the pertinent question is: why is the process/job that the shell presumed complete still running? A ps -eaf immediately after the while loop exits still shows DSD.RUN job_Jobname in the list.
jwiles wrote: How long does it run? How are you starting/restarting it when it ends?
The original job could run for anywhere from one to two hours. We created a test job that runs for much less time to help debug the issue. The test job was supplied with enough data to run for 5-10 minutes. It is a parallel job and has been running stand-alone so far. The job has not been designed to restart after completion.

On that note, I'm beginning to think it's more of an AIX issue. I'm pretty sure that a parallel job wouldn't just break and disappear and then reappear just like that!
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Please post the dsjob -run command that you are using.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

In order to debug this further, I would suggest breaking the command string into its separate commands and storing results in either files or variables so that you can see what is being seen at the time by the commands. Something like the following would be one way of accomplishing this:

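Code:

MyWordCount=1
while [ $MyWordCount -ne 0 ]
do
  # Snapshot the process list so it can be examined after the loop exits
  ps -ef >./psef.out
  # Count matching lines, excluding any line that contains "grep" or "sh -c"
  MyWordCount=`grep "phantom DSD.RUN job_jobName" ./psef.out|grep -iv -e "grep" -e "SH -c"|wc -l`
done
echo "ps -ef output at time of death:"
cat ./psef.out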
This will place the output of the ps -ef command into a file, which you can then examine when the script exits to see if ps's output is the trigger. You can break the command string down further as necessary.
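
A variation on the loop above, as a sketch only: each pass writes its listing to a timestamped file instead of overwriting one file, so earlier passes can be compared with the final one. The names wait_and_capture, LIST_CMD and last_snapshot are hypothetical and introduced here for illustration; LIST_CMD defaults to the ps -ef command used in this thread.

```shell
# LIST_CMD is an assumption added so the listing command can be swapped;
# it defaults to "ps -ef" as used in the thread.
LIST_CMD=${LIST_CMD:-"ps -ef"}

# wait_and_capture (hypothetical helper): poll until no line in the listing
# matches the job, keeping one timestamped snapshot file per pass.
wait_and_capture() {
  job=$1             # job name, e.g. job_jobName
  outdir=$2          # directory that receives the snapshot files
  interval=${3:-2}   # seconds to sleep between passes
  while :; do
    snap="$outdir/psef.$(date +%Y%m%d%H%M%S).out"
    $LIST_CMD > "$snap"
    # grep -c prints the match count; "|| true" hides its exit status on 0.
    count=$(grep -c "phantom DSD.RUN $job" "$snap" || true)
    echo "$(date) matches=$count snapshot=$snap"
    [ "$count" -eq 0 ] && break
    sleep "$interval"
  done
  last_snapshot=$snap   # the listing that made the loop exit
}

# Usage: wait_and_capture job_jobName /tmp
```

With an interval shorter than one second, two passes can share a timestamp and overwrite each other, so the file name would need a counter in that case.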

Regards,
- james wiles


PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

One thing to remember is that a LONG "ps -ef" line may get truncated because of the default COLUMNS setting for your shell, and in that case you might not be able to grep for the exact job name.
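
The effect of truncation on the grep can be shown without ps at all. In this sketch the sample line is made up (user, PID and time fields are illustrative) and the 50-column cut-off is an arbitrary assumption chosen so that the job name is cut short.

```shell
pattern='phantom DSD.RUN job_jobName'

# Simulated ps line; everything before "phantom" is made up for illustration.
line='dsadm 123456 1 0 10:15:02 - 0:00 phantom DSD.RUN job_jobName'

# Keep only the first 50 characters, as a narrow display width might.
short=$(printf '%s\n' "$line" | cut -c1-50)

# grep -c prints the number of matching lines; "|| true" hides its
# non-zero exit status when nothing matches.
full_count=$(printf '%s\n' "$line" | grep -c "$pattern" || true)
short_count=$(printf '%s\n' "$short" | grep -c "$pattern" || true)

echo "full line matches:      $full_count"
echo "truncated line matches: $short_count"
echo "COLUMNS in this shell:  ${COLUMNS:-unset}"
```

The full line matches once and the truncated line does not match, which is the same symptom as a count of 0 while the job is still running.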
jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

Thanks James & Paul. Will try your suggestions and report back in a while. Paul's thought seems reasonable. I had initially wondered whether the page display limit was affecting the grep output (which, I found, was not actually the case).
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I mention it because I ran into that problem with the Grid enablement toolkit (a few releases ago). They also parsed the ps line and were limited to the COLUMNS of the shell script that was used to bounce the DataStage engine (yeah, it was that tricky).

So, it was a lesson learned for me to expand the ps statement fully if I want to grep or parse something from it.

On Linux (SUSE) I would do a "ps -efww | grep ..." to get the full wide ps text. The ww option is not supported on AIX, but something equivalent is out there.