As part of our control framework, we are trying to design a flow so that a job shouldn't start until the previous job completes. The script we have written is
Not much help. We could still see the job running after the while loop exited. Is DataStage refreshing the process, causing the loop to exit at that instant? What exactly is causing the shell to miss the presence of the process?
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
Why wasn't it (the -x, I assume) much help? -x should have shown you how the commands executed and what the results resolved to, similar to the following:
The trace output is exactly what it should be. My concern, however, is that although the loop exits, the job in question is still running, in which case the script should not have exited. The job name is currently hardcoded for debugging.
Jerome
Does the command line resolve to a 1 or 0 in the trace?
'[' 1 -ne 0 ']'
or
'[' 0 -ne 0 ']'
If it's resolving to 0, then the grep argument is probably not matching what ps -ef is actually putting out, and that's what you will need to concentrate on. The logic itself works (I can do the same using running processes and it will loop until I kill the process), so the pattern you're searching for is not quite right.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
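To illustrate the point about the grep argument not matching, here is a minimal sketch that runs the same grep chain over a made-up ps line. The helper name count_job_lines and the sample line are hypothetical, for illustration only; real ps -ef output on your system will differ. It shows that a pattern differing from the process line only by letter case resolves to 0, which is one possible way (not necessarily the cause here) for the loop to exit while the job is still listed.

```shell
#!/bin/sh
# count_job_lines is a hypothetical helper: it reads ps-style text on stdin
# and prints how many lines contain the given fixed string, ignoring the
# grep itself and any "sh -c" wrapper lines.
count_job_lines() {
    grep -F -- "$1" | grep -iv -e "grep" -e "sh -c" | wc -l | tr -d ' '
}

# Made-up sample line standing in for real ps -ef output.
sample='dsadm 12345 1 0 10:00:00 - 0:01 phantom DSD.RUN job_Jobname 0/0/1/0/0'

# Pattern that matches the sample exactly.
exact=$(printf '%s\n' "$sample" | count_job_lines "phantom DSD.RUN job_Jobname")

# Same pattern with different letter case in the job name.
wrong_case=$(printf '%s\n' "$sample" | count_job_lines "phantom DSD.RUN job_jobName")

echo "exact pattern:      $exact"
echo "wrong-case pattern: $wrong_case"
```

Feeding a saved copy of your own ps -ef output through the same chain is a quick way to check the pattern without waiting for the job to run.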
To answer your earlier question: No, the job would not be "refreshing". But that does bring up the characteristics of the job you're testing this with: How long does it run? How are you starting/restarting it when it ends?
Regards,
- james wiles
jwiles wrote: Does the command line resolve to a 1 or 0 in the trace?
It resolves to a 0 and consequently exits the loop. But the pertinent question here is why the process/job that the shell presumed completed is still running. A ps -eaf immediately after the while loop exits still gives me the DSD.RUN job_Jobname in the list.
jwiles wrote: How long does it run? How are you starting/restarting it when it ends?
The original job could run for anywhere between one and two hours. We created a test job that runs for much less time to aid in debugging the issue. The test job was supplied enough data to run for 5-10 minutes. It is a parallel job and has been running stand-alone thus far. The job has not been designed to restart after completion.
On that note, I'm beginning to think it's more of an AIX issue. I'm pretty sure that a parallel job wouldn't just break and disappear and then reappear just like that!
Jerome
In order to debug this further, I would suggest breaking the command string into its separate commands and storing results in either files or variables so that you can see what is being seen at the time by the commands. Something like the following would be one way of accomplishing this:
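MyWordCount=1
while [ $MyWordCount -ne 0 ]
do
    # Snapshot the process list so it can be inspected after the loop exits
    ps -ef > ./psef.out
    # Count lines for the job, excluding the grep itself and any "sh -c" wrappers
    MyWordCount=`grep "phantom DSD.RUN job_jobName" ./psef.out | grep -iv -e "grep" -e "SH -c" | wc -l`
done
echo "ps -ef output at time of death:"
cat ./psef.out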
This will place the output of the ps -ef command into a file, which you can then examine when the script exits to see if ps's output is the trigger. You can break the command string down further as necessary.
Regards,
- james wiles
One thing to remember is that a LONG "ps -ef" line may get truncated due to the default COLUMNS setting for your shell, and you might not be able to grep for the exact job name because of that.
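As a rough illustration of the truncation concern, the sketch below simulates an 80-column cut on a made-up process line and greps for the job name before and after. The sample line, the path in it, and the 80-column width are all assumptions for demonstration; whether your ps actually truncates, and at what width, depends on the platform and shell settings.

```shell
#!/bin/sh
# Made-up long ps line; the job name starts well past column 80.
line='dsadm 12345 1 0 10:00:00 - 0:01 /opt/IBM/InformationServer/Server/DSEngine/bin/uvsh phantom DSD.RUN job_Jobname 0/0/1/0/0'

# Count matching lines in the full-width text.
full=$(printf '%s\n' "$line" | grep -c "DSD.RUN job_Jobname")

# Simulate truncation at an assumed 80 columns, then count again.
cut80=$(printf '%s\n' "$line" | cut -c1-80 | grep -c "DSD.RUN job_Jobname")

echo "matches in full line:      $full"
echo "matches in truncated line: $cut80"
```

If the second count drops to 0 for a line copied from your own ps output, the pattern is being lost to truncation rather than to a finished job.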
Thanks James & Paul. Will try your suggestions and report in a while. Paul's thought seems reasonable. I had initially thought about the page display limit affecting the grep output (which, I found, was not actually the case).
Jerome
I mention it because I ran into that problem with the Grid enablement toolkit (a few releases ago). They also parsed the ps line and were limited to the COLUMNS of the shell script that was used to bounce the DataStage engine (yes, it was that tricky).
So, it was a lesson learned for me to expand the ps output fully whenever I want to grep or parse something from it.
On Linux (SUSE) I would do a "ps -efww | grep ..." to get the full wide ps text. The ww option is not supported on AIX, but something equivalent is out there.