Okay, the basic problem is that two out of six jobs in a particular DataStage project are hanging. Below are as many details as I can provide:
Originally our DataStage Server was running v6x on a filesystem, and the jobs were running fine. We migrated our jobs to a v7.1 server that pointed to this same filesystem. Unfortunately, we did not deactivate [through crontab] the jobs on the older v6 server. The jobs kicked off on both servers at the same time, and both pointed to the same filesystem [including feed files, hash files, etc.]. I understand that this created contention between the two systems, so neither could run. I managed to free up most resources and locks, but DataStage Director still reported some as being used by the v6 system. Before I had a chance to clean it up, the system was brought down.
Now, the jobs are no longer getting hash file errors, but they seem to get to a certain point in the job design and just hang for hours on end. We are running Oracle 9i with the 32-bit client. My first thought was that this was caused by row locks on tables. I checked with a DBA, who verified that the only processes running were my own, and the only locks were generated by the DataStage job. Curiously though, it appeared that a particular update statement was being called twice. I ran the debugger on these jobs and, indeed, when it got to this update statement (a simple, three-value, auto-generated statement) the job appears to be hung. A similar circumstance occurs with another job (on a different table).
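One way to see where a hung job process is actually blocked (e.g. stuck in a read from the Oracle client) is to attach a system-call tracer to it. A minimal sketch, assuming a Solaris/AIX-style truss (use strace on Linux); the job name here is a placeholder, not from my real project:

```shell
# Look up the PID of the hung job's phantom process.
# '[p]hantom' keeps grep from matching its own entry in the ps listing.
PID=$(ps -ef | grep '[p]hantom' | grep 'MyHungJob' | awk '{print $2}' | head -n 1)
if [ -n "$PID" ]; then
  truss -p "$PID"    # Solaris/AIX; on Linux use: strace -p "$PID"
else
  echo "no matching process found"
fi
```

If the trace shows the process sitting in a single blocking call (a socket read toward the database, a semaphore wait, etc.), that narrows down whether the hang is on the DB side or inside the engine.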
The strangest thing for me is that we have multiple environments where these jobs run. In these environments, the servers are running v6x and the exact same jobs run fine, on the exact same data files. [just to be certain, I did an extract of the jobs/executables from one and reimported it into 7x. Still gets hung]
I was wondering if anyone else has ever run into a similar issue. I found a few topics on jobs getting hung, but it didn't appear to be the same behavior. Please let me know if you have any suggestions - I'm willing to try anything at this point.
Thanks,
-Sean
DataStage job works in some environments, not others.
Re: DataStage job works in some environments, not others.
Hopefully you have export files of all your DataStage jobs; if so, roll back your changes. You should never do an upgrade without a proper full system backup (with DataStage shut down) and an export of all of your DataStage jobs.
My best hunch is that your DataStage server is currently in an illegal state (half v6/half v7) and there's no real way out but to rollback or reinstall.
What I've sometimes seen (though probably not in your case) is that although a job is stopped according to Director, the phantom of a previous run is still active and writing to the database.
Try on the UNIX command line:
Code: Select all
ps -ef | grep phantom | grep YOURJOBNAME
and see what that gives... followed by a kill -15 of the PID of the phantom (if you find multiple running under the same name).
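If several stale phantoms turn up for the job, the surviving PIDs can be terminated in one pass. A minimal sketch of the ps/kill sequence above; the job name is a placeholder you'd substitute:

```shell
JOB=YOURJOBNAME   # placeholder: substitute the real job name
# '[p]hantom' keeps grep from matching its own entry in the ps listing,
# so no 'grep -v grep' is needed.
pids=$(ps -ef | grep '[p]hantom' | grep "$JOB" | awk '{print $2}')
for pid in $pids; do
  kill -15 "$pid"   # SIGTERM first, so the phantom can clean up; avoid kill -9 unless it ignores this
done
```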
Ogmios
P.S. Never switch between V7 and V6. It's a one-way street from v6 to v7. And completely shut down the v6 server before doing the upgrade (rename e.g. the main Ascential directory), and never start the v6 server afterwards.
Re: DataStage job works in some environments, not others.
Ogmios,
Thanks for the advice, but unfortunately it doesn't fix the problem. Yes, this is something I was aware of as well. When one of the jobs in question hangs and must be aborted, there is a process in the background still connected to the DB. I have verified with the DBA, though, that once this is killed, the DB locks/resources are freed. Then I try to run again, and the same thing happens.
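For reference, this is the kind of lock check the DBA can repeat after each kill. The v$session/v$lock views are standard Oracle dynamic performance views; the exact query below is my own sketch, not something from the thread, and would be fed to sqlplus by someone with the right privileges:

```shell
# Build the lock-inspection SQL as a string (lmode > 0 = lock held,
# request > 0 = session waiting on a lock). The '\$' keeps the shell
# from expanding the Oracle view names.
LOCK_SQL="SELECT s.sid, s.serial#, s.username, l.type, l.lmode, l.request
FROM v\$session s JOIN v\$lock l ON s.sid = l.sid
WHERE l.lmode > 0 OR l.request > 0"
echo "$LOCK_SQL"
```

If that query comes back empty while the job is hung, the blockage is almost certainly not a database lock.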
Any other thoughts?
-Sean
ogmios wrote: P.S. Never switch between V7 and V6. It's a one way street from v6 to v7. And completely shutdown the v6 before doing the upgrade (rename e.g. the main Ascential directory) and never start it up anymore afterwards

Believe me, I'm in agreement. We're not doing any kind of backward switching from V7 to V6; it's always in the forward direction. Unfortunately, I don't have any kind of admin access to these machines, and the upgrade was a decision made above me. What's done is done; now I just have to make it work.
Sean,
Maybe you can do a "save as" on one of the jobs, then remove the relational stage and replace it with a sequential stage to truly rule the DB in or out as the cause of the problem. If it runs, then you probably have a DB problem; if it does not, then further research is necessary.
Regards,
Michael Hester
mhester@petra-ps.com
Michael,

mhester wrote: Sean,
Maybe you can save as one of the jobs. Remove the relational stage and replace it with a sequential stage to truly rule out that the DB is/is not causing the problem. If it runs then you probably have a DB problem and if it does not then further research is necessary.
Regards,
Michael Hester
Yes, I tried this. For Job #1, the job runs OK after pointing the output link at a Sequential File stage instead of the Oracle stage. I am going to see if I can get access to run the query through sqlplus to verify whether or not it is the DB causing the problem here.
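For anyone following along, this is the shape of the manual check I have in mind: run the job's generated statement by hand in sqlplus and see whether it hangs there too. The connection string, table, and columns below are placeholders, not the real ones from my job:

```shell
# Hedged sketch: replay the generated insert outside DataStage.
# scott/tiger@ORCL and target_table are illustrative placeholders.
if command -v sqlplus >/dev/null 2>&1; then
  sqlplus -s scott/tiger@ORCL <<'EOF'
SET TIMING ON
INSERT INTO target_table (col1, col2, col3) VALUES (1, 'a', SYSDATE);
ROLLBACK;
EXIT
EOF
  status="ran insert via sqlplus"
else
  status="sqlplus not on PATH; statement not executed"
fi
echo "$status"
```

The ROLLBACK keeps the test from leaving a row behind; SET TIMING ON shows whether the statement itself stalls.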
However, for Job #2, the statement it currently hangs at is exactly the same SQL statement that runs successfully many times per record in Job #1 (a simple insert statement). This is why I'm confused: that portion doesn't seem to be DB-related at all.
I'm still researching. If anyone else has any additional suggestions I will try them out where I can.
Thanks,
-Sean