80004005 error now occurring. Looking for ideas...

RodBarnes · Post by **RodBarnes** » Mon Jun 05, 2006 10:34 am

Our production ETL has begun aborting when it runs on schedule. I can run the same ETL manually and it succeeds without error. It has been running for months without issue, and no changes have been made since long before this began occurring. So I am confident this is not a problem with Ascential or the ETL -- but clearly something has changed. It aborts in the same job each time, except for the 3rd run, which aborted in a different job that connects to the same server. It has been doing this for about two weeks now.

Here's what I've already done and what I'm going to do:

I have run SQL Profiler on the source system (the job that aborts extracts data from this server). The trace gives no indication that the server even knew an attempt was made. This is what I expected, given that 80004005 is a general connection failure.

I am going to try scheduling it at other times to see if there is some time-related cause.
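One way to gather evidence for a time-related cause is to probe the source server from the DataStage host every minute around the scheduled start. The sketch below is only an illustration: the function names are made up for this example, and it checks nothing beyond whether a TCP connection can be opened, so it says nothing about logins or locks.

```python
import socket
from datetime import datetime


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe_line(host, port, timeout=5.0):
    """Build one timestamped log line describing the probe result."""
    status = "reachable" if can_connect(host, port, timeout) else "UNREACHABLE"
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"{stamp} {host}:{port} {status}"
```

Appending `probe_line(host, 1433)` to a file once a minute (1433 is the default SQL Server port; the host name is whatever the job connects to) would show whether the server stops accepting connections at a particular time.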

Other ideas and suggestions on how to get additional info that may help me identify the cause are appreciated.

Thanks.

DSguru2B · Post by **DSguru2B** » Mon Jun 05, 2006 11:14 am

If you google the error code, the results point towards a few things:
1) Connection not available
2) Login info incorrect

Try investigating those. You are right: it's not a DS problem but rather a database issue.

RodBarnes · Post by **RodBarnes** » Mon Jun 05, 2006 11:30 am

The 80004005 means that the server/database wasn't reachable. It is a catch-all error: whatever doesn't fit into one of the specific errors (e.g., 80040e4d login Failed for User '(Null)', 80040e09 access denied, etc.) ends up as an 80004005. Basically, it means "we could not connect and we haven't a clue why not."
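For anyone scanning logs for these codes, here is a small illustrative lookup in Python. It covers only the three codes mentioned in this post; the symbolic names are the ones the OLE DB headers use for those values, as far as I know, and the helper itself is hypothetical, not part of DataStage.

```python
# Only the three codes quoted in this thread; anything else is reported as unknown.
KNOWN_HRESULTS = {
    0x80004005: "E_FAIL: unspecified error (catch-all, e.g. connection could not be made)",
    0x80040E4D: "DB_SEC_E_AUTH_FAILED: login failed",
    0x80040E09: "DB_SEC_E_PERMISSIONDENIED: access denied",
}


def describe_hresult(code_text):
    """Return a short description for a hex error code such as '80004005'."""
    text = code_text.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    try:
        code = int(text, 16)
    except ValueError:
        return "not a hexadecimal code"
    return KNOWN_HRESULTS.get(code, "unknown code (not in this small table)")
```

For example, `describe_hresult("80040e4d")` returns the login-failed entry, while a code that is not in the table falls through to the "unknown" message rather than being guessed at.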

Unfortunately, I'm more familiar with this error than I would like to be.

I was just hoping someone had some secret trick that would help me look deeper into what is happening with DataStage and maybe that would give me a clue as to what is preventing the connection. Since the connection never gets made the server can't really tell me why it didn't succeed.

Thanks for the reply.

DSguru2B · Post by **DSguru2B** » Mon Jun 05, 2006 11:34 am

If the connection is not available from the database itself, then there is not much that DataStage can do.
You mentioned that "Our production ETL has begun aborting while scheduled". That is a continuous-tense phrase, so it has been aborting at the scheduled time more than once. Have you asked your DBA whether the server was available at that time?

RodBarnes · Post by **RodBarnes** » Mon Jun 05, 2006 11:42 am

I am an admin for both the source server and the DataStage server so I have already confirmed that it is available.

There are more recent releases of this same ETL which are running on a different DataStage server (our dev/test server), though scheduled to run about an hour later in the morning. The dev and test ETL continue to succeed without any issues while the production ETL (running on the production server) fails. The job that aborts in the scheduled production run is exactly the same as the corresponding job that runs in the dev and test ETLs -- no changes have been made to that job. Yet both dev and test succeed without error and production fails.

I just rebooted the production server and have scheduled the ETL to run in the next while. So, we'll see if that makes any difference.

I really hate things that just stop working for no apparent reason.

I do appreciate the input.

DSguru2B · Post by **DSguru2B** » Mon Jun 05, 2006 11:45 am

How are you controlling the process: via a Master Control Sequence or a batch job? How are you providing the parameters? Look at the parameter file to make sure it has the current username and password. Then check the job log to verify that the username and password passed to the jobs are correct.

RodBarnes · Post by **RodBarnes** » Mon Jun 05, 2006 11:52 am

The ETL is controlled by a master sequence and job parameters are passed from there on down to the main sequence and other jobs and stages. The parameter values have not changed (they were entered during the scheduling of the job months ago). Other jobs within this ETL that access the same server succeed while this one does not. Although, I did have it abort once (the 3rd time) on a different job. All other times it has aborted on the same job.

Although, you do give me something to think about. I wonder if somehow this particular job has lost its ability to receive the parameters and so fails? As mentioned earlier, it works fine when I run it manually.

I'm curious to see what happens with the run I just scheduled. Since I just reentered the parameters (most are defaulted) it will be interesting to see if it works or not. Maybe all I need to do is reschedule the ETL.

We'll see.... Thanks.

ray.wurlod · Post by **ray.wurlod** » Mon Jun 05, 2006 4:28 pm

Has "someone" changed the password for the database and not updated the job sequence parameter value?

kris007 · Post by **kris007** » Mon Jun 05, 2006 4:34 pm

ray.wurlod wrote: Has "someone" changed the password for the database and not updated the job sequence parameter value?

In that case, wouldn't it fail when run manually?

According to the OP, the jobs finished successfully when run manually and failed only when scheduled.

DSguru2B · Post by **DSguru2B** » Mon Jun 05, 2006 4:38 pm

kris007 wrote:
In that case, wouldn't it fail when run manually?
According to the OP, the jobs finished successfully when run manually and failed only when scheduled.

Not necessarily. The individual job can have its own parameters hard-coded by the developer. There can be human mistakes.

kris007 · Post by **kris007** » Mon Jun 05, 2006 4:43 pm

Yes, that's right. But didn't the OP mention that he was able to run the Master sequence successfully when he ran it manually?

And since he also says that it has been happening for a while now, somehow I feel that's not the case.

RodBarnes · Post by **RodBarnes** » Mon Jun 05, 2006 4:44 pm

As mentioned in my last post, the scheduled parameters are the same ones that have been running successfully for months. And they are the same entered/default parameters used when I run it manually. There are only three entered parameters; the rest all use their defaults.

A follow-up: The ETL run I scheduled today ran successfully to completion. I have rescheduled the ETL for 10 minutes later (2:10am rather than 2:00am) to see if the rescheduling will cause it to now run successfully tomorrow morning.

Interesting oddity.

RodBarnes · Post by **RodBarnes** » Thu Jun 08, 2006 9:19 am

Just wanted to wrap this thread up...

I still don't know why -- can't seem to find any cause -- but it turns out to be something related to the time. I moved the schedule time from 2:00am to 2:10am and it succeeds without issue. I've been poring through the logs and have yet to see anything that tells me why access by this job to this server fails at that time of the morning.

BUT, I learned a lesson: Don't discount the time as a non-issue. Next time, I will try rescheduling it sooner. Just never guessed that would be the problem.

DSguru2B · Post by **DSguru2B** » Thu Jun 08, 2006 9:38 am

Thanks for the update. I was wondering whether you had been able to solve this or not.
So a ten-minute difference did the trick, huh?

But stuff like this doesn't happen without a reason. Maybe, in your spare time, go into the &PH& directory and see what messages were created by the stages in that particular job around that time. I don't know, just a little bit of extra research.
Well, I am glad you got through the issue.
Guess the server was on a tea break at 2.
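A rough sketch of that &PH& check in Python, assuming the project's &PH& directory can be read from a script. The function name is made up for this example. It only lists files whose modification time falls inside a window around the scheduled run, so the relevant ones can then be opened by hand.

```python
import os
from datetime import datetime


def files_modified_between(directory, start, end):
    """Return (name, mtime) pairs for files modified in [start, end], oldest first."""
    hits = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            # Modification time converted to local time, to match the schedule clock.
            mtime = datetime.fromtimestamp(entry.stat().st_mtime)
            if start <= mtime <= end:
                hits.append((entry.name, mtime))
    return sorted(hits, key=lambda item: item[1])
```

Calling it with the project's &PH& path and a window such as 1:55am to 2:15am on a morning the job aborted would narrow the search to whatever was written during the failing run.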

ray.wurlod · Post by **ray.wurlod** » Thu Jun 08, 2006 4:15 pm

What else was happening in the database at 2:00am? In another project I was caught by something like this - "they" ran a series of batch updates about which they'd neglected to inform us. The DataStage job just hung, waiting for table-level locks to be released.