job hangs
Moderators: chulett, rschirm, roy
-
- Charter Member
- Posts: 9
- Joined: Mon Sep 25, 2006 12:54 pm
- Location: US
job hangs
I have a job running in production that did not finish yesterday. When I monitored the job, 0 rows were retrieved in wave #7 and nothing happened after that. I stopped the job in Director and ran it again; still 0 rows were retrieved.
I tried to clean up resources; nothing was found.
Any pointers are highly appreciated.
thanks in advance
praveen
-
- Charter Member
- Posts: 9
- Joined: Mon Sep 25, 2006 12:54 pm
- Location: US
hanging job
Thanks for your reply
This is a very simple job: it reads from 8 hashed files and one sequential file, passes the data through 3 Transformer stages, and writes to a sequential file and a database table.
I do not know if this is sufficient, but if you ask specific questions, I can answer.
thanks in advance
praveen
Praveen
How long has this job been running successfully in the production environment? Did other jobs in the production environment finish successfully? If so, has anything changed recently within the job, in the source files, or in the jobs that create the hashed files? Something must have changed, unless your server is simply very slow. How long did you wait before you stopped the job and ran it again?
Kris
Where's the "Any" key?-Homer Simpson
-
- Charter Member
- Posts: 9
- Joined: Mon Sep 25, 2006 12:54 pm
- Location: US
job retrieves no rows
Kris,
There is no known change to this job.
It looks like the job was killed during the last run (based on the log).
The job ran for 8 hours with 0 rows retrieved from all hashed files.
In Designer, I can open the hashed files to view the data, and I can see data in each of the source hashed files and in the sequential file.
I think something is blocking the execution. I tried to clear the locks in Administrator with LIST.READU, then UNLOCK USER 1234 ALL.
It pops up a new window saying it is clearing job/user locks, but nothing is actually cleared.
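For reference, this is the command sequence I ran at the Administrator command prompt (a sketch only; the exact LIST.READU column layout varies by release, and 1234 stands in for the number taken from the Userno column):

```
LIST.READU EVERY
UNLOCK USER 1234 ALL
```

LIST.READU EVERY shows group, file, and record locks with their owning user numbers; UNLOCK USER releases everything held by the given user number.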
thanks in advance
praveen
Praveen
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
There are two numbers associated with a UNIX DataStage process - the internal DataStage number and the process ID. Which did you choose for your UNLOCK command?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Charter Member
- Posts: 9
- Joined: Mon Sep 25, 2006 12:54 pm
- Location: US
update on hanging job
Ray,
I used the userno (the column by that name, the middle column).
I think I found the problem.
There is a key generator routine that uses a hashed file. This hashed file contains two columns, key and value: the key is the sequence name and the value is the current value of the sequence.
This was a Type 30 dynamic hashed file, so an ls on the directory SDKSequences (the hashed file name) should show three files: .Type30 plus the group and overflow files.
Somehow, instead of those files, there are several files in the directory with the same names as the keys (the sequence names).
I do not know how this happened.
Possible solution: drop the hashed file using Administrator and create a new hashed file.
Any other suggestions are highly appreciated.
thanks in advance
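The check above can be sketched as a small shell test (paths and names are illustrative; it mocks an intact file for the demo, so point HF at the real directory and drop the mkdir/touch lines in practice). A healthy Type 30 dynamic hashed file directory should contain exactly DATA.30, OVER.30, and the hidden .Type30 marker:

```shell
# Mock an intact dynamic hashed file directory, then verify that it
# contains nothing beyond the three expected entries.
HF=SDKSequences
mkdir -p "$HF"
touch "$HF/DATA.30" "$HF/OVER.30" "$HF/.Type30"   # mock only; omit on a real system
extra=$(ls -A "$HF" | grep -v -e '^DATA\.30$' -e '^OVER\.30$' -e '^\.Type30$')
if [ -z "$extra" ]; then
    echo "intact dynamic hashed file"
else
    echo "stray entries: $extra"
fi
```

Any stray entries reported here (such as files named after the sequence keys) mean the directory is no longer a valid dynamic hashed file.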
Praveen
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Inadvertently, perhaps, you've stumbled on the cause and correct solution. If someone puts any other file in a directory that is a dynamic hashed file (other than DATA.30, OVER.30 and .Type30) it ceases to be a dynamic hashed file and reverts to being a regular directory (known to DataStage BASIC as a Type 19 file). The rules change when this happens; in particular, the locking and updating rules, as you encountered.
It's very interesting that SDKSequences was a Type 30 (dynamic) hashed file; the SDK routines create it as a static hashed file (Type 2, if I recall correctly). There's no reason for it not to be dynamic; but it means that someone has been experimenting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Charter Member
- Posts: 9
- Joined: Mon Sep 25, 2006 12:54 pm
- Location: US
problem resolved
Thanks to everyone for help.
This is (I think) what happened:
Someone or something corrupted the hashed file; either its type was changed, or something else led to the following, and to these recovery steps:
1. There were multiple files in the SDKSequences (account-generated) hashed file directory, and D_SDKSequences was 0 bytes.
2. I could not remove SDKSequences from Administrator; I tried
DELETE.FILE SDKSequences
and got the error message
cannot open D_SDKSequences
3. Removed the D_SDKSequences file and the SDKSequences directory at the OS level.
4. Updated the job to delete and then create the file; the delete gave no errors, but the create failed with "SDKSequences already exists in VOC".
5. Created a new job that creates a hashed file named SDKSeq2.
6. Ran the job and checked at the OS level that D_SDKSeq2 and SDKSeq2 were created.
7. Renamed the OS file and directory to D_SDKSequences and SDKSequences.
8. In Administrator, DELETE.FILE SDKSequences now worked.
9. Re-ran the create step of the job; a new SDKSequences was created.
10. Ran the original job that was failing; it ran successfully.
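The OS-level part of steps 5-8 can be mocked in shell (names mirror the post; the DELETE.FILE and create steps happen inside DataStage, so this only shows the rename trick that puts the new data directory and dictionary file under the old names):

```shell
# Clean slate for the mock, then stand-ins for the files the new job created.
rm -rf SDKSequences D_SDKSequences SDKSeq2 D_SDKSeq2
mkdir -p SDKSeq2 && touch D_SDKSeq2

# The rename from step 7: data directory and dictionary file take the old names.
mv SDKSeq2 SDKSequences
mv D_SDKSeq2 D_SDKSequences

ls -d SDKSequences D_SDKSequences   # both should now exist under the old names
```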
Hope this helps.
thanks
Praveen