Page 1 of 1

Deadlock in DB2 z/OS

Posted: Wed Dec 09, 2015 4:17 am
by Palermo
Hi Everyone,

I have a problem with deadlock in DB2 database (z/OS). ETL job consists of 4 DB2 connectors and change capture stage. The table contains about 1,5 million records. Updated, inserted and deleted records are about 150 thousand. The job performs commit after 100 records. The table is not used by other applications. Deadlock occurs randomly, for example, in DB2 connector for updating:
SQLExecute reported: SQLSTATE = 57033: Native Error Code = -913: Msg = [IBM][CLI Driver][DB2] SQL0913N Unsuccessful execution caused by deadlock or timeout. Reason code "00C90088". SQLSTATE=57033 (CC_DB2DBStatement::processRowStatusArray, file CC_DB2DBStatement.cpp, line 2,636)
Could you please help or give advice? Thanks.

The job and the table space for the table (1 table=1 table space):
Image
Image
Image
Image

Posted: Wed Dec 09, 2015 8:23 am
by chulett
This is typically caused by (improper) partitioning, which can be tested by running the job on a single node.

Posted: Wed Dec 09, 2015 11:24 am
by Palermo
Do you mean improper partitioning in the Change capture stage? I used "Hash partitioning" for before and after links. If I rerun the job can finish successfully or fail with deadlock message. That happens once a week and it is difficult to catch. I have no time and I should promote the jobs to production. I tried to reproduce the same deadlock unsuccessfully.

Data in dataset is partitioned and sorted in other job:

Image

Posted: Wed Dec 09, 2015 11:56 am
by chulett
So this job does run on multiple nodes? And you don't have time to test it over time running on a single node before it needs to go to production? Just want to make sure we're all on the same page.

My "partitioning" comment was extremely generic. And would typically be more about the target stages.

Posted: Wed Dec 09, 2015 2:58 pm
by Palermo
The job runs on multiple nodes. I think that the problem is on database side but I have no idea how to catch and reproduce the same situation and run on a single node because if I rerun the job finishes successfully. All the DB2 connectors work in sequential mode.