Problem with delete using DB2 enterprise stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Problem with delete using DB2 enterprise stage

Post by vigneshra »

Hi

We have an issue with a parallel job that purges records from a DB2 UDB table. We extract the key columns from table1 and delete the matching records from table2; both tables share the same set of key columns. When the job runs, it does not delete all the records in table2: each run deletes some records, a rerun deletes some more, but it never deletes everything in one go. The array size and transaction size are both set to 50000. The target DB2 stage executes in parallel and the partitioning is set to DB2. The job runs perfectly in the development environment but not in production, so we want to identify the root cause of this issue. Any help is appreciated.
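
For reference, the delete in the target DB2 stage is expressed roughly like the following user-defined SQL (the table and key column names here stand in for our actual ones):

    DELETE FROM table2
    WHERE key_col1 = ORCHESTRATE.key_col1
      AND key_col2 = ORCHESTRATE.key_col2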
Vignesh.

"A conclusion is simply the place where you got tired of thinking."
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

If you delete that many records, the transaction log is going to fill up.
But that would give you an error, and since you did not get any error you may not have this problem right now. Make sure you are not logging these delete activities.
Kris

Where's the "Any" key?-Homer Simpson
Daddy Doma
Premium Member
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

If you want to replace the entire table it would be better NOT to use Delete (or Truncate) with DB2 in DataStage, because an unqualified DELETE will be passed to the database and the log files will fill up.

Faced with a similar situation, I wrote a script to create an empty file in Unix and then LOAD ... REPLACE the database table, truncating all the records but leaving column names and data types intact. I also had a second script to restore referential integrity to the dependent database tables.

You would create a custom routine that lets the user enter the database name, schema and table. Then, in a sequence job, use this routine to truncate your table before loading it. No need for an unqualified DELETE, no log files get filled, and performance will be a lot quicker.
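
A minimal sketch of the idea from the DB2 command line (database, schema and table names are placeholders, and NONRECOVERABLE assumes you can afford to skip forward recovery for this table):

    CONNECT TO MYDB;
    -- LOAD ... REPLACE from an empty file empties the table without
    -- writing the deletes to the transaction log
    LOAD FROM /dev/null OF DEL REPLACE INTO MYSCHEMA.MYTABLE NONRECOVERABLE;
    -- the table (and its dependents) may be left in set-integrity pending
    -- state, so check it afterwards
    SET INTEGRITY FOR MYSCHEMA.MYTABLE IMMEDIATE CHECKED;
    CONNECT RESET;

Run it with db2 -tvf, or wrap the same commands in the Unix script.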
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Great tip, Daddy Doma.

You should also have a look at capturing the SQLCODEs from the failed delete statements. I am afraid you are suffering from row leakage; follow the link for instructions on trapping error messages from your DB2 enterprise stage. You might fix this one problem, but you could still have rows leaking out of your other DB2 enterprise stages.
bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

What are the options when you want to delete based on a key, say a date, rather than truncate the entire table?
Selecting the rows from the database first and then deleting by those keys is one. But wouldn't that mean selecting millions of rows from the table, landing them in a dataset, and then firing the delete query, when the delete does not actually need those keys and could work just as well with the date key alone?
Regards,
Bakul
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Bakul, if I understand correctly, you need to delete on a date basis. How big is the table, i.e., how many days of data? If the data you intend to delete is small compared to the existing data, a plain DELETE FROM <table> WHERE <condition> will work. Maybe, if you are concerned about the transaction log, you can delete in chunks with the help of a loop in an SP.
If the data you intend to delete is more than about 50% of the table, you can insert the rest of the rows into another table and truncate the current table.
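
A rough sketch of that second option (table names and the 90-day cutoff are just placeholders; the empty-file LOAD ... REPLACE is the same logging-free truncate trick mentioned earlier in the thread):

    -- keep the rows that should survive in a staging table with the same layout
    INSERT INTO MYSCHEMA.MYTABLE_KEEP
        SELECT * FROM MYSCHEMA.MYTABLE
        WHERE load_date >= CURRENT DATE - 90 DAYS;
    COMMIT;
    -- empty the original table without logging the deletes
    LOAD FROM /dev/null OF DEL REPLACE INTO MYSCHEMA.MYTABLE NONRECOVERABLE;
    -- put the kept rows back
    INSERT INTO MYSCHEMA.MYTABLE SELECT * FROM MYSCHEMA.MYTABLE_KEEP;
    COMMIT;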
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

Thanks for the reply Kumar.
The data that I need to delete is well under 50% of the table, but it still causes the transaction log to overflow.

Deleting in chunks with a loop seems the preferred approach.
Could you please explain what you mean by "delete in chunks with the help of a loop in an SP"? I am not sure what you mean by SP. :?
Regards,
Bakul
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I just meant that, rather than deleting the whole set of rows in one go, you can delete, say, 5,000 or 10,000 rows per transaction, so that the transaction log does not fill up. You can write a stored procedure (that is the SP) that loops until it gets "no data found" for the given condition, and commits after every 5K or 10K rows. If you want to do it within DataStage, use DELETE and try setting the Transaction size to a smaller value so that commits happen more often.
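
A minimal sketch of such a procedure in DB2 SQL PL (the table, column, procedure name, cutoff date and 10,000-row chunk size are all placeholders):

    CREATE PROCEDURE PURGE_OLD_ROWS (IN p_cutoff DATE)
    LANGUAGE SQL
    BEGIN
        DECLARE v_deleted INTEGER DEFAULT 1;
        WHILE v_deleted > 0 DO
            -- DB2 allows deleting from a fullselect, so FETCH FIRST
            -- caps the number of rows deleted in each chunk
            DELETE FROM (SELECT 1 FROM MYSCHEMA.MYTABLE
                         WHERE load_date < p_cutoff
                         FETCH FIRST 10000 ROWS ONLY);
            GET DIAGNOSTICS v_deleted = ROW_COUNT;
            -- commit each chunk so the transaction log is released as you go
            COMMIT;
        END WHILE;
    END

    CALL PURGE_OLD_ROWS('2006-01-01');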
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'