Page 1 of 1

Parallel update to DB2

Posted: Thu May 26, 2005 8:06 am
by ewartpm
I'm reading a sequential file, passing it thru a link splitter and generating 4 streams into a DB2 CLI stage.

The DB2 stage does an 'update existing rows only' and each input stream is updating the same table. I'm trying this in an attempt to speed up the update process.

However, I keep on getting unique key violations from the database. It's as though DB2 is doing an Insert. :x

Any help on this one would be appreciated.

Posted: Thu May 26, 2005 11:46 am
by Sainath.Srinivasan
Check whether you are changing any key columns as part of update.

Posted: Fri May 27, 2005 4:29 am
by ewartpm
I am updating some key columns. However, the "Where" statement refers to the primary key only.

Does DB2 not allow updating of non primary key columns :?:

Posted: Fri May 27, 2005 6:29 am
by chulett
The 'where' clause refers to whichever columns you have marked as Keys - so regardless of the actual keys in the table, only mark the columns you want to be used in the where clause for this statement. All of the columns not marked as keys will be updated.

Make sure you've set the Update Action on all four of the Input links to the DB2 stage. Classic problem is to only set the 'first' one and leave the rest at the default action - Insert.

Unique Key violations don't just come from inserts! If you are updating primary key columns (or values in any unique index) and the value(s) you are trying to set the row to already exist in another record you will generate that error.

Posted: Fri May 27, 2005 7:42 am
by ray.wurlod
I've always thought of it like this.
In any SQL, columns referred to in a WHERE clause refer to the search (constraining, filtering) keys (as opposed to the primary keys).

Posted: Mon May 30, 2005 2:55 am
by ewartpm
Thanks guys. I fiddled with the key columns and now it works.

Interesting though, I tried the job using the splitter and then created another job using the IPC stage without the splitter. The IPC stage beats the splitter every time.

Posted: Mon May 30, 2005 5:48 pm
by vmcburney
A splitter will not do much for your job and may be making it slower because the entire job is still only running on a single CPU. To get the most from parallel processing you need to make it a multiple instance job and run your four streams as different instances of the job that partition the data.

Posted: Tue May 31, 2005 4:56 am
by ewartpm
Hi Vincent

I'm not sure I understand what you mean when you say the splitter will only run one cpu. From my understanding, each link out of the splitter will force a new 'thread' which, in turn, will mean the cpu's will be better utilised :? .

If this is not the case, then from a performance enhancing viewpoint, is the splitter worth anything?

Posted: Tue May 31, 2005 7:03 am
by chulett
I don't believe the Row Splitter stage has anything to do with performance improvement. At least not directly.