Parallel update to DB2

ewartpm · Post by **ewartpm** » Thu May 26, 2005 8:06 am

I'm reading a sequential file, passing it thru a link splitter and generating 4 streams into a DB2 CLI stage.

The DB2 stage does an 'update existing rows only' and each input stream is updating the same table. I'm trying this in an attempt to speed up the update process.

However, I keep on getting unique key violations from the database. It's as though DB2 is doing an Insert.

Any help on this one would be appreciated.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu May 26, 2005 11:46 am

Check whether you are changing any key columns as part of update.

ewartpm · Post by **ewartpm** » Fri May 27, 2005 4:29 am

I am updating some key columns. However, the "Where" statement refers to the primary key only.

Does DB2 not allow updating of non primary key columns

chulett · Post by **chulett** » Fri May 27, 2005 6:29 am

The 'where' clause refers to whichever columns you have marked as Keys - so regardless of the actual keys in the table, only mark the columns you want to be used in the where clause for this statement. All of the columns not marked as keys will be updated.

Make sure you've set the Update Action on all four of the Input links to the DB2 stage. Classic problem is to only set the 'first' one and leave the rest at the default action - Insert.

Unique Key violations don't just come from inserts! If you are updating primary key columns (or values in any unique index) and the value(s) you are trying to set the row to already exist in another record you will generate that error.

ray.wurlod · Post by **ray.wurlod** » Fri May 27, 2005 7:42 am

I've always thought of it like this.
In any SQL, columns referred to in a WHERE clause refer to the search (constraining, filtering) keys (as opposed to the primary keys).

ewartpm · Post by **ewartpm** » Mon May 30, 2005 2:55 am

Thanks guys. I fiddled with the key columns and now it works.

Interesting though, I tried the job using the splitter and then created another job using the IPC stage without the splitter. The IPC stage beats the splitter every time.

vmcburney · Post by **vmcburney** » Mon May 30, 2005 5:48 pm

A splitter will not do much for your job and may be making it slower because the entire job is still only running on a single CPU. To get the most from parallel processing you need to make it a multiple instance job and run your four streams as different instances of the job that partition the data.

ewartpm · Post by **ewartpm** » Tue May 31, 2005 4:56 am

Hi Vincent

I'm not sure I understand what you mean when you say the splitter will only run one cpu. From my understanding, each link out of the splitter will force a new 'thread' which, in turn, will mean the cpu's will be better utilised

.

If this is not the case, then from a performance enhancing viewpoint, is the splitter worth anything?

chulett · Post by **chulett** » Tue May 31, 2005 7:03 am

I don't believe the Row Splitter stage has anything to do with performance improvement. At least not directly.