My source dataset has duplicates on the key column, Col1. I need to process the first row in each duplicate set as an insert and all subsequent rows as updates. I am using the Oracle Connector stage. I set Record Count to 1 and I am using a single-node config file. This works, but my data volume is large and I am wondering if there is another strategy that anyone can suggest.
Thanks.
Looking for design help
Re: Looking for design help
Hi,
You can use a multi-node config file if the table has row-level locking. In that case, hash partition on the key. My suggestion is to avoid hitting the database multiple times for the same record: try to collapse each duplicate set into a single record (unique key) instead of hitting the table again and again. That will give better performance.
If your table has page-level locking, you may not be able to hit it with multiple nodes (this may create deadlocks). But if you can separate the inserts from the updates, you can run the inserts in parallel and the updates sequentially after the inserts are done.
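The "single record per key" suggestion can be sketched outside of DataStage as well. This is a hypothetical Python illustration (the field names `Col1` and `val` are assumed): since the first row inserts and every later duplicate updates, the final database state for each key equals its last source row, so the duplicates can be collapsed before the database is touched.

```python
# Sketch: collapse duplicate rows on the key so the database
# is hit only once per key (field names are assumed).
from itertools import groupby
from operator import itemgetter

rows = [
    {"Col1": 1, "val": "a"},
    {"Col1": 1, "val": "b"},  # duplicate key: its value is what survives
    {"Col1": 2, "val": "c"},
]

# Sort on the key, then keep only the last row of each duplicate set,
# leaving one record per key for a single upsert pass.
rows.sort(key=itemgetter("Col1"))
merged = [list(group)[-1] for _, group in groupby(rows, key=itemgetter("Col1"))]
```

This only applies when the later duplicates fully overwrite the earlier ones; if updates touch different columns per row, the duplicates would have to be merged column by column instead.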
Use a Sort stage to sort your incoming data on the key and create a key change column. In a subsequent Transformer stage, route all records where the key change column is true to the DB "INSERT" link and the others to the DB "UPDATE" link. Each DB stage should then have the appropriate insert or update SQL.
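The key-change routing above can be sketched as follows. This is a hedged Python illustration, not DataStage code (field names are assumed): rows are pre-sorted on the key, the key change flag is true for the first row of each duplicate set, and that flag decides which output link a row takes.

```python
# Sketch: emulate the Sort stage's key change column on pre-sorted rows.
# The first row of each duplicate set goes to the insert link,
# subsequent duplicates go to the update link (field names assumed).
rows = [
    {"Col1": 1, "val": "a"},
    {"Col1": 1, "val": "b"},
    {"Col1": 2, "val": "c"},
]

inserts, updates = [], []
prev_key = object()  # sentinel that matches no real key
for row in rows:
    if row["Col1"] != prev_key:  # key change = true: first of its set
        inserts.append(row)
    else:                        # key change = false: subsequent duplicate
        updates.append(row)
    prev_key = row["Col1"]
```

Because the updates on one link depend on the inserts on the other having landed first, the two links still need to be ordered or run in separate passes, as noted in the earlier reply.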