My source dataset has duplicates on the key column, Col1. I need to process the first row in each duplicate set as an insert and all subsequent rows as updates. I am using the Oracle Connector stage. I set Record Count to 1 and I am using a single-node config file. This works, but my data volume is large and I am wondering if there is another strategy that anyone can suggest.
Thanks.
Looking for design help
Re: Looking for design help
Hi,
You can use a multi-node config file if the table has row-level locking. In that case, hash partition on the key. My suggestion is to avoid hitting the database multiple times for the same record: try to collapse each duplicate set into a single record (unique key) instead of hitting the table again and again. That will give better performance.
If your table has page-level locking, you may not be able to hit it with multiple nodes (this may create deadlocks). But if you can separate the inserts from the updates, you can run the inserts in parallel and the updates sequentially after the inserts are done.
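The "single record per key" suggestion can be sketched outside of DataStage as well. This is a hypothetical Python illustration (the field names `Col1` and `val` are assumed): since the first row inserts and every later duplicate updates, the final database state for each key equals its last source row, so the duplicates can be collapsed before the database is touched.

```python
# Sketch: collapse duplicate rows on the key so the database
# is hit only once per key (field names are assumed).
from itertools import groupby
from operator import itemgetter

rows = [
    {"Col1": 1, "val": "a"},
    {"Col1": 1, "val": "b"},  # duplicate key: its value is what survives
    {"Col1": 2, "val": "c"},
]

# Sort on the key, then keep only the last row of each duplicate set,
# leaving one record per key for a single upsert pass.
rows.sort(key=itemgetter("Col1"))
merged = [list(group)[-1] for _, group in groupby(rows, key=itemgetter("Col1"))]
```

This only applies when the later duplicates fully overwrite the earlier ones; if updates touch different columns per row, the duplicates would have to be merged column by column instead.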
Use a Sort stage to sort your incoming data on the key and create a key change column. In a subsequent Transformer stage, route all records where the key change column is true to the DB "INSERT" link and the others to the DB "UPDATE" link. Each DB stage should then have the appropriate insert or update SQL.
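The key-change routing above can be sketched as follows. This is a hedged Python illustration, not DataStage code (field names are assumed): rows are pre-sorted on the key, the key change flag is true for the first row of each duplicate set, and that flag decides which output link a row takes.

```python
# Sketch: emulate the Sort stage's key change column on pre-sorted rows.
# The first row of each duplicate set goes to the insert link,
# subsequent duplicates go to the update link (field names assumed).
rows = [
    {"Col1": 1, "val": "a"},
    {"Col1": 1, "val": "b"},
    {"Col1": 2, "val": "c"},
]

inserts, updates = [], []
prev_key = object()  # sentinel that matches no real key
for row in rows:
    if row["Col1"] != prev_key:  # key change = true: first of its set
        inserts.append(row)
    else:                        # key change = false: subsequent duplicate
        updates.append(row)
    prev_key = row["Col1"]
```

Because the updates on one link depend on the inserts on the other having landed first, the two links still need to be ordered or run in separate passes, as noted in the earlier reply.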