Load job as a reusable design

senthilkumarr · Post by **senthilkumarr** » Thu Jun 23, 2011 12:05 am

Hi Ray,

Suppose after capturing the insert/update records using CDC in One dataset in first job.

Can i use the dataset as RCP in second job to load into SQL server (ODBC COnnection).I like to use the load job as a reusable design for all 87 table load and also i am facing issue with BLOB data type loading in SQL server.

chulett · Post by **chulett** » Thu Jun 23, 2011 12:17 am

... and off you go to your own topic.

Split from this post if anyone is curious.

mhester · Post by **mhester** » Thu Jun 23, 2011 6:14 am

The split in threads seems to actually be a logical one - 2 separate questions really.

To answer this question reusable load jobs are perfect for what you want to do. You will need to do a few things in the load job and you will actually probably have three (3) separate load jobs (makes more sense). One will delete, one will update and one will insert.

The insert job is a no brainer - no keys etc... involved so this can be coded quite quickly and requires no key exposure - simply stream the data.

The update and delete are a bit trickier, but not terribly so. The caveat with these is that you need a key and there are a couple of ways to do this. The easiest way is to supply the "WHERE" clause directly to the stage via a job parameter (very common). Another way to do this would be to pass a parameter with a schema file that defines the key for the particular table you are updating or deleting. In your update and delete job you will expose some generic column(s) as the key columns using the column export operator. This key could then be used in a dynamic SQL statement in the stage (user defined of course).

Below I will give you an example of what to do if you will be passing the WHERE clause and not exposing keys.

You will choose not to generate the SQL in the stage and you can then parameterize the SQL statement for your needs. You could have something similar to -

Code: Select all

DELETE FROM #TABLE_NAME_PARM# WHERE #DELETE_CLAUSE#

You will supply the parms and the DELETE_CLAUSE might look something like

Code: Select all

ACTUAL_TABLE_KEY=ORCHESTRATE.ACTUAL_TABLE_KEY

Any method you choose will have pros and cons. Fully RCP designs can be more difficult to debug and reviewing the data can be challenging (not too bad, but challenging). Exposing generic keys or writing specific WHERE clauses both have merit and you need to decide which is better for your situation.

This type of job will be multi instance and the number you can run is entirely dependent on system resources as Ray pointed out in the other thread.

There are ways to wrap database operators so that you can actually do update and deletes within the same job (without job parms etc...), but I prefer to separate them for ease of maintenance.

Have fun!

RCP is great for things like sourcing and loading, but don't go wild trying to make everything generic - too much of a hassle and very hard to maintain in my opinion.

highpoint · Post by **highpoint** » Wed Sep 07, 2011 8:09 am

mhester wrote:Another way to do this would be to pass a parameter with a schema file that defines the key for the particular table you are updating or deleting. In your update and delete job you will expose some generic column(s) as the key columns using the column export operator. This key could then be used in a dynamic SQL statement in the stage (user defined of course).

Hi,

I was trying to implement Generic Update Job, and was trying to understand the above procedure.

I am not able to understand how to implement generic update using column export operator. Can you please elaborate on this.

for UPDATE statement we would need to SET fields other than specifying key fields in where clause. For DELETE it seems to work if we could some way specify only the key fields. But for UPDATE, we need to have fields in SET clause as well.

ray.wurlod · Post by **ray.wurlod** » Wed Sep 07, 2011 5:38 pm

Your new mantra: NOT EVERYTHING CAN BE DONE GENERICALLY