OCI...a way to show breakout of Upd vs Inserts ?

Archive of postings to DataStageUsers@Oliver.com. This forum intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

OCI...a way to show breakout of Upd vs Inserts ?

Post by admin »

When using OCI ...update then Insert selection, is there a way to have
the director log
tell you how many were inserts, and how many were updates...rather then just one number of records processed (which is for both) ?

Thank You,

Patricia Desiano
Anheuser-Busch, Inc.
MSG - BudNET Data Warehouse
314-577-7558
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Depending on the number of rows you are talking about you may be better to split the inserts/updates using some sort of cdc. Thereby giving you two links which will contain the number of inserts/updates. Using before and after images in sequential files can help you to do depending on the source extract you are dealing with.

Phil

-----Original Message-----
From: Desiano, Patricia [mailto:Patricia.Desiano@anheuser-busch.com]
Sent: Friday, September 07, 2001 4:39 AM
To: datastage-users-list
Subject: OCI...a way to show breakout of Upd vs Inserts ?


When using OCI ...update then Insert selection, is there a way to
have
the director log
tell you how many were inserts, and how many were updates...rather then just one number of records processed (which is for both) ?

Thank You,

Patricia Desiano
Anheuser-Busch, Inc.
MSG - BudNET Data Warehouse
314-577-7558
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Patricia, there is no facility for most databases to
return that information using the generated SQL by
DataStage. Your only possibility is to write your own
stored procedure to conduct the inserts and updates,
and return that information to DataStage to put into
the log of the job as an information message. Youre
going to have to parse the DataStage job log to get
that information.

Youre really much better off designing an ETL process
that knows already if the row is an insert or update.
You have eliminated the possibility of doing a more
intelligent update of the current row. The Auto-Gend
SQL replaces the entire row. What if your update
should only touch a subset of the columns within the
row? What if the update should look at the existing
value of a column and use that in the derivation of
the replacement value?

By separating inserts from updates, you will capture
those metrics because thats how your process
operates. In addition, you bring the possibility of bulk-loading the inserts, which will turn around the load time on your process. You will also be able to dedicate the SQL to conducting the updates, and straight update SQL is faster than insert else update or vice versa. By having separate links within DataStage tracking the inserts and the updates, its really much easier to extract the link statistic for both.

If youre really focused on performance, your job
should split out the inserts from the updates and
prepare load files. A DataStage job that has mixed transformation and load functionality will run at the slowest speed of those two, usually the loading. If youre inserting else updating, then your whole process is bottlenecked by that SQL. Youll find that preparing separate insert and update load files, the extra processing you gain preparing a hash file of keys (or whole rows) existing in the target table is vastly outweighed by the performance benefit of bulk-loading inserts and mass updating updates, not to mention the better logic for handling updates. The overhead of building a hash file lookup off a join of incoming keys to the target table is nothing compared to the overhead of insert else update. Thus you inherently build a process that shows inserts and updates as separate link paths.

Any questions send me an email. Ive got a great
document of how to build this architecture in
DataStage, plus a bunch of other tips.

Good luck!
-Ken

--- "Desiano, Patricia"
wrote:
> When using OCI ...update then Insert selection,
> is there a way to have
> the director log
> tell you how many were inserts, and how many were
> updates...rather then just
> one number of records processed (which is for both)
> ?
>
> Thank You,
>
> Patricia Desiano
> Anheuser-Busch, Inc.
> MSG - BudNET Data Warehouse
> 314-577-7558
>


__________________________________________________
Do You Yahoo!?
Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger http://im.yahoo.com
Locked