
Optimization and Performance (Wasteful Processing War)

Posted: Wed May 09, 2007 4:40 am
by Nisusmage
Needless and wasteful processing should be avoided at all costs. I'm sure we all agree with that.

Now, with that in mind, I have quite a large amount of data to move into a Data Warehouse environment. After the first bulk load of all the current data, I don't want to be wasteful by dumping, cleaning out, and reloading the data every time.

I want to discuss the best strategy for continual data loading.

The strategy I am hoping to implement is the following:
1) Bulk load once off, obviously for the initial starting point of data.
2) Incremental nightly loads of the day's new or updated records since the last load, using a "DateChanged" field strategy (see the sketch below).
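In practice, the nightly extract would boil down to something like this (the table name, column name, and the #LastLoadDate# parameter are just placeholders for illustration; #LastLoadDate# is assumed to be a job parameter holding the timestamp of the previous successful load):

Code: Select all

-- Nightly delta extract (placeholder names)
-- #LastLoadDate# = timestamp of the previous successful load,
-- passed in as a job parameter.
SELECT *
FROM   SOURCE_TABLE
WHERE  DateChanged > '#LastLoadDate#'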
The floor is open for discussion.

Posted: Wed May 09, 2007 7:23 am
by DSguru2B
What's there to discuss? You have the strategy right there in your second point. Use the process date to extract only the changed records and pass them through to your data warehouse. Now, if these daily changes are going to be huge (in the millions), then you can bulk load the changes into a work table and use a SQL join to identify the inserts and updates. If you are on Oracle, you can use a MERGE statement. You will also find a few more discussions on this here and there.
No matter which path you take, utilize the bulk loader wherever you can. A very senior guy, Ken, says: "If you aint bulk loading, you aint datawarehousing" :wink:
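For example, something along these lines on the Oracle side (the table and column names here are made up, so adapt them to your own schema):

Code: Select all

-- Work-table approach: bulk load tonight's delta into WORK_DELTA,
-- then merge it into the warehouse table in a single pass.
MERGE INTO DW_TARGET tgt
USING WORK_DELTA src
ON (tgt.PK_ID = src.PK_ID)
WHEN MATCHED THEN
    UPDATE SET tgt.SOME_COL    = src.SOME_COL,
               tgt.DATECHANGED = src.DATECHANGED
WHEN NOT MATCHED THEN
    INSERT (PK_ID, SOME_COL, DATECHANGED)
    VALUES (src.PK_ID, src.SOME_COL, src.DATECHANGED)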

Posted: Wed May 09, 2007 7:48 am
by Nisusmage
I was hoping for some critics to say, "No, no, no, this is a much better way." But I can't really think of a better way.

The only task I have left is getting the guys who handle the source data to implement my suggested changes. The problem is that they work with a highly normalized OLTP environment; in contrast, I'm going to be working with a fairly de-normalized OLAP environment.

Thanks for the confidence boost on my chosen path.

Posted: Wed May 09, 2007 8:52 am
by Nisusmage
The problem I was having with DataStage is that the stages that support "Update or Insert" properties have very bad performance. So, in light of this topic's subject, what is the best way to implement the incremental nightly changes?

Note: I'm loading data from Progress (ODBC) and pushing it to IQ (IQ12Load).

Possible solutions:

Code: Select all

(A)
1) Pull all the records from the source using the "DateChanged" parameter.
2) Drive the data through a Transformer with a lookup on the target data's PK field(s) and
      a. if the record exists, push it to a (very slow) updatable stage, i.e. IQ_OC
      b. if it doesn't exist, push it to a bulk loader, i.e. IQ12Load
OR

Code: Select all

(B)
Straightforward: use a slow Update/Insert stage to do the nightly changes.
{Although, I think this is just my fingers and brain trying to be lazy}
Can I have a vote on this one?
Can anyone foresee any problems or any better solutions?
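P.S. For what it's worth, the split in option (A) could also be done set-wise in SQL against a work table, rather than row by row through a Transformer lookup. All names below are placeholders:

Code: Select all

-- WORK_DELTA holds tonight's changed rows; DW_TARGET is the IQ table.

-- New rows -> feed these to the bulk loader (IQ12Load)
SELECT d.*
FROM   WORK_DELTA d
LEFT JOIN DW_TARGET t ON t.PK_ID = d.PK_ID
WHERE  t.PK_ID IS NULL;

-- Existing rows -> feed these to the (slower) update stage
SELECT d.*
FROM   WORK_DELTA d
INNER JOIN DW_TARGET t ON t.PK_ID = d.PK_ID;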