Optimization and Performance (Wasteful Processing War)

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Nisusmage
Premium Member
Posts: 103
Joined: Mon May 07, 2007 1:57 am

Optimization and Performance (Wasteful Processing War)

Post by Nisusmage »

Needless and wasteful processing should be avoided at all costs. I'm sure we all agree with that.

Now, with that in mind, I have quite a large amount of data to move into a data warehouse environment. After the initial bulk load of all the current data, I don't want to be wasteful by dumping the data, cleaning out, and reloading every time.

I want to discuss the best strategy for continual data loading.

The strategy I'm hoping to implement is the following:
1) A once-off bulk load. Obviously for the initial starting point of the data.
2) Incremental nightly loads of the day's new or changed records since the last load, using a "DateChanged" field strategy.
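The incremental step in point 2 can be sketched as follows. This is a minimal illustration, assuming a hypothetical source table and control mechanism: `SRC_TABLE`, the `DateChanged` column, and the idea of an `etl_control` high-water mark are placeholders, not the actual schema.

```python
from datetime import datetime

# Hypothetical sketch of the nightly delta extract (point 2 above).
# SRC_TABLE and DateChanged are assumed names for illustration only.

def build_delta_query(last_load_ts: datetime) -> str:
    """Return the SQL used to pull only rows changed since the last load."""
    return (
        "SELECT * FROM SRC_TABLE "
        f"WHERE DateChanged > '{last_load_ts:%Y-%m-%d %H:%M:%S}'"
    )

# After a successful load, the new high-water mark would be persisted
# (e.g. in an etl_control table) and passed in as last_load_ts the next night.
print(build_delta_query(datetime(2007, 5, 7)))
```

The key design point is that the high-water mark is only advanced after a successful load, so a failed run simply re-extracts the same delta.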
The floor is Open for discussion.
~The simplest solutions are always the best~
~The trick is to understand the complexity to implement simplicity~
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What's there to discuss? You have the strategy right there in your second point. Use the process date to extract only the changed records and pass them through to your data warehouse. Now, if these daily changes are going to be huge (in the millions), you can bulk load the changes into a work table and use a SQL join to identify the inserts and updates. If you're on Oracle, you can use a MERGE statement. You will also find a few more discussions here and there.
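The work-table-plus-MERGE route described above can be sketched like this. The table and column names (`TARGET_T`, `WORK_T`, `PK`, `CHANGE_COL`) are placeholders for illustration, not a real schema:

```python
# Hedged sketch: after bulk loading the day's changes into a work table,
# a single MERGE applies both the updates and the inserts in one pass.
# All identifiers here are assumed placeholder names.

def build_merge(target: str, work: str, pk: str, cols: list[str]) -> str:
    """Build an Oracle-style MERGE from a work table into the target."""
    set_clause = ", ".join(f"t.{c} = w.{c}" for c in cols)
    insert_cols = ", ".join([pk] + cols)
    insert_vals = ", ".join(f"w.{c}" for c in [pk] + cols)
    return (
        f"MERGE INTO {target} t USING {work} w ON (t.{pk} = w.{pk}) "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

sql = build_merge("TARGET_T", "WORK_T", "PK", ["CHANGE_COL"])
print(sql)
```

The win over row-by-row update-or-insert is that the classification (matched vs. not matched) happens inside the database engine, against bulk-loaded data.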
No matter which path you take, utilize the bulk loader wherever you can. A very senior guy, Ken, says: "If you aint bulk loading, you aint datawarehousing" :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Nisusmage
Premium Member
Posts: 103
Joined: Mon May 07, 2007 1:57 am

Post by Nisusmage »

I was hoping for some critics to say "no, no, no, this is a much better way". But I can't really think of a better way.

The only task I have left is getting the guys who handle the source data to implement my suggested changes. The problem is that they work with a highly normalized OLTP environment; in contrast, I'm going to be working with a fairly de-normalized OLAP environment.

Thanks for the confidence boost on my chosen path.
~The simplest solutions are always the best~
~The trick is to understand the complexity to implement simplicity~
Nisusmage
Premium Member
Posts: 103
Joined: Mon May 07, 2007 1:57 am

Post by Nisusmage »

The problem I was having with DataStage is that the stages which support "Update or Insert" properties have very bad performance. So, in light of this topic's subject, what is the best way to implement the incremental nightly changes?

Note: I'm loading data from Progress (ODBC) and pushing to IQ (IQ12Load).

Possible solutions:


(A)
1) Pull all the records from the source using the "DateChanged" parameter.
2) Drive the data through a Transformer with a lookup on the target data's PK field(s) and
      a. if the record exists, push it to a (very slow) updatable stage, i.e. IQ_OC
      b. if it doesn't exist, push it to a bulk loader, i.e. IQ12Load
OR


(B)
Straightforward: use a slow Update/Insert stage to do the nightly changes.
{Although, I think this is just my fingers and brain trying to be lazy}
Can I have a vote on this one?
Can anyone foresee any problems or any better solutions?
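For what it's worth, the routing logic in option (A) can be sketched like this. The row shapes, PK values, and link names are illustrative assumptions; in DataStage the lookup and the two output links would do this work, not Python:

```python
# Minimal sketch of option (A): a Transformer-style lookup on the
# target's PK that routes each changed row either to the slow update
# link (IQ_OC) or to the bulk-load link (IQ12Load).

update_link = []   # rows whose PK already exists -> IQ_OC (update)
bulk_link = []     # new rows -> IQ12Load (bulk insert)

existing_pks = {10, 30}                                   # PKs already in the target
changed_rows = [{"pk": 10}, {"pk": 20}, {"pk": 30}, {"pk": 40}]

for row in changed_rows:
    if row["pk"] in existing_pks:
        update_link.append(row)   # option (A)(a): existing -> updatable stage
    else:
        bulk_link.append(row)     # option (A)(b): new -> bulk loader

print(len(update_link), len(bulk_link))
```

The appeal of (A) is that only the genuinely-existing rows pay the slow update cost, while everything new still goes through the fast bulk loader.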
~The simplest solutions are always the best~
~The trick is to understand the complexity to implement simplicity~